API

Python API

Processing

Calculate ambient profile

`setup_anndata`	Calculate ambient profile for relevant features
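The idea behind an ambient profile can be sketched in a few lines of NumPy: pool the counts of droplets presumed to be empty and normalize them to frequencies. This is a minimal sketch, not the implementation of `setup_anndata`. The helper name `ambient_profile_from_empty_droplets` is hypothetical, and so is the rule that treats droplets whose total count falls in a fixed range as empty.

```python
import numpy as np

# Hypothetical helper, not part of the scar API.
def ambient_profile_from_empty_droplets(raw_counts, min_total=1, max_total=100):
    """Estimate an ambient profile from presumed empty droplets.

    raw_counts: 2D array (droplets x features) of raw counts.
    Droplets with a total count in [min_total, max_total] are treated as empty;
    this selection rule is an illustrative assumption.
    """
    raw_counts = np.asarray(raw_counts, dtype=float)
    totals = raw_counts.sum(axis=1)
    empty = (totals >= min_total) & (totals <= max_total)
    if not empty.any():
        raise ValueError("no droplets fall in the empty-droplet count range")
    # Pool counts across the empty droplets, then normalize so they sum to 1.
    pooled = raw_counts[empty].sum(axis=0)
    return pooled / pooled.sum()

# Two low-count droplets (treated as empty) and one cell-containing droplet.
raw = np.array([[1, 0, 1], [0, 2, 0], [500, 300, 200]])
profile = ambient_profile_from_empty_droplets(raw, max_total=10)
# pooled counts are [1, 2, 1], so profile is [0.25, 0.5, 0.25]
```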

Training

The core module of scar

`model`	The scar model
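In broad terms, an observed droplet profile can be viewed as native signal mixed with ambient contamination. The arithmetic of that view can be sketched with the native frequencies and a per-droplet noise ratio given directly instead of learned. The function name `expected_observed_frequencies` and the linear mixture form are assumptions made for illustration; the trained model estimates these quantities itself, and its exact formulation may differ.

```python
import numpy as np

# Hypothetical helper illustrating a mixture assumption; not part of the scar API.
def expected_observed_frequencies(native_freq, ambient_profile, noise_ratio):
    """Expected feature frequencies of one contaminated droplet.

    native_freq, ambient_profile: 1D arrays that each sum to 1.
    noise_ratio: assumed fraction of the droplet's counts that come from ambient RNA.
    """
    native_freq = np.asarray(native_freq, dtype=float)
    ambient_profile = np.asarray(ambient_profile, dtype=float)
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must lie in [0, 1]")
    return (1.0 - noise_ratio) * native_freq + noise_ratio * ambient_profile

freq = expected_observed_frequencies([0.5, 0.5, 0.0], [0.2, 0.2, 0.6], noise_ratio=0.1)
# freq is approximately [0.47, 0.47, 0.06]
expected_counts = 1000 * freq  # expected counts for a droplet with 1000 UMIs
```

Note that the third feature, absent from the native signal, still receives expected counts purely from the ambient profile.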

Synthetic_dataset

Generate synthetic datasets (scRNAseq, CITE-seq, scCRISPRseq) with ambient contamination

`scrnaseq`	Generate synthetic single-cell RNAseq data with ambient contamination
`citeseq`	Generate synthetic ADT count data for CITE-seq with ambient contamination
`cropseq`	Generate synthetic sgRNA count data for scCRISPRseq with ambient contamination
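The kind of data these generators produce can be sketched with NumPy's random generator. Each droplet's library is split into native and ambient reads, and each part is drawn from its own frequency vector. This is a simplified illustration and not the code behind `scrnaseq`, `citeseq` or `cropseq`. The function `simulate_contaminated_counts`, the fixed library size and the binomial split of reads are all assumptions.

```python
import numpy as np

# Hypothetical simulator; not the scar synthetic-dataset implementation.
def simulate_contaminated_counts(native_freq, ambient_profile, noise_ratio,
                                 total_counts, seed=0):
    """Return (observed, native, ambient) count matrices (cells x features).

    native_freq: 2D array (cells x features), each row sums to 1.
    ambient_profile: 1D array (features), sums to 1, shared by all droplets.
    noise_ratio: probability that a given read comes from ambient RNA.
    """
    rng = np.random.default_rng(seed)
    native_freq = np.asarray(native_freq, dtype=float)
    ambient_profile = np.asarray(ambient_profile, dtype=float)
    n_cells = native_freq.shape[0]
    # Split each droplet's library into ambient and native reads.
    n_ambient = rng.binomial(total_counts, noise_ratio, size=n_cells)
    native = np.stack([rng.multinomial(total_counts - k, p)
                       for k, p in zip(n_ambient, native_freq)])
    ambient = np.stack([rng.multinomial(k, ambient_profile) for k in n_ambient])
    return native + ambient, native, ambient

observed, native, ambient = simulate_contaminated_counts(
    native_freq=[[0.7, 0.3, 0.0], [0.0, 0.2, 0.8]],
    ambient_profile=[0.4, 0.3, 0.3],
    noise_ratio=0.1,
    total_counts=1000,
)
```

Because the native and ambient parts are returned separately, the ground truth is known, which is what makes such data useful for benchmarking a denoiser.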

Plotting

Plotting functions (under development).

Reporting

Generate denoising reports (under development).

Command Line Interface

scAR (single cell Ambient Remover): denoising droplet-based single-cell omics data

usage: scar [-h] [--version] [-ap AMBIENT_PROFILE] [-ft FEATURE_TYPE]
            [-o OUTPUT] [-m COUNT_MODEL] [-sp SPARSITY] [-hl1 HIDDEN_LAYER1]
            [-hl2 HIDDEN_LAYER2] [-ls LATENT_DIM] [-epo EPOCHS] [-d DEVICE]
            [-s SAVE_MODEL] [-batchsize BATCHSIZE]
            [-batchsize_infer BATCHSIZE_INFER] [-adjust ADJUST]
            [-cutoff CUTOFF] [-round2int ROUND2INT] [-clip_to_obs CLIP_TO_OBS]
            [-moi MOI] [-verbose VERBOSE]
            count_matrix [count_matrix ...]

Positional Arguments

count_matrix: The raw count matrix file, a 2D array (cells x genes), or the path to a filtered_feature_bc_matrix.h5 file

Named Arguments

--version

Show the program’s version number and exit

-ap, --ambient_profile

The ambient profile file, a 1D array estimated from empty droplets

-ft, --feature_type

The feature type, e.g. mRNA, sgRNA, ADT, tag, CMO or ATAC

Default: “mRNA”

-o, --output

Output directory

-m, --count_model

Count model, i.e. the distribution used to model the observed counts

Default: “binomial”

-sp, --sparsity

The expected sparsity of the native signals

Default: 0.9

-hl1, --hidden_layer1

Number of neurons in the first hidden layer

Default: 150

-hl2, --hidden_layer2

Number of neurons in the second hidden layer

Default: 100

-ls, --latent_dim

Dimension of the latent space

Default: 15

-epo, --epochs

Number of training epochs

Default: 800

-d, --device

Device used for training, either ‘auto’, ‘cpu’, or ‘cuda’

Default: “auto”

-s, --save_model

Save the trained model

Default: False

-batchsize, --batchsize

Batch size for training; set a smaller value if you encounter an out-of-memory error

Default: 64

-batchsize_infer, --batchsize_infer

Batch size for inference; set a smaller value if you encounter an out-of-memory error

Default: 4096

-adjust, --adjust

Only used when calculating Bayes factors, to improve performance:

‘micro’: adjust the estimated native counts per cell (default).

‘global’: adjust the estimated native counts globally.

False: no adjustment; use the native counts returned by the model.

Default: “micro”

-cutoff, --cutoff

Cutoff for Bayes factors. See [Ly2020].

Default: 3
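How a Bayes factor cutoff acts can be sketched for the simplest case: two point hypotheses about a single count, each a Poisson distribution. A count is called as signal only when the data are more than `cutoff` times as likely under the signal hypothesis as under the ambient-only one. scar derives its Bayes factors from its own model, so this is not its computation. The functions `poisson_bayes_factor` and `call_signal` and the rates used are hypothetical.

```python
import math

# Hypothetical helpers; scar computes its Bayes factors differently.
def poisson_bayes_factor(x, rate_signal, rate_ambient):
    """Bayes factor P(x | H1) / P(x | H0) for a count x, where
    H1: x ~ Poisson(rate_signal) and H0: x ~ Poisson(rate_ambient)."""
    # The x! terms cancel in the ratio of the two Poisson likelihoods.
    log_bf = x * math.log(rate_signal / rate_ambient) - (rate_signal - rate_ambient)
    return math.exp(log_bf)

def call_signal(counts, rate_signal, rate_ambient, cutoff=3.0):
    """Flag counts whose Bayes factor exceeds the cutoff."""
    return [poisson_bayes_factor(x, rate_signal, rate_ambient) > cutoff
            for x in counts]

calls = call_signal([0, 2, 5, 6, 10], rate_signal=8.0, rate_ambient=2.0)
# calls == [False, False, False, True, True]; a count of 5 gives BF of about 2.5
```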

-round2int, --round2int

Method for rounding the denoised counts to integers

Default: “stochastic_rounding”
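Stochastic rounding, in its common form, rounds each value up with probability equal to its fractional part and down otherwise. This sketch shows that general technique. scar's exact implementation may differ, and the function name `stochastic_round` is hypothetical.

```python
import numpy as np

# Hypothetical helper showing generic stochastic rounding.
def stochastic_round(values, seed=0):
    """Round each value down or up at random, rounding up with probability
    equal to its fractional part, so the result is unbiased in expectation."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    floor = np.floor(values)
    frac = values - floor
    # rng.random draws uniform values in [0, 1); comparing with frac rounds up
    # with probability frac, and never rounds up an exact integer.
    return (floor + (rng.random(values.shape) < frac)).astype(int)

rounded = stochastic_round([0.0, 1.0, 2.5, 3.9])
mean_of_many = stochastic_round(np.full(100_000, 0.3)).mean()  # close to 0.3
```

Unlike ordinary rounding, the expected value of each rounded count equals the original value, so many small fractional counts are not systematically rounded away to zero.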

-clip_to_obs, --clip_to_obs

Clip the predicted native counts to the observed counts. Use with caution, as it may lead to overestimation of the overall noise.

Default: False
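Why clipping can inflate the noise estimate can be seen with a toy matrix. The sketch assumes, for illustration only, that the noise is taken as observed minus predicted native counts. Clipping removes the negative differences where the prediction exceeds the observation, so the summed noise can only grow. The arrays are made-up values.

```python
import numpy as np

# Made-up predicted native counts and observed counts (cells x features).
native_pred = np.array([[3.0, 0.5], [1.0, 4.0]])
observed = np.array([[2, 1], [1, 6]])

# Clip each predicted native count so it never exceeds the observed count.
clipped = np.minimum(native_pred, observed)    # [[2, 0.5], [1, 4]]

# Assumed noise definition: observed minus native counts.
noise_before = (observed - native_pred).sum()  # -1 + 0.5 + 0 + 2 = 1.5
noise_after = (observed - clipped).sum()       #  0 + 0.5 + 0 + 2 = 2.5
```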

-moi, --moi

Multiplicity of infection (MOI). If set, enables optimized thresholding, which tests a series of cutoffs and selects the best one based on the distribution of infections expected under the given MOI. See [Dixit2016] for details. Under development.
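A common modeling assumption is that the number of sgRNAs a cell receives follows a Poisson distribution with mean equal to the MOI. Under this assumption, the expected fraction of infected cells is 1 - exp(-MOI). The sketch below pairs that distribution with a deliberately simplified selection rule: pick the cutoff whose fraction of cells with at least one assigned sgRNA is closest to this expectation. The functions, the selection rule and the example assignments are hypothetical and not scar's procedure.

```python
import math

def poisson_infection_probs(moi, max_k=5):
    """P(a cell receives k sgRNAs) for k = 0..max_k under a Poisson(moi) model."""
    return [math.exp(-moi) * moi**k / math.factorial(k) for k in range(max_k + 1)]

# Hypothetical selection rule; not scar's optimized thresholding.
def best_cutoff(guides_per_cell_by_cutoff, moi):
    """Return the cutoff whose fraction of cells with >= 1 assigned sgRNA
    is closest to the Poisson expectation 1 - exp(-moi)."""
    expected_infected = 1.0 - math.exp(-moi)

    def gap(item):
        _, guides_per_cell = item
        infected = sum(1 for g in guides_per_cell if g > 0) / len(guides_per_cell)
        return abs(infected - expected_infected)

    return min(guides_per_cell_by_cutoff.items(), key=gap)[0]

# Made-up sgRNA assignments per cell (10 cells) obtained at three cutoffs.
assignments = {
    1: [1, 1, 1, 2, 1, 0, 1, 1, 0, 1],   # 80% infected
    3: [1, 0, 0, 1, 0, 0, 0, 0, 0, 0],   # 20% infected
    10: [0] * 10,                        # 0% infected
}
probs = poisson_infection_probs(0.2, max_k=2)
chosen = best_cutoff(assignments, moi=0.2)  # expected infected fraction is about 0.18
```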

-verbose, --verbose

Whether to print logging messages

Default: True