API#

Python API#

Processing#

Calculate ambient profile

`setup_anndata`	Calculate ambient profile for relevant features

Training#

The core module of scar

`model`	The scar model

Synthetic_dataset#

Generate synthetic datasets (scRNAseq, CITE-seq, scCRISPRseq) with ambient contamination

`scrnaseq`	Generate synthetic single-cell RNAseq data with ambient contamination
`citeseq`	Generate synthetic ADT count data for CITE-seq with ambient contamination
`cropseq`	Generate synthetic sgRNA count data for scCRISPRseq with ambient contamination
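
To make the idea of "counts with ambient contamination" concrete, here is a minimal NumPy sketch of one way such data could be simulated. It is an independent illustration, not the generator behind `scrnaseq`, `citeseq`, or `cropseq`; the function name, its parameters (`depth`, `noise_ratio`), and the assumption that the ambient profile is the mean of the cell-type profiles are all hypothetical.

```python
import numpy as np

def simulate_contaminated_counts(n_cells=6, n_features=5, n_celltypes=2,
                                 depth=200, noise_ratio=0.2, seed=0):
    """Toy simulation: observed counts = native counts + ambient counts."""
    rng = np.random.default_rng(seed)
    # One expression profile (feature frequencies) per cell type.
    native_profiles = rng.dirichlet(np.ones(n_features), size=n_celltypes)
    # Randomly assign each cell to a cell type.
    cell_types = rng.integers(n_celltypes, size=n_cells)
    # Assumption: the ambient profile is the average of all cell-type profiles.
    ambient = native_profiles.mean(axis=0)
    # Split each cell's sequencing depth into ambient and native molecules.
    n_noise = round(depth * noise_ratio)
    n_native = depth - n_noise
    native = np.vstack(
        [rng.multinomial(n_native, native_profiles[t]) for t in cell_types]
    )
    noise = rng.multinomial(n_noise, ambient, size=n_cells)
    return native + noise, native, noise, ambient

observed, native, noise, ambient = simulate_contaminated_counts()
print(observed.shape)  # (cells, features)
```

Keeping the native and ambient counts as separate arrays gives a ground truth against which a denoised matrix can later be compared.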

Plotting#

Plotting functions (under development).

Reporting#

Generate denoising reports (under development).

Command Line Interface#

scAR (single-cell Ambient Remover) is a deep learning model that removes ambient signals from droplet-based single-cell omics data.

usage: scar [-h] [--version] [-ap AMBIENT_PROFILE] [-ft FEATURE_TYPE]
            [-o OUTPUT] [-m COUNT_MODEL] [-sp SPARSITY] [-bk BATCHKEY]
            [-cache CACHECAPACITY] [-gnf GET_NATIVE_FREQUENCIES]
            [-hl1 HIDDEN_LAYER1] [-hl2 HIDDEN_LAYER2] [-ls LATENT_DIM]
            [-epo EPOCHS] [-d DEVICE] [-s SAVE_MODEL] [-batchsize BATCHSIZE]
            [-batchsize_infer BATCHSIZE_INFER] [-adjust ADJUST]
            [-cutoff CUTOFF] [-round2int ROUND2INT] [-clip_to_obs CLIP_TO_OBS]
            [-moi MOI] [-verbose VERBOSE]
            count_matrix [count_matrix ...]
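
As an illustration of how the synopsis above translates into an actual call, the following sketch assembles a `scar` command line from Python. The helper `build_scar_command` and the file and directory names are hypothetical placeholders; only the flags themselves come from the usage text. The line that would launch the tool is left commented out, since it requires scar to be installed.

```python
import shlex

def build_scar_command(count_matrix, ambient_profile=None, feature_type="mRNA",
                       output=None, epochs=None):
    """Assemble an argument list for the scar CLI from a few common options."""
    # The positional count matrix goes first, followed by named arguments.
    cmd = ["scar", count_matrix, "-ft", feature_type]
    if ambient_profile is not None:
        cmd += ["-ap", ambient_profile]
    if output is not None:
        cmd += ["-o", output]
    if epochs is not None:
        cmd += ["-epo", str(epochs)]
    return cmd

# Hypothetical inputs: replace with real paths before running.
cmd = build_scar_command(
    "filtered_feature_bc_matrix.h5",
    feature_type="ADT",
    output="results",
    epochs=400,
)
print(shlex.join(cmd))

# To actually run the tool (requires scar on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.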

Positional Arguments#

count_matrix: The raw count matrix file, containing a 2D array (cells x genes), or the path to a filtered_feature_bc_matrix.h5 file

Named Arguments#

--version

Show the program’s version number and exit

-ap, --ambient_profile

The file containing the ambient profile obtained from empty droplets, as a 1D array
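
A 1D ambient profile of this kind can be thought of as the relative frequency of each feature across empty droplets. The sketch below is a minimal illustration under that assumption, using a small made-up count matrix `empty_droplets` (droplets x features) and a hypothetical helper name. It is not scar's own routine, which is provided by `setup_anndata`.

```python
import numpy as np

def ambient_profile_from_empty(empty_droplets):
    """Pool counts over empty droplets and normalize to frequencies."""
    counts = np.asarray(empty_droplets, dtype=float)
    # Sum each feature over all empty droplets.
    totals = counts.sum(axis=0)
    # Normalize so the profile sums to 1.
    return totals / totals.sum()

# Made-up example: 3 empty droplets, 3 features.
empty_droplets = np.array([
    [2, 0, 1],
    [1, 1, 0],
    [3, 0, 2],
])
profile = ambient_profile_from_empty(empty_droplets)
print(profile)  # one frequency per feature
```

Pooling counts before normalizing weights each molecule equally; averaging per-droplet frequencies instead would weight each droplet equally, which is a different design choice.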

-ft, --feature_type

The feature type, e.g. mRNA, sgRNA, ADT, tag, CMO or ATAC

Default: 'mRNA'

-o, --output

Output directory

-m, --count_model

Count model

Default: 'binomial'

-sp, --sparsity

The sparsity of expected native signals

Default: 0.9

-bk, --batchkey

The batch key for batch correction

-cache, --cachecapacity

The capacity of cache for batch correction

Default: 20000

-gnf, --get_native_frequencies

Whether to return native frequencies: 0 (do not return them) or 1 (return them)

Default: 0

-hl1, --hidden_layer1

Number of neurons in the first hidden layer

Default: 150

-hl2, --hidden_layer2

Number of neurons in the second hidden layer

Default: 100

-ls, --latent_dim

Dimension of latent space

Default: 15

-epo, --epochs

Number of training epochs

Default: 800

-d, --device

Device used for training, either ‘auto’, ‘cpu’, or ‘cuda’

Default: 'auto'

-s, --save_model

Save the trained model

Default: False

-batchsize, --batchsize

Batch size for training. Use a smaller value if you run into an out-of-memory error

Default: 64

-batchsize_infer, --batchsize_infer

Batch size for inference. Use a smaller value if you run into an out-of-memory error

Default: 4096

-adjust, --adjust

Only used when calculating Bayes factors, to improve performance:

‘micro’ – adjust the estimated native counts per cell. Default.

‘global’ – adjust the estimated native counts globally.

False – no adjustment; use the native counts returned by the model.

Default: 'micro'

-cutoff, --cutoff

Cutoff for Bayes factors. See [Ly2020].

Default: 3

-round2int, --round2int

How to round the counts to integers

Default: 'stochastic_rounding'
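
For readers unfamiliar with the term, stochastic rounding generally means rounding a value up with probability equal to its fractional part and down otherwise, so that the result is unbiased on average. The following is a generic NumPy sketch of that idea with a hypothetical helper name; scar's internal implementation may differ in detail.

```python
import numpy as np

def stochastic_round(values, rng=None):
    """Round up with probability equal to the fractional part, else round down."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    floor = np.floor(values)
    frac = values - floor
    # A uniform draw below the fractional part means "round up".
    round_up = rng.random(values.shape) < frac
    return (floor + round_up).astype(int)

rng = np.random.default_rng(0)
counts = np.full(10_000, 2.3)
rounded = stochastic_round(counts, rng)
print(rounded.mean())  # close to 2.3 on average
```

Unlike rounding to the nearest integer, which would turn every 2.3 into 2, this preserves the expected total count across many cells.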

-clip_to_obs, --clip_to_obs

Clip the predicted native counts to the observed counts. Use with caution, as this may lead to overestimation of the overall noise.

Default: False
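
Conceptually, clipping to the observed counts amounts to an element-wise minimum between predicted and observed values. The sketch below illustrates this with made-up arrays and a hypothetical helper; it is an interpretation of the option's description, not scar's code.

```python
import numpy as np

def clip_to_observed(predicted_native, observed):
    """Cap each predicted native count at the corresponding observed count."""
    return np.minimum(np.asarray(predicted_native), np.asarray(observed))

# Made-up example: one cell, four features.
observed = np.array([5, 0, 3, 10])
predicted_native = np.array([6, 0, 2, 10])
clipped = clip_to_observed(predicted_native, observed)
print(clipped)
```

Because clipping can only lower the native estimate, the implied noise (observed minus native) can only grow, which is consistent with the caution above.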

-moi, --moi

Multiplicity of infection (MOI). If set, it enables optimized thresholding, which tests a series of cutoffs to find the best one based on the distribution of infections under the given MOI. See [Dixit2016] for details. Under development.

-verbose, --verbose

Whether to print logging messages

Default: True