Tutorials

There are two ways to run scar. For Python users, we recommend the Python API; for R users, we recommend the command line tool.

Run scar with Python API

Run scar with the command line tool

The command line tool supports two input formats.

Use `.h5` files as the input

We can use filtered_feature_bc_matrix.h5, the output of cellranger count, as the input for scar:

scar filtered_feature_bc_matrix.h5 -ft feature_type -o output

filtered_feature_bc_matrix.h5, a filtered .h5 file produced by cellranger count.

feature_type, a string, one of 'mRNA', 'sgRNA', 'ADT', 'tag', 'CMO', or 'ATAC'.
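For users who prefer to launch the command line tool from a script, the call above can be assembled as follows. This is a minimal sketch: `build_scar_command` and `run_scar` are hypothetical helper names, and only the `-ft` and `-o` flags shown above are used.

```python
import shutil
import subprocess

# Feature types listed in this tutorial
FEATURE_TYPES = ("mRNA", "sgRNA", "ADT", "tag", "CMO", "ATAC")


def build_scar_command(input_file, feature_type, output_dir):
    """Return the scar command line as a list of arguments (hypothetical helper)."""
    if feature_type not in FEATURE_TYPES:
        raise ValueError(
            f"feature_type must be one of {FEATURE_TYPES}, got {feature_type!r}"
        )
    return ["scar", str(input_file), "-ft", feature_type, "-o", str(output_dir)]


def run_scar(input_file, feature_type, output_dir):
    """Run scar if the executable is on PATH (hypothetical helper)."""
    if shutil.which("scar") is None:
        raise RuntimeError("the scar command line tool is not installed")
    subprocess.run(
        build_scar_command(input_file, feature_type, output_dir), check=True
    )


# Print the command without executing it
print(" ".join(build_scar_command("filtered_feature_bc_matrix.h5", "mRNA", "output")))
```

Validating feature_type before the call surfaces a typo immediately instead of after the input file has been loaded.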

Note

In this mode, the ambient profile is calculated by averaging over the cell pool. For a more accurate ambient profile, consider calculating it yourself and using .pickle files as the input, as detailed below.

The output folder contains an h5ad file:

output
     └── filtered_feature_bc_matrix_denoised_feature_type.h5ad

The h5ad file can be read by scanpy.read as an anndata object:

anndata.X, denoised counts.
anndata.obs['noise_ratio'], estimated noise ratio per cell.
anndata.layers['native_frequencies'], estimated native frequencies.
anndata.layers['BayesFactor'], Bayes factor of ambient contamination.
anndata.obs['sgRNAs' or 'tags'], optional, feature assignment, e.g., sgRNA, tag, CMO, etc.
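The fields listed above can be inspected along these lines. This is a sketch under assumptions: `expected_h5ad_path` and `load_denoised` are hypothetical helpers, and the file naming pattern is inferred from the folder listing above rather than from a documented guarantee. scanpy is imported lazily so the path helper also works where scanpy is not installed.

```python
from pathlib import Path


def expected_h5ad_path(input_h5, feature_type, output_dir="output"):
    """Guess the output file name from the input name (assumed naming pattern)."""
    stem = Path(input_h5).stem
    return Path(output_dir) / f"{stem}_denoised_{feature_type}.h5ad"


def load_denoised(path):
    """Read the denoised h5ad file as an AnnData object."""
    import scanpy as sc  # imported here so the helper above has no scanpy dependency

    return sc.read(path)


path = expected_h5ad_path("filtered_feature_bc_matrix.h5", "sgRNA")
print(path)

# Only read the file if a previous scar run has produced it
if path.exists():
    adata = load_denoised(path)
    print(adata.X.shape)                             # denoised counts
    print(adata.obs["noise_ratio"].describe())       # noise ratio per cell
    print(adata.layers["native_frequencies"].shape)  # native frequencies
    print(adata.layers["BayesFactor"].shape)         # Bayes factors
```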

Use `.pickle` files as the input

We can also run scar by:

scar raw_count_matrix.pickle -ft feature_type -o output

raw_count_matrix.pickle, a file of raw count matrix (MxN) with cells in rows and features in columns.

cells	gene_0	gene_1	…	gene_y
cell_0	12	3	…	82
cell_1	13	0	…	78
cell_2	35	30	…	170
…	…	…	…	…
cell_x	16	5	…	112

feature_type, a string, one of 'mRNA', 'sgRNA', 'ADT', 'tag', 'CMO', or 'ATAC'.
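A raw count matrix in the layout shown above could be prepared with pandas roughly as follows. This sketch assumes scar reads the pickle as a pandas DataFrame. `make_count_matrix` is a hypothetical helper, and the toy values reuse only the rows and columns printed in the table above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def make_count_matrix(rows, cells, genes):
    """Build a cells-by-features DataFrame of raw counts (hypothetical helper)."""
    counts = pd.DataFrame(rows, index=cells, columns=genes)
    counts.index.name = "cells"
    if (counts < 0).any().any():
        raise ValueError("raw counts must be non-negative")
    return counts


# Toy matrix: cells in rows, features in columns
counts = make_count_matrix(
    rows=[[12, 3, 82], [13, 0, 78], [35, 30, 170]],
    cells=["cell_0", "cell_1", "cell_2"],
    genes=["gene_0", "gene_1", "gene_y"],
)

# Write the pickle to a temporary folder and read it back to confirm the round trip
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "raw_count_matrix.pickle"
    counts.to_pickle(path)
    restored = pd.read_pickle(path)

print(restored.shape)  # (3, 3)
```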

Note

An extra argument ambient_profile is recommended to achieve deeper noise reduction.

ambient_profile represents the probability of occurrence of each ambient transcript and can be empirically estimated by averaging cell-free droplets.

genes	ambient profile
gene_0	.0003
gene_1	.00004
gene_2	.00003
…	…
gene_y	.0012

Warning

ambient_profile should sum to one. The gene order should be consistent with raw_count_matrix.
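One way to meet both requirements is to average the cell-free droplets, reorder the genes to match the count matrix, and normalize. This is a minimal sketch, not scar's own procedure: `estimate_ambient_profile` is a hypothetical helper, the droplet counts are made up for illustration, and pooling by mean counts is one assumed reading of "averaging".

```python
import numpy as np
import pandas as pd


def estimate_ambient_profile(cell_free_counts, gene_order):
    """Average cell-free droplets into per-gene frequencies that sum to one."""
    # Reorder columns to match the raw count matrix; missing genes get zero counts
    aligned = cell_free_counts.reindex(columns=gene_order, fill_value=0)
    mean_counts = aligned.mean(axis=0)
    total = mean_counts.sum()
    if total <= 0:
        raise ValueError("cell-free droplets contain no counts")
    return (mean_counts / total).to_frame(name="ambient profile")


# Gene order taken from the columns of the raw count matrix
genes = ["gene_0", "gene_1", "gene_2"]

# Made-up cell-free droplets, deliberately in a different column order
empty_droplets = pd.DataFrame(
    [[3, 0, 1], [1, 1, 0], [2, 1, 1]],
    index=["droplet_0", "droplet_1", "droplet_2"],
    columns=["gene_2", "gene_0", "gene_1"],
)

profile = estimate_ambient_profile(empty_droplets, genes)
print(profile)

# Checks corresponding to the warning above
assert np.isclose(profile["ambient profile"].sum(), 1.0)
assert list(profile.index) == genes
```

The resulting DataFrame can be saved with `profile.to_pickle(...)`. The name of the command line argument that accepts it is listed by `scar --help`.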

For other optional arguments and parameters, run:

scar --help

The output folder contains four files, plus an optional fifth (assignment.pickle):

output
     ├── denoised_counts.pickle
     ├── expected_noise_ratio.pickle
     ├── BayesFactor.pickle
     ├── expected_native_freq.pickle
     └── assignment.pickle

In the folder structure above:

expected_noise_ratio.pickle, estimated noise ratio.
denoised_counts.pickle, denoised count matrix.
BayesFactor.pickle, Bayes factor of ambient contamination.
expected_native_freq.pickle, estimated native frequencies.
assignment.pickle, optional, feature assignment, e.g., sgRNA, tag, etc.
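The files described above can be collected into a single dictionary as sketched below. `load_scar_outputs` is a hypothetical helper. It assumes the pickles can be read with `pandas.read_pickle` and skips any file that is absent, such as the optional assignment.pickle.

```python
from pathlib import Path

import pandas as pd

# File stems from the folder listing above; assignment is optional
OUTPUT_FILES = (
    "denoised_counts",
    "expected_noise_ratio",
    "BayesFactor",
    "expected_native_freq",
    "assignment",
)


def load_scar_outputs(output_dir):
    """Load every scar output pickle found in output_dir (hypothetical helper)."""
    results = {}
    for name in OUTPUT_FILES:
        path = Path(output_dir) / f"{name}.pickle"
        if path.exists():
            results[name] = pd.read_pickle(path)
    return results


outputs = load_scar_outputs("output")
print(sorted(outputs))  # names of the files that were found
```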
