Tutorials#
There are two ways to run scar
. For Python users, we recommend the Python API; for R users, we recommend the command line tool.
Run scar with Python API#
Run scar with the command line tool#
The command line tool supports two formats of input.
Use .h5
files as the input#
We can use the output of cellranger count filtered_feature_bc_matrix.h5 as the input for scar
:
scar filtered_feature_bc_matrix.h5 -ft feature_type -o output
filtered_feature_bc_matrix.h5
, a filtered .h5 file produced by cellranger count.
feature_type
, a string, either ‘mRNA’ or ‘sgRNA’ or ‘ADT’ or ‘tag’ or ‘CMO’ or ‘ATAC’.
Note
The ambient profile is calculated by averaging the cell pool under this mode. If you want to use a more accurate ambient profile, please consider calculating it and using
.pickle
files as the input, as detailed below.
The output folder contains an h5ad file:
output
└── filtered_feature_bc_matrix_denoised_feature_type.h5ad
The h5ad file can be read by scanpy.read as an anndata object:
anndata.X, denosed counts.
anndata.obs[’
noise_ratio
’], estimated noise ratio per cell.anndata.layers[’
native_frequencies
’], estimated native frequencies.anndata.layers[’
BayesFactor
’], bayesian factor of ambient contamination.anndata.obs[’
sgRNAs
’ or ‘tags
’], optional, feature assignment, e.g., sgRNA, tag, CMO, and etc..
Use .pickle
files as the input#
We can also run scar
by:
scar raw_count_matrix.pickle -ft feature_type -o output
raw_count_matrix.pickle
, a file of raw count matrix (MxN) with cells in rows and features in columns.
cells |
gene_0 |
gene_1 |
… |
gene_y |
---|---|---|---|---|
cell_0 |
12 |
3 |
… |
82 |
cell_1 |
13 |
0 |
… |
78 |
cell_2 |
35 |
30 |
… |
170 |
… |
… |
… |
… |
… |
cell_x |
16 |
5 |
… |
112 |
feature_type
, a string, either ‘mRNA’ or ‘sgRNA’ or ‘ADT’ or ‘tag’ or ‘CMO’ or ‘ATAC’.
Note
An extra argument
ambient_profile
is recommended to achieve deeper noise reduction.
ambient_profile
represents the probability of occurrence of each ambient transcript and can be empirically estimated by averging cell-free droplets.
genes |
ambient profile |
---|---|
gene_0 |
.0003 |
gene_1 |
.00004 |
gene_2 |
.00003 |
… |
… |
gene_y |
.0012 |
Warning
ambient_profile
should sum to one. The gene order should be consistent with raw_count_matrix
.
For other optional arguments and parameters, run:
scar --help
The output folder contains four (or five) files:
output
├── denoised_counts.pickle
├── expected_noise_ratio.pickle
├── BayesFactor.pickle
├── expected_native_freq.pickle
└── assignment.pickle
In the folder structure above:
expected_noise_ratio.pickle
, estimated noise ratio.denoised_counts.pickle
, denoised count matrix.BayesFactor.pickle
, bayesian factor of ambient contamination.expected_native_freq.pickle
, estimated native frequencies.assignment.pickle
, optional, feature assignment, e.g., sgRNA, tag, and etc..