Setup anndata

Calculate ambient profile for relevant feature types

scar.main._setup.setup_anndata(adata: anndata._core.anndata.AnnData, raw_adata: anndata._core.anndata.AnnData, feature_type: Optional[Union[str, list]] = None, prob: float = 0.995, min_raw_counts: int = 2, iterations: int = 3, n_batch: Optional[int] = None, sample: Optional[int] = None, kneeplot: bool = True, verbose: bool = True, figsize: tuple = (6, 6))

Calculate ambient profile for relevant features

Identify the cell-free droplets through a multinomial distribution. See EmptyDrops [Lun2019] for details.
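
The idea can be illustrated with a small NumPy sketch. This is not scar's implementation. It assumes the ambient profile is the normalized sum of counts over the droplets currently labelled cell-free, that each droplet is scored by its multinomial log-probability under that profile, and that a droplet is kept as cell-free when its log-probability is at least n_features * log(prob). The helper names multinomial_log_prob and estimate_ambient_profile, and the boolean is_cell mask, are hypothetical.

```python
import math
import numpy as np

# Element-wise log-gamma, used for the log-factorials in the multinomial pmf.
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def multinomial_log_prob(counts, profile):
    """Log-probability of each row of `counts` under Multinomial(n_i, profile)."""
    counts = np.asarray(counts, dtype=float)
    profile = np.asarray(profile, dtype=float)
    totals = counts.sum(axis=1)
    # log n! - sum_g log x_g!
    log_coef = _lgamma(totals + 1) - _lgamma(counts + 1).sum(axis=1)
    # sum_g x_g * log p_g, where zero counts contribute 0 even if p_g == 0
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(counts > 0, counts * np.log(profile), 0.0)
    return log_coef + terms.sum(axis=1)


def estimate_ambient_profile(raw_counts, is_cell, prob=0.995, iterations=3):
    """Iteratively refine an ambient profile from droplets that look cell-free."""
    raw_counts = np.asarray(raw_counts)
    is_cell = np.asarray(is_cell, dtype=bool)
    # Assumption: start from all droplets that are not in the filtered matrix.
    cell_free = ~is_cell
    profile = raw_counts[cell_free].sum(axis=0) / raw_counts[cell_free].sum()
    # Joint threshold: product of `prob` over all features, in log space.
    threshold = raw_counts.shape[1] * np.log(prob)
    for _ in range(iterations):
        log_prob = multinomial_log_prob(raw_counts, profile)
        candidate = (log_prob >= threshold) & ~is_cell
        if not candidate.any():
            break  # nothing passes the threshold; keep the last estimate
        cell_free = candidate
        profile = raw_counts[cell_free].sum(axis=0) / raw_counts[cell_free].sum()
    return profile, cell_free


# Toy data: three low-count droplets and one cell-like droplet.
toy_counts = np.array([[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1], [50, 0, 0, 0]])
toy_is_cell = np.array([False, False, False, True])
toy_profile, toy_cell_free = estimate_ambient_profile(toy_counts, toy_is_cell, prob=0.5)
```

With only four features the toy example needs a much lower prob than the default. On real data the number of features is large, so the joint threshold is far more permissive for a given prob.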

Parameters

adata (AnnData) – A filtered adata object, loaded from filtered_feature_bc_matrix using scanpy.read. Gene filtering is recommended to save memory
raw_adata (AnnData) – A raw adata object, loaded from raw_feature_bc_matrix using scanpy.read
feature_type (Union[str, list], optional) – Feature type, e.g. ‘Gene Expression’, ‘Antibody Capture’, ‘CRISPR Guide Capture’ or ‘Multiplexing Capture’. If None, ambient profiles are calculated for all feature types, by default None
prob (float, optional) – Per-gene probability threshold; a droplet is considered to contain ambient RNA if its probability is greater than prob (the joint probability for a droplet equals the product over all genes), by default 0.995
min_raw_counts (int, optional) – Total counts filter for raw_adata, filtering out low counts to save memory, by default 2
iterations (int, optional) – Total iterations, by default 3
n_batch (int, optional) – Total number of batches; set it to a larger number if an out-of-memory issue occurs, by default None
sample (int, optional) – Number of droplets to randomly sample for testing; if greater than the total number of droplets, all droplets are used. By default None (use all droplets)
kneeplot (bool, optional) – Whether to draw a kneeplot showing subpopulations of droplets, by default True
verbose (bool, optional) – Whether to display messages, by default True
figsize (tuple, optional) – Figure size, by default (6, 6)
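
To make the memory-related parameters more concrete, here is a minimal NumPy sketch of what min_raw_counts, sample and n_batch describe. It is an illustration under assumptions, not scar's code: the count filter is assumed to be inclusive (droplets with total counts >= min_raw_counts are kept), and the helpers select_droplets and split_batches as well as the seed argument are hypothetical.

```python
import numpy as np


def select_droplets(total_counts, min_raw_counts=2, sample=None, seed=0):
    """Return indices of droplets kept after count filtering and optional sampling."""
    total_counts = np.asarray(total_counts)
    # Drop very low-count droplets to reduce memory use.
    idx = np.flatnonzero(total_counts >= min_raw_counts)
    # Randomly subsample only if `sample` is smaller than what is left.
    if sample is not None and sample < idx.size:
        rng = np.random.default_rng(seed)
        idx = np.sort(rng.choice(idx, size=sample, replace=False))
    return idx


def split_batches(indices, n_batch):
    """Split droplet indices into n_batch chunks that can be processed one at a time."""
    return np.array_split(np.asarray(indices), n_batch)


# Toy usage: five droplets with their total counts.
kept = select_droplets([0, 1, 2, 5, 10], min_raw_counts=2)
batches = split_batches(kept, n_batch=2)
```

A larger n_batch means smaller chunks, so fewer droplets are held as a dense array at any one time, at the cost of more loop iterations.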

Return type

The relevant ambient profile is added to adata.uns

Examples

import scanpy as sc
from scar import setup_anndata

# read filtered data (fetched from backup_url if the file is not present on disk)
adata = sc.read_10x_h5(
    filename='500_hgmm_3p_LT_Chromium_Controller_filtered_feature_bc_matrix.h5ad',
    backup_url='https://cf.10xgenomics.com/samples/cell-exp/6.1.0/500_hgmm_3p_LT_Chromium_Controller/500_hgmm_3p_LT_Chromium_Controller_filtered_feature_bc_matrix.h5',
)
adata.var_names_make_unique()

# read raw data (all droplets, including cell-free ones)
adata_raw = sc.read_10x_h5(
    filename='500_hgmm_3p_LT_Chromium_Controller_raw_feature_bc_matrix.h5ad',
    backup_url='https://cf.10xgenomics.com/samples/cell-exp/6.1.0/500_hgmm_3p_LT_Chromium_Controller/500_hgmm_3p_LT_Chromium_Controller_raw_feature_bc_matrix.h5',
)
adata_raw.var_names_make_unique()

# gene and cell filter
sc.pp.filter_genes(adata, min_counts=200)
sc.pp.filter_genes(adata, max_counts=6000)
sc.pp.filter_cells(adata, min_genes=200)

# setup anndata: the ambient profile is stored in adata.uns
setup_anndata(
    adata,
    adata_raw,
    feature_type="Gene Expression",
    prob=0.975,
    min_raw_counts=2,
    kneeplot=True,
)