Setup anndata
Calculate ambient profile for relevant feature types
- scar.main._setup.setup_anndata(adata: anndata._core.anndata.AnnData, raw_adata: anndata._core.anndata.AnnData, feature_type: Optional[Union[str, list]] = None, prob: float = 0.995, min_raw_counts: int = 2, iterations: int = 3, n_batch: Optional[int] = None, sample: Optional[int] = None, kneeplot: bool = True, verbose: bool = True, figsize: tuple = (6, 6))
Calculate the ambient profile for the relevant features.
Identify the cell-free droplets through a multinomial distribution. See EmptyDrops [Lun2019] for details.
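The multinomial test described above can be illustrated with a minimal, pure-Python sketch. It is not scar's implementation: the helper names (`ambient_profile`, `multinomial_log_prob`, `is_cell_free`) are hypothetical, and the threshold `n_features * log(prob)` is an assumption based on the description of `prob` (a per-gene probability whose product over genes gives the joint probability). The toy call uses `prob=0.5` because, with only three features, the default of 0.995 would reject nearly every droplet.

```python
import math


def ambient_profile(droplets):
    """Normalise the summed counts of presumed cell-free droplets to frequencies."""
    totals = [sum(column) for column in zip(*droplets)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def multinomial_log_prob(counts, probs):
    """Log-probability of a count vector under a multinomial with the given probs."""
    n = sum(counts)
    log_p = math.lgamma(n + 1)  # log(n!)
    for c, p in zip(counts, probs):
        log_p -= math.lgamma(c + 1)  # minus log(c!)
        if c > 0:
            if p == 0.0:
                return float("-inf")  # observed a feature the profile rules out
            log_p += c * math.log(p)
    return log_p


def is_cell_free(counts, profile, prob=0.995):
    """Assumed rule: joint log-probability must reach n_features * log(prob)."""
    threshold = len(counts) * math.log(prob)
    return multinomial_log_prob(counts, profile) >= threshold


# Toy data: three features and a fixed ambient profile (hypothetical values).
profile = [0.6, 0.3, 0.1]
ambient_like = [2, 1, 0]   # few counts, composition close to the profile
cell_like = [0, 0, 30]     # many counts concentrated on a rare feature
flags = [is_cell_free(d, profile, prob=0.5) for d in (ambient_like, cell_like)]
```

In this toy setting the low-count droplet that resembles the profile is flagged as cell-free, while the droplet dominated by a feature that is rare in the profile is not.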
- Parameters
adata (AnnData) – A filtered adata object, loaded from filtered_feature_bc_matrix using scanpy.read. Gene filtering is recommended to save memory
raw_adata (AnnData) – A raw adata object, loaded from raw_feature_bc_matrix using scanpy.read
feature_type (Union[str, list], optional) – Feature type, e.g. ‘Gene Expression’, ‘Antibody Capture’, ‘CRISPR Guide Capture’ or ‘Multiplexing Capture’. If None, the ambient profile is calculated for all feature types, by default None
prob (float, optional) – Probability threshold per gene; a droplet is considered to contain ambient RNA if its probability is greater than prob (the joint probability of a droplet equals the product over all genes), by default 0.995
min_raw_counts (int, optional) – Total counts filter for raw_adata, filtering out low counts to save memory, by default 2
iterations (int, optional) – Total iterations, by default 3
n_batch (int, optional) – Total number of batches; set it to a larger number if an out-of-memory error occurs, by default None
sample (int, optional) – Randomly sample droplets to test, if greater than total droplets, use all droplets. Use all droplets by default (None)
kneeplot (bool, optional) – Kneeplot to show subpopulations of droplets, by default True
verbose (bool, optional) – Whether to display messages, by default True
figsize (tuple, optional) – Figure size, by default (6, 6)
- Return type
The relevant ambient profile is added to adata.uns
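How `min_raw_counts`, `sample` and `n_batch` might interact can be sketched in plain Python. This is only an illustration under stated assumptions, not scar's code: the helper `prepare_droplets` and its `seed` argument are hypothetical, droplets with at least `min_raw_counts` total counts are assumed to be kept, and `n_batch=None` is treated here as a single batch.

```python
import random


def prepare_droplets(total_counts, min_raw_counts=2, sample=None, n_batch=None, seed=0):
    """Return batches of droplet indices after filtering and optional subsampling."""
    # Drop low-count droplets (assumed rule: keep totals >= min_raw_counts).
    kept = [i for i, total in enumerate(total_counts) if total >= min_raw_counts]

    # Subsample only when `sample` is smaller than the number of kept droplets;
    # otherwise all droplets are used, as the parameter description states.
    if sample is not None and sample < len(kept):
        kept = sorted(random.Random(seed).sample(kept, sample))

    # Split into near-equal batches; more batches means fewer droplets per batch.
    n = 1 if n_batch is None else max(1, min(n_batch, len(kept)))
    size, remainder = divmod(len(kept), n)
    batches, start = [], 0
    for b in range(n):
        stop = start + size + (1 if b < remainder else 0)
        batches.append(kept[start:stop])
        start = stop
    return batches


# Hypothetical per-droplet total counts from a raw matrix.
total_counts = [0, 1, 2, 5, 9, 1, 3, 40, 2, 7]
batches = prepare_droplets(total_counts, min_raw_counts=2, n_batch=3)
```

Raising `n_batch` shrinks each batch, which is why a larger value helps when memory runs out.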
Examples
import scanpy as sc
from scar import setup_anndata

# read filtered data
adata = sc.read_10x_h5(
    filename='500_hgmm_3p_LT_Chromium_Controller_filtered_feature_bc_matrix.h5ad',
    backup_url='https://cf.10xgenomics.com/samples/cell-exp/6.1.0/500_hgmm_3p_LT_Chromium_Controller/500_hgmm_3p_LT_Chromium_Controller_filtered_feature_bc_matrix.h5',
)
adata.var_names_make_unique()

# read raw data
adata_raw = sc.read_10x_h5(
    filename='500_hgmm_3p_LT_Chromium_Controller_raw_feature_bc_matrix.h5ad',
    backup_url='https://cf.10xgenomics.com/samples/cell-exp/6.1.0/500_hgmm_3p_LT_Chromium_Controller/500_hgmm_3p_LT_Chromium_Controller_raw_feature_bc_matrix.h5',
)
adata_raw.var_names_make_unique()

# gene and cell filter
sc.pp.filter_genes(adata, min_counts=200)
sc.pp.filter_genes(adata, max_counts=6000)
sc.pp.filter_cells(adata, min_genes=200)

# setup anndata
setup_anndata(
    adata,
    adata_raw,
    feature_type="Gene Expression",
    prob=0.975,
    min_raw_counts=2,
    kneeplot=True,
)