Garfield.preprocessing.preprocessing

Garfield.preprocessing.preprocessing(adata: [AnnData, MuData], profile: str = 'RNA', data_type: str = 'Paired', sub_data_type: str = ['rna', 'atac'], batch_key: str = 'batch', weight=0.8, used_hvgs: bool = True, graph_const_method: str | None = None, genome: str | None = None, use_gene_weight: bool = True, use_top_pcs: bool = False, user_cache_path: str | None = None, min_features: int = 600, min_cells: int = 3, target_sum: int | None = None, rna_n_top_features=None, atac_n_top_features=None, n_components: int = 50, svd_components_rna: int = 30, svd_components_atac: int = 30, cca_components: int = 20, cca_max_iter: int = 2000, randomized_svd: bool = False, filter_prop_initial: int = 0, filter_prop_refined: int = 0.3, filter_prop_propagated: int = 0, n_iters: int = 1, svd_runs: int = 1, n: int = 15, metric: str = 'euclidean', svd_solver: str = 'arpack', keep_mt: bool = False, backed: bool = False, verbose: bool = True)

Preprocessing function for single-cell and multi-modal data.

Parameters:
  • adata ([AnnData, MuData]) – The annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to genes.

  • profile (str, optional) – The profile type, by default ‘RNA’.

  • data_type (str, optional) – The data type, by default ‘Paired’.

  • sub_data_type (list of str, optional) – The sub data types, by default [‘rna’, ‘atac’].

  • batch_key (str, optional) – The batch key, by default ‘batch’.

  • weight (float, optional) – The weight for combining adjacency matrices, by default 0.8.

  • used_hvgs (bool, optional) – Whether to use highly variable genes, by default True.

  • graph_const_method (str, optional) – The method for graph construction of spatial data, by default None.

  • genome (str, optional) – The genome reference, by default None.

  • use_gene_weight (bool, optional) – Whether to use gene weight, by default True.

  • use_top_pcs (bool, optional) – Whether to use top principal components, by default False.

  • user_cache_path (str, optional) – The path to save the cache file, by default None.

  • min_features (int, optional) – Minimum number of features, by default 600.

  • min_cells (int, optional) – Minimum number of cells, by default 3.

  • target_sum (int, optional) – Target sum for normalization, by default None.

  • rna_n_top_features (int or list, optional) – Number of top features for RNA, by default None.

  • atac_n_top_features (int or list, optional) – Number of top features for ATAC, by default None.

Returns:

The AnnData object after preprocessing.
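
A minimal usage sketch, assuming the function can be imported as `from Garfield.preprocessing import preprocessing` (inferred from the qualified name above, not confirmed here). The helpers `build_preprocessing_kwargs` and `run_preprocessing` are hypothetical and exist only for illustration; the defaults are copied from the signature, and the override values (`"sample"`, `2000`) are arbitrary examples rather than recommendations.

```python
# Defaults for the parameters documented above, copied from the signature.
DOCUMENTED_DEFAULTS = {
    "profile": "RNA",
    "data_type": "Paired",
    "sub_data_type": ["rna", "atac"],
    "batch_key": "batch",
    "weight": 0.8,
    "used_hvgs": True,
    "min_features": 600,
    "min_cells": 3,
    "target_sum": None,
    "rna_n_top_features": None,
    "atac_n_top_features": None,
}


def build_preprocessing_kwargs(**overrides):
    """Hypothetical helper: merge overrides into the documented defaults.

    Rejects names that are not in DOCUMENTED_DEFAULTS so typos fail early.
    """
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected parameter(s): {sorted(unknown)}")
    return {**DOCUMENTED_DEFAULTS, **overrides}


def run_preprocessing(adata, **overrides):
    """Hypothetical wrapper: call preprocessing with explicit keyword arguments."""
    # Imported lazily so the sketch can be read without Garfield installed.
    # The import path is an assumption based on the qualified name in this page.
    from Garfield.preprocessing import preprocessing

    return preprocessing(adata, **build_preprocessing_kwargs(**overrides))


# Illustrative overrides: a different batch column and an explicit RNA feature count.
kwargs = build_preprocessing_kwargs(batch_key="sample", rna_n_top_features=2000)
```

With Garfield installed and an `AnnData` or `MuData` object named `adata` in hand, `run_preprocessing(adata, batch_key="sample")` would forward these keyword arguments to the function. Passing every argument by keyword keeps the call readable given the length of the signature.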