spared.layer_operations.process_dataset
- spared.layer_operations.process_dataset(adata: AnnData, param_dict: dict) -> AnnData
Perform complete processing pipeline.
This function runs the complete processing pipeline. It operates only on the expression data and filters genes to obtain the final prediction variables. It does not perform spot (sample) filtering, for which the filter_dataset() function is recommended. The input data adata.X is expected to be in raw counts. The processing pipeline is the following:
1. Normalize the data with TPM normalization (adds adata.layers['tpm']).
2. Transform the data logarithmically using \(\log_2(TPM+1)\) (adds adata.layers['log1p']).
3. Denoise the data with the adaptive median filter (adds adata.layers['d_log1p']).
4. Compute Moran’s I for each gene in each slide and average Moran’s I across slides (adds adata.var['d_log1p_moran']).
5. Filter the dataset to keep the param_dict['top_moran_genes'] genes with the highest Moran’s I.
6. Perform ComBat batch correction if specified by the param_dict['combat_key'] parameter (adds adata.layers['c_log1p'] and adata.layers['c_d_log1p']).
7. Compute the deltas from the mean for each gene. These are computed from the log1p and d_log1p layers, and also from the c_log1p and c_d_log1p layers if batch correction was performed (adds the deltas, d_deltas, c_deltas and c_d_deltas layers).
8. Add a binary mask layer specifying valid observations for metric computation (adds adata.layers['mask'], True for valid observations, False for missing values).
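
The arithmetic behind steps 1, 2 and 7 can be sketched with plain NumPy. This is a minimal illustration, not the library's implementation: the helper names, the toy count matrix and the gene_lengths vector are all hypothetical, and it assumes that the per-gene mean in step 7 is taken over all spots.

```python
import numpy as np

def tpm_normalize(counts, gene_lengths):
    """TPM: divide counts by gene length, then scale each spot (row) to sum to 1e6."""
    rate = counts / gene_lengths  # broadcasts the per-gene lengths over every row
    return rate / rate.sum(axis=1, keepdims=True) * 1e6

def log2_tpm(tpm):
    """The log2(TPM + 1) transform from step 2."""
    return np.log2(tpm + 1.0)

def deltas_from_mean(layer):
    """Subtract each gene's (column's) mean expression, as in step 7.

    Assumption: the mean is taken over all spots in the matrix.
    """
    return layer - layer.mean(axis=0, keepdims=True)

# Toy data: 2 spots x 3 genes of raw counts, with hypothetical gene lengths in bases.
counts = np.array([[10.0, 0.0, 30.0],
                   [5.0, 5.0, 10.0]])
gene_lengths = np.array([1000.0, 500.0, 2000.0])

tpm = tpm_normalize(counts, gene_lengths)
log1p = log2_tpm(tpm)
deltas = deltas_from_mean(log1p)
```

Each row of tpm sums to one million, and each column of deltas averages to zero, which is what makes the deltas comparable across genes with very different baseline expression.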
- Parameters:
adata (ad.AnnData) – The AnnData object to process. It should already be spot/sample filtered.
param_dict (dict) –
Dictionary that contains filtering and processing parameters. Keys that must be present are:
  - 'top_moran_genes' (int): The number of genes to keep after filtering by Moran’s I. If set to 0, then the number of genes is internally computed.
  - 'combat_key' (str): The column in adata.obs that defines the batches for ComBat batch correction. If set to 'None', then no batch correction is performed.
  - 'hex_geometry' (bool): Whether the graph is hexagonal or not. If True, then the graph is hexagonal. If False, then the graph is a grid. Only True for Visium datasets.
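
As an illustration of the required keys, the following sketch builds a param_dict and checks it before use. The check_param_dict helper and the 'slide_id' column name are hypothetical and not part of the library. The call to process_dataset is left commented out because it needs a real, already filtered AnnData object.

```python
# Required keys and their expected types, as listed in the parameter description.
REQUIRED_KEYS = {
    'top_moran_genes': int,
    'combat_key': str,
    'hex_geometry': bool,
}

def check_param_dict(param_dict):
    """Hypothetical helper: verify the required keys are present with the expected types."""
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in param_dict:
            raise KeyError(f"param_dict is missing required key {key!r}")
        if not isinstance(param_dict[key], expected_type):
            raise TypeError(f"param_dict[{key!r}] should be of type {expected_type.__name__}")
    return param_dict

param_dict = check_param_dict({
    'top_moran_genes': 256,    # keep the 256 genes with the highest Moran's I
    'combat_key': 'slide_id',  # hypothetical adata.obs column; use 'None' to skip ComBat
    'hex_geometry': True,      # True only for Visium datasets
})

# With a spot-filtered AnnData object in hand, the call would then be:
# from spared.layer_operations import process_dataset
# adata = process_dataset(adata=adata, param_dict=param_dict)
```

Note that disabling batch correction uses the string 'None', not the Python None object.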
- Returns:
The processed AnnData object with all the layers and results added. A list of included keys in adata.layers is:
  - 'counts': Raw counts of the dataset.
  - 'tpm': TPM normalized data.
  - 'log1p': \(\log_2(TPM+1)\) transformed data.
  - 'd_log1p': Denoised data with adaptive median filter.
  - 'c_log1p': Batch corrected data with ComBat (only if param_dict['combat_key'] != 'None').
  - 'c_d_log1p': Batch corrected and denoised data with adaptive median filter (only if param_dict['combat_key'] != 'None').
  - 'deltas': Deltas from the mean expression for log1p.
  - 'd_deltas': Deltas from the mean expression for d_log1p.
  - 'c_deltas': Deltas from the mean expression for c_log1p (only if param_dict['combat_key'] != 'None').
  - 'c_d_deltas': Deltas from the mean expression for c_d_log1p (only if param_dict['combat_key'] != 'None').
  - 'mask': Binary mask layer. True for valid observations, False for imputed missing values.
- Return type:
ad.AnnData
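
One way the returned layers might be consumed is sketched below. It uses a plain dictionary of NumPy arrays as a stand-in for adata.layers, and both helpers are hypothetical. It also assumes that the batch-corrected, denoised layer is preferred when it exists, and that metrics should ignore entries where the mask is False.

```python
import numpy as np

def pick_prediction_layer(layer_names):
    """Hypothetical helper: prefer 'c_d_log1p' when ComBat was run, else 'd_log1p'."""
    return 'c_d_log1p' if 'c_d_log1p' in layer_names else 'd_log1p'

def masked_mse(pred, target, mask):
    """Mean squared error restricted to entries where mask is True."""
    mask = np.asarray(mask, dtype=bool)
    diff = (np.asarray(pred) - np.asarray(target))[mask]
    return float(np.mean(diff ** 2))

# Stand-in for adata.layers after processing without batch correction.
layers = {
    'd_log1p': np.array([[1.0, 2.0],
                         [3.0, 4.0]]),
    'mask': np.array([[True, False],
                      [True, True]]),
}

layer_name = pick_prediction_layer(layers.keys())
pred = np.zeros((2, 2))  # dummy predictions
mse = masked_mse(pred, layers[layer_name], layers['mask'])
```

The masked entry (value 2.0) is excluded, so the error is averaged over the three valid observations only.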