RECODE
- class screcode.RECODE(fast_algorithm=True, fast_algorithm_ell_ub=1000, seq_target='RNA', version=1, solver='auto', downsampling_rate=0.2, decimals=5, RECODE_key='RECODE', anndata_key='layers', random_state=0, verbose=True)
Bases:
object
RECODE (Resolution of curse of dimensionality in single-cell data analysis). A noise reduction method for single-cell sequencing data.
- Parameters:
fast_algorithm (boolean, default=True) – If True, the fast algorithm is used. The upper bound of parameter ell is set by fast_algorithm_ell_ub.
fast_algorithm_ell_ub (int, default=1000) – Upper bound of parameter ell for the fast algorithm. Must be in the range [1, infinity).
seq_target ({'RNA','ATAC','Hi-C','Multiome'}, default='RNA') – Sequencing target. If ‘ATAC’, the preprocessing (odd-even stabilization) will be performed before the regular algorithm.
version (int, default=1) – Version of RECODE.
solver ({'auto', 'full', 'randomized'}, default='auto') –
- If auto: set solver='randomized' if the number of samples (cells) is larger than 20,000. Otherwise, set solver='full'.
- If full: run the learning process using the full input matrix.
- If randomized: run the learning process, which involves computing the SVD and estimating the essential dimension, on data downsampled at the rate downsampling_rate.
downsampling_rate (float, default=0.2) – Downsampling rate, which is only relevant when solver='randomized'.
decimals (int, default=5) – Number of decimals to which the processed matrices are rounded.
RECODE_key (string, default='RECODE') – Key name of anndata to store the output.
anndata_key ({'layers','obsm'}, default='layers') – Attribute of anndata where the output is stored.
random_state (int or None, default=0) – Used when the ‘randomized’ solver is used.
verbose (boolean, default=True) – If False, no running messages are displayed.
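The solver='auto' rule and the role of downsampling_rate described in the parameters above can be sketched as follows. This is a minimal illustration, not the library's implementation: choose_solver and downsample_rows are hypothetical helper names, and drawing rows uniformly without replacement is an assumption about how the downsampling might be done.

```python
import numpy as np

def choose_solver(n_samples, solver="auto", threshold=20_000):
    """Resolve 'auto' to a concrete solver name (hypothetical helper)."""
    if solver != "auto":
        return solver
    # 'randomized' is chosen only when the number of cells exceeds the threshold.
    return "randomized" if n_samples > threshold else "full"

def downsample_rows(X, downsampling_rate=0.2, random_state=0):
    """Keep a random subset of rows (cells).

    Assumption: rows are drawn uniformly without replacement.
    """
    rng = np.random.default_rng(random_state)
    n_keep = max(1, int(X.shape[0] * downsampling_rate))
    idx = rng.choice(X.shape[0], size=n_keep, replace=False)
    return X[np.sort(idx)]

X = np.arange(50, dtype=float).reshape(10, 5)  # toy matrix: 10 cells, 5 features
X_sub = downsample_rows(X, downsampling_rate=0.2)
```

With the default rate of 0.2, a matrix with 10 rows keeps 2 of them in this sketch; fixing random_state makes the selection reproducible.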
- cv_
Coefficient of variation of features (genes/peaks).
- Type:
ndarray of shape (n_features,)
- log_
Running log.
- Type:
dict
- noise_variance_
Noise variances of features (genes/peaks).
- Type:
ndarray of shape (n_features,)
- normalized_variance_
Variances of features (genes/peaks).
- Type:
ndarray of shape (n_features,)
- significance_
Significance (significant/non-significant/silent) of features (genes/peaks).
- Type:
ndarray of shape (n_features,)
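As a reminder of what a per-feature coefficient of variation is, the sketch below divides the standard deviation of each column by its mean using NumPy. Which matrix RECODE uses to compute cv_ is not stated here, so the input, the use of the population standard deviation, and the helper name feature_cv are all assumptions.

```python
import numpy as np

def feature_cv(X):
    """Coefficient of variation (std / mean) per column (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population standard deviation (ddof=0), an assumption
    cv = np.full(X.shape[1], np.nan)
    nonzero = mean != 0
    # Features with zero mean have no defined CV and stay NaN.
    cv[nonzero] = std[nonzero] / mean[nonzero]
    return cv

X = np.array([[1.0, 0.0, 2.0],
              [3.0, 0.0, 2.0]])
cv = feature_cv(X)  # one value per feature, shape (n_features,)
```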
- check_applicability(title='', figsize=(10, 5), ps=2, save=False, save_filename='check_applicability', save_format='png', dpi=None, show=True)
Check the applicability of RECODE. Before using this function, you must run fit(X) or fit_transform(X) on the target data matrix X.
- Parameters:
title (str, default='') – Figure title.
figsize (2-tuple of floats, default=(10,5)) – Figure dimension (width, height) in inches.
ps (float, default=2) – Point size.
save (bool, default=False) – If True, save the figure.
save_filename (str, default='check_applicability') – File name (path) of the saved figure.
save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.
dpi (float or None, default=None) – Dots per inch.
- fit(X)
Fit the model to X. (Determine the transformation.)
- Parameters:
X (ndarray or anndata of shape (n_samples, n_features)) – Single-cell sequencing data matrix (row: cell, column: gene/peak).
- fit_transform(X)
Fit the model with X and transform X into RECODE-denoised data.
- Parameters:
X (ndarray/anndata of shape (n_samples, n_features)) – Single-cell sequencing data matrix to transform (row: cell, column: gene/peak).
- Returns:
X_new – Denoised data matrix.
- Return type:
ndarray/anndata (the same format as input)
- fit_transform_integration(X, meta_data=None, batch_key='batch', integration_method='harmony', integration_method_params={})
Fit the model with X and transform X into RECODE-denoised data.
- Parameters:
X (ndarray/anndata of shape (n_samples, n_features)) – Single-cell sequencing data matrix to transform (row: cell, column: gene/peak).
integration_method ({'harmony','mnn','scanorama','scvi'}, default='harmony') – A batch correction method used in iRECODE.
- Returns:
X_new – Denoised data matrix.
- Return type:
ndarray/anndata (the same format as input)
- lognormalize(X, base=None, target_sum=10000.0, key=None)
Standard normalization: normalize counts per cell and then logarithmize them: \(x_{ij}^{\rm log} = \log(x_{ij}^{\rm norm} + 1)\), where \(x_{ij}^{\rm norm} = c \cdot x_{ij}/\sum_{k}x_{ik}\) and \(x_{ij}\) is the count value of the \(i\)th cell and \(j\)th gene.
- Parameters:
X (ndarray or anndata of shape (n_samples, n_features)) – Single-cell sequencing data matrix (row: cell, column: gene/peak).
base (positive number or None, default=None) – Base of the logarithm. If None, natural logarithm is used.
target_sum (float, default=1e4) – Total value after count normalization, corresponding to the coefficient \(c\) above.
key (string, default=None) – Key name of anndata to store the output. If None, the RECODE_key that is set initially is used.
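The normalization formula above can be written out in a few lines of NumPy. This is a sketch of the formula only, not the library's code: lognormalize_sketch is a hypothetical name, it handles plain arrays only (no anndata), and leaving cells with zero total count at zero is an added assumption.

```python
import numpy as np

def lognormalize_sketch(X, base=None, target_sum=1e4):
    """Per-cell count normalization followed by log(x + 1)."""
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1, keepdims=True)   # total count of each cell (row)
    totals[totals == 0] = 1.0               # assumption: empty cells stay all-zero
    X_norm = target_sum * X / totals        # x_norm = c * x_ij / sum_k x_ik
    X_log = np.log1p(X_norm)                # natural log of (x_norm + 1)
    if base is not None:
        X_log = X_log / np.log(base)        # change of base
    return X_log

X = np.array([[1.0, 3.0],
              [0.0, 0.0]])
out = lognormalize_sketch(X, base=2, target_sum=4)
```

With target_sum=4 the first row is unchanged by the count normalization, so the base-2 result is log2(1 + 1) = 1 and log2(3 + 1) = 2.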
- plot_ATAC_preprocessing(title='ATAC preprocessing', figsize=(7, 5), ps=10, save=False, save_filename='plot_ATAC_preprocessing', save_format='png', dpi=None, show=True)
Plot the number of values in scATAC-seq data matrix with and without preprocessing (odd-even stabilization).
- Parameters:
title (str, default='ATAC preprocessing') – Figure title.
figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.
ps (float, default=10) – Point size.
save (bool, default=False) – If True, save the figure.
save_filename (str, default='plot_ATAC_preprocessing') – File name (path) of the saved figure.
save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.
dpi (float or None, default=None) – Dots per inch.
- plot_denoised_data(title='', figsize=(7, 5), save=False, save_filename='noise_variance', save_format='png', dpi=None, show=True)
Plot variances of the denoised data.
- Parameters:
title (str, default='') – Figure title.
figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.
ps (float, default=10) – Point size.
save (bool, default=False) – If True, save the figure.
save_filename (str, default='noise_variance') – File name (path) of the saved figure.
save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.
dpi (float or None, default=None) – Dots per inch.
- plot_mean_cv(titles=('Original', 'RECODE'), figsize=(7, 5), ps=2, save=False, save_filename='plot_mean_cv', save_format='png', dpi=None, show_features=False, n_show_features=10, cut_detect_rate=0.005, index=None, show=True)
Plot mean vs coefficient of variation (CV) of features for log-normalized data.
- Parameters:
titles (tuple, default=('Original','RECODE')) – Figure titles.
figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.
ps (float, default=2) – Point size.
save (bool, default=False) – If True, save the figure.
save_filename (str, default='plot_mean_cv') – File name (path) of the saved figure.
save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.
dpi (float or None, default=None) – Dots per inch.
show_features (float or None, default=False) – If True
n_show_features (float, default=10)
cut_detect_rate (float, default=0.005)
index (array-like of shape (n_features,) or None, default=None)
- plot_mean_variance(titles=('Original', 'RECODE'), figsize=(7, 5), ps=2, size_factor='median', save=False, save_filename='plot_mean_variance', save_format='png', dpi=None, show=True)
Plot mean vs variance of features for log-normalized data
- Parameters:
titles (tuple, default=('Original','RECODE')) – Figure titles.
figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.
ps (float, default=2) – Point size.
size_factor (float or {'median','mean'}, default='median') – Size factor (total count constant of each cell before the log-normalization).
save (bool, default=False) – If True, save the figure.
save_filename (str, default='plot_mean_variance') – File name (path) of the saved figure.
save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.
dpi (float or None, default=None) – Dots per inch.
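One plausible reading of size_factor is that 'median' and 'mean' refer to a summary of the per-cell total counts, while a number is used as given. The sketch below implements that reading; it is an assumption about the semantics, and resolve_size_factor is a hypothetical helper name.

```python
import numpy as np

def resolve_size_factor(X, size_factor="median"):
    """Turn 'median' / 'mean' / a number into a single total-count constant."""
    totals = np.asarray(X, dtype=float).sum(axis=1)  # total count per cell
    if size_factor == "median":
        return float(np.median(totals))
    if size_factor == "mean":
        return float(np.mean(totals))
    return float(size_factor)  # an explicit numeric size factor is used as-is

X = np.array([[1, 1],
              [2, 2],
              [6, 6]])
sf_median = resolve_size_factor(X, "median")
sf_mean = resolve_size_factor(X, "mean")
```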
- plot_normalized_data(title='Normalized data', figsize=(7, 5), save=False, save_filename='noise_variance', save_format='png', dpi=None, show=True)
Plot the transformed data by the noise variance-stabilizing normalization.
- Parameters:
title (str, default='Normalized data') – Figure title.
figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.
ps (float, default=10) – Point size.
save (bool, default=False) – If True, save the figure.
save_filename (str, default='noise_variance') – File name (path) of the saved figure.
save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.
dpi (float or None, default=None) – Dots per inch.
- plot_original_data(title='', figsize=(7, 5), save=False, save_filename='original_data', save_format='png', dpi=None, show=True)
Plot noise variance for each feature.
- Parameters:
title (str, default='') – Figure title.
figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.
ps (float, default=10) – Point size.
save (bool, default=False) – If True, save the figure.
save_filename (str, default='original_data') – File name (path) of the saved figure.
save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.
dpi (float or None, default=None) – Dots per inch.
- plot_procedures(titles=('Original data', 'Normalized data', 'Projected data', 'Variance-modified data', 'Denoised data'), figsize=(7, 5), save=False, save_filename='RECODE_procedures', save_filename_foots=('1_Original', '2_Normalized', '3_Projected', '4_Variance-modified', '5_Denoised'), save_format='png', dpi=None, show=True)
Plot procedures of RECODE. The vertical axes of feature are sorted by the mean.
- Parameters:
titles (5-tuple of str, default=('Original data','Normalized data','Projected data','Variance-modified data','Denoised data')) – Figure titles.
figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.
save (bool, default=False) – If True, save the figure.
save_filename (str, default='RECODE_procedures') – File name (path) of the saved figure (head).
save_filename_foots (5-tuple of str, default=('1_Original','2_Normalized','3_Projected','4_Variance-modified','5_Denoised')) – File name (path) of the saved figure (foot).
save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.
dpi (float or None, default=None) – Dots per inch.
- plot_projected_data(title='Projected data', figsize=(7, 5), save=False, save_filename='noise_variance', save_format='png', dpi=None, show=True)
Plot projected data.
- Parameters:
title (str, default='Projected data') – Figure title.
figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.
ps (float, default=10) – Point size.
save (bool, default=False) – If True, save the figure.
save_filename (str, default='noise_variance') – File name (path) of the saved figure.
save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.
dpi (float or None, default=None) – Dots per inch.
- plot_variance_modified_data(title='Variance-modified data', figsize=(7, 5), save=False, save_filename='noise_variance', save_format='png', dpi=None, show=True)
Plot variances (eigenvalues) of the variance-modified data.
- Parameters:
title (str, default='Variance-modified data') – Figure title.
figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.
ps (float, default=10) – Point size.
save (bool, default=False) – If True, save the figure.
save_filename (str, default='noise_variance') – File name (path) of the saved figure.
save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.
dpi (float or None, default=None) – Dots per inch.
- report(figsize=(8.27, 11.69), save=False, save_filename='report', save_format='png', dpi=None, show=True)
Check the applicability of RECODE. Before using this function, you must run fit(X) or fit_transform(X) on the target data matrix X.
- Parameters:
title (str, default='') – Figure title.
figsize (2-tuple of floats, default=(8.27,11.69)) – Figure dimension (width, height) in inches.
save (bool, default=False) – If True, save the figure.
save_filename (str, default='report') – File name (path) of the saved figure.
save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.
dpi (float or None, default=None) – Dots per inch.
- transform(X)
Transform X into RECODE-denoised data.
- Parameters:
X (ndarray or anndata of shape (n_samples, n_features)) – Single-cell sequencing data matrix (row: cell, column: gene/peak).
- Returns:
X_new – RECODE-denoised data matrix.
- Return type:
ndarray of shape (n_samples, n_features)
- transform_integration(X, meta_data=None, batch_key='batch', integration_method='harmony', integration_method_params={})
Transform X into RECODE-denoised data.
- Parameters:
X (ndarray or anndata of shape (n_samples, n_features)) – Single-cell sequencing data matrix (row: cell, column: gene/peak).
meta_data (ndarray (n_samples, 1) or DataFrame (n_samples, *))
batch_key (string or list, default='batch') – Key name(s) in meta_data denoting batch.
integration_method ({'harmony','mnn','scanorama','scvi'}, default='harmony') – A batch correction method used in iRECODE.
- Returns:
X_new – RECODE-denoised data matrix.
- Return type:
ndarray of shape (n_samples, n_features)
- class screcode.RECODE_core(method='variance', variance_estimate=True, fast_algorithm=True, fast_algorithm_ell_ub=1000, ell_manual=10, ell_min=3, version=1, random_state=0, verbose=True)
Bases:
object
The core part of RECODE (for non-random sampling data).
- Parameters:
method ({'variance','manual'}) – If ‘variance’, the regular variance-based algorithm is used. If ‘manual’, parameter ell, which identifies essential and noise parts in the PCA space, is set manually. The manual parameter is given by ell_manual.
variance_estimate (boolean, default=True) – If True and method='variance', the parameter estimation method is applied.
fast_algorithm (boolean, default=True) – If True, the fast algorithm is used. The upper bound of essential dimension ell is set in fast_algorithm_ell_ub.
fast_algorithm_ell_ub (int, default=1000) – Upper bound of parameter ell for the fast algorithm. Must be in the range [1, infinity).
ell_manual (int, default=10) – Manual essential dimension ell used when method='manual'. Must be in the range [1, infinity).
ell_min (int, default=3) – Minimum value of essential dimension ell.
version (int, default=1) – Version of RECODE.
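To make the idea of a parameter ell that separates essential and noise parts in the PCA space concrete, here is a minimal sketch that splits a data matrix into the part spanned by the first ell principal components and the remainder. It is plain truncated PCA via NumPy's SVD, not RECODE's algorithm, and split_pca_space is a hypothetical name.

```python
import numpy as np

def split_pca_space(X, ell):
    """Split X into an 'essential' part (top-ell PCs) and a 'noise' part."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal axes.
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    ell = int(np.clip(ell, 1, len(s)))            # keep ell within [1, rank bound]
    essential = (U[:, :ell] * s[:ell]) @ Vt[:ell] + mean
    noise = (U[:, ell:] * s[ell:]) @ Vt[ell:]
    return essential, noise

rng = np.random.default_rng(0)
X = rng.poisson(5.0, size=(20, 8)).astype(float)  # toy count-like matrix
essential, noise = split_pca_space(X, ell=4)
```

The two parts add back up to the input exactly, so the choice of ell only decides how much of the variance is treated as signal in this sketch.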
- fit(X)
Create the transformation using X.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Training data matrix, where n_samples is the number of samples and n_features is the number of features.
- Returns:
self – Returns the instance itself.
- Return type:
object
- fit_transform(X)
Fit and transform RECODE to X.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Data matrix to transform, where n_samples is the number of samples and n_features is the number of features.
- Returns:
X_new – Denoised data matrix.
- Return type:
ndarray of shape (n_samples, n_components)
- transform(X, return_ess=False)
Apply RECODE to X.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Data matrix to transform, where n_samples is the number of samples and n_features is the number of features.
- Returns:
X_new – Denoised data matrix.
- Return type:
ndarray of shape (n_samples, n_components)