screcode package

Submodules

screcode.screcode module

Module contents

class screcode.RECODE(fast_algorithm=True, fast_algorithm_ell_ub=1000, seq_target='RNA', version=1, solver='auto', downsampling_rate=0.2, decimals=5, RECODE_key='RECODE', anndata_key='layers', random_state=0, verbose=True)

Bases: object

RECODE (Resolution of curse of dimensionality in single-cell data analysis). A noise reduction method for single-cell sequencing data.

Parameters:
  • fast_algorithm (boolean, default=True) – If True, the fast algorithm is conducted. The upper bound of parameter \(\ell\) is set by fast_algorithm_ell_ub.

  • fast_algorithm_ell_ub (int, default=1000) – Upper bound of parameter \(\ell\) for the fast algorithm. Must be in the range \([1, \infty)\).

  • seq_target ({'RNA','ATAC','Hi-C','Multiome'}, default='RNA') – Sequencing target. If ‘ATAC’, the preprocessing (odd-even stabilization) will be performed before the regular algorithm.

  • version (int, default=1) – Version of RECODE.

  • solver ({'auto', 'full', 'randomized'}, default="auto") –

    If auto:

    set solver='randomized' if the number of samples (cells) is larger than 20,000. Otherwise set solver='full'.

    If full:

    run the learning process on the full input matrix.

    If randomized:

    run the learning process (computing the SVD and estimating the essential dimension) on data downsampled at the rate downsampling_rate.

  • downsampling_rate (float, default=0.2) – Downsampling rate, which is only relevant when solver='randomized'.

  • decimals (int, default=5) – Number of decimal places to which processed matrices are rounded.

  • RECODE_key (string, default='RECODE') – Key name of anndata to store the output.

  • anndata_key ({'layers','obsm'}, default='layers') – Attribute of anndata where the output is stored.

  • random_state (int or None, default=0) – Used when the ‘randomized’ solver is used.

  • verbose (boolean, default=True) – If False, no running messages are displayed.
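The solver rule described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `resolve_solver` and `downsample_indices` are hypothetical helpers, and drawing the downsampled cells uniformly without replacement is an assumption.

```python
import random


def resolve_solver(solver, n_samples, threshold=20_000):
    """Return the effective solver name for a data set with n_samples cells."""
    if solver not in {"auto", "full", "randomized"}:
        raise ValueError(f"unknown solver: {solver!r}")
    if solver == "auto":
        # 'auto' switches to the randomized solver above the documented threshold.
        return "randomized" if n_samples > threshold else "full"
    return solver


def downsample_indices(n_samples, downsampling_rate=0.2, random_state=0):
    """Pick cell indices for learning (assumed: uniform, without replacement)."""
    n_keep = max(1, int(n_samples * downsampling_rate))
    rng = random.Random(random_state)
    return sorted(rng.sample(range(n_samples), n_keep))
```

Under these assumptions, a data set with exactly 20,000 cells still resolves to 'full', and a fixed random_state makes the selected subset reproducible.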

cv_

Coefficient of variation of features (genes/peaks).

Type:

ndarray of shape (n_features,)
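For orientation, a per-feature coefficient of variation can be computed as in the sketch below. It assumes the usual definition (standard deviation divided by mean) and the population standard deviation; this page does not state which estimator RECODE uses, and `coefficient_of_variation` is a hypothetical helper.

```python
import statistics


def coefficient_of_variation(X):
    """Per-feature CV (std / mean) for a row-per-cell matrix given as nested lists."""
    cvs = []
    for column in zip(*X):  # iterate over features (columns)
        mean = statistics.fmean(column)
        std = statistics.pstdev(column)  # population standard deviation (assumption)
        # A feature with zero mean has no defined CV.
        cvs.append(std / mean if mean != 0 else float("nan"))
    return cvs
```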

log_

Running log.

Type:

dict

noise_variance_

Noise variances of features (genes/peaks).

Type:

ndarray of shape (n_features,)

normalized_variance_

Variances of features (genes/peaks).

Type:

ndarray of shape (n_features,)

significance_

Significance (significant/non-significant/silent) of features (genes/peaks).

Type:

ndarray of shape (n_features,)

check_applicability(title='', figsize=(10, 5), ps=2, save=False, save_filename='check_applicability', save_format='png', dpi=None, show=True)

Check the applicability of RECODE. Before calling this function, run fit(X) or fit_transform(X) on the target data matrix X.

Parameters:
  • title (str, default='') – Figure title.

  • figsize (2-tuple of floats, default=(10,5)) – Figure dimension (width, height) in inches.

  • ps (float, default=2) – Point size.

  • save (bool, default=False) – If True, save the figure.

  • save_filename (str, default='check_applicability') – File name (path) of the saved figure.

  • save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.

  • dpi (float or None, default=None) – Dots per inch.

fit(X)

Fit the model to X. (Determine the transformation.)

Parameters:

X (ndarray or anndata of shape (n_samples, n_features)) – Single-cell sequencing data matrix (row: cell, column: gene/peak).

fit_transform(X)

Fit the model with X and transform X into RECODE-denoised data.

Parameters:

X (ndarray/anndata of shape (n_samples, n_features)) – Single-cell sequencing data matrix to transform (row: cell, column: gene/peak).

Returns:

X_new – Denoised data matrix.

Return type:

ndarray/anndata (the same format as input)
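A minimal usage sketch, built only from the signatures documented on this page. The helper name `denoise` and the chosen keyword values are illustrative, and the import is deferred so the snippet loads even where screcode is not installed.

```python
# Constructor arguments taken from the RECODE signature above.
RECODE_KWARGS = {
    "seq_target": "RNA",
    "solver": "auto",
    "RECODE_key": "RECODE",
    "anndata_key": "layers",
    "random_state": 0,
    "verbose": False,
}


def denoise(X, **overrides):
    """Fit RECODE on X and return the denoised data in the same format as X."""
    import screcode  # deferred import: requires the screcode package

    recode = screcode.RECODE(**{**RECODE_KWARGS, **overrides})
    return recode.fit_transform(X)
```

Going by the parameter descriptions above, an anndata input with these settings would be expected to carry the denoised matrix under the 'RECODE' key of its layers attribute.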

fit_transform_integration(X, meta_data=None, batch_key='batch', integration_method='harmony', integration_method_params={})

Fit the model with X and transform X into RECODE-denoised data, integrating batches with integration_method.

Parameters:

X (ndarray/anndata of shape (n_samples, n_features)) – Single-cell sequencing data matrix to transform (row: cell, column: gene/peak).

Returns:

X_new – Denoised data matrix.

Return type:

ndarray/anndata (the same format as input)

lognormalize(X, base=None, target_sum=10000.0, key=None)

Standard normalization: normalize counts per cell, then take the logarithm: \(x_{ij}^{\rm log} = \log(x_{ij}^{\rm norm} + 1)\), where \(x_{ij}^{\rm norm} = c \, x_{ij} / \sum_{k} x_{ik}\) and \(x_{ij}\) is the count value of the \(i\)-th cell and \(j\)-th gene.

Parameters:
  • X (ndarray or anndata of shape (n_samples, n_features)) – Single-cell sequencing data matrix (row: cell, column: gene/peak).

  • base (positive number or None, default=None) – Base of the logarithm. If None, natural logarithm is used.

  • target_sum (float, default=1e4) – Total value after count normalization, corresponding to the coefficient \(c\) above.

  • key (string, default=None) – Key name of anndata to store the output. If None, the RECODE_key set at initialization is used.
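The normalization formula can be illustrated with a small standard-library sketch. `lognormalize_sketch` is a hypothetical stand-in, not the method itself. It sums over the genes of each cell, as "normalize counts per cell" implies, and leaving all-zero cells as zeros is an assumption.

```python
import math


def lognormalize_sketch(X, base=None, target_sum=1e4):
    """Log-normalize a row-per-cell count matrix given as nested lists."""
    result = []
    for row in X:
        total = sum(row)
        # Assumption: cells with zero total count are left as all zeros.
        scale = target_sum / total if total > 0 else 0.0
        normalized = [scale * x for x in row]
        if base is None:
            # Natural logarithm of (x_norm + 1).
            result.append([math.log1p(v) for v in normalized])
        else:
            # Change of base: log_b(y) = ln(y) / ln(b).
            result.append([math.log(v + 1.0) / math.log(base) for v in normalized])
    return result
```

For example, a cell with counts (1, 3) and target_sum=4 keeps its counts after scaling, so base=2 yields values close to (1, 2).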

plot_ATAC_preprocessing(title='ATAC preprocessing', figsize=(7, 5), ps=10, save=False, save_filename='plot_ATAC_preprocessing', save_format='png', dpi=None, show=True)

Plot the number of values in scATAC-seq data matrix with and without preprocessing (odd-even stabilization).

Parameters:
  • title (str, default='ATAC preprocessing') – Figure title.

  • figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.

  • ps (float, default=10) – Point size.

  • save (bool, default=False) – If True, save the figure.

  • save_filename (str, default='plot_ATAC_preprocessing') – File name (path) of the saved figure.

  • save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.

  • dpi (float or None, default=None) – Dots per inch.

plot_denoised_data(title='', figsize=(7, 5), save=False, save_filename='noise_variance', save_format='png', dpi=None, show=True)

Plot variances of the denoised data.

Parameters:
  • title (str, default='') – Figure title.

  • figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.

  • ps (float, default=10) – Point size.

  • save (bool, default=False) – If True, save the figure.

  • save_filename (str, default='noise_variance') – File name (path) of the saved figure.

  • save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.

  • dpi (float or None, default=None) – Dots per inch.

plot_mean_cv(titles=('Original', 'RECODE'), figsize=(7, 5), ps=2, save=False, save_filename='plot_mean_cv', save_format='png', dpi=None, show_features=False, n_show_features=10, cut_detect_rate=0.005, index=None, show=True)

Plot the mean versus the coefficient of variation (CV) of features for log-normalized data.

Parameters:
  • titles (2-tuple of str, default=('Original','RECODE')) – Figure titles.

  • figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.

  • ps (float, default=2) – Point size.

  • save (bool, default=False) – If True, save the figure.

  • save_filename (str, default='plot_mean_cv') – File name (path) of the saved figure.

  • save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.

  • dpi (float or None, default=None) – Dots per inch.

  • show_features (bool, default=False) – If True

  • n_show_features (float, default=10)

  • cut_detect_rate (float, default=0.005)

  • index (array-like of shape (n_features,) or None, default=None)

plot_mean_variance(titles=('Original', 'RECODE'), figsize=(7, 5), ps=2, size_factor='median', save=False, save_filename='plot_mean_variance', save_format='png', dpi=None, show=True)

Plot mean vs variance of features for log-normalized data

Parameters:
  • titles (2-tuple of str, default=('Original','RECODE')) – Figure titles.

  • figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.

  • ps (float, default=2) – Point size.

  • size_factor (float or {'median','mean'}, default='median') – Size factor (total count constant of each cell before the log-normalization).

  • save (bool, default=False) – If True, save the figure.

  • save_filename (str, default='plot_mean_variance') – File name (path) of the saved figure.

  • save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.

  • dpi (float or None, default=None) – Dots per inch.

plot_normalized_data(title='Normalized data', figsize=(7, 5), save=False, save_filename='noise_variance', save_format='png', dpi=None, show=True)

Plot the transformed data by the noise variance-stabilizing normalization.

Parameters:
  • title (str, default='Normalized data') – Figure title.

  • figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.

  • ps (float, default=10) – Point size.

  • save (bool, default=False) – If True, save the figure.

  • save_filename (str, default='noise_variance') – File name (path) of the saved figure.

  • save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.

  • dpi (float or None, default=None) – Dots per inch.

plot_original_data(title='', figsize=(7, 5), save=False, save_filename='original_data', save_format='png', dpi=None, show=True)

Plot noise variance for each feature.

Parameters:
  • title (str, default='') – Figure title.

  • figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.

  • ps (float, default=10) – Point size.

  • save (bool, default=False) – If True, save the figure.

  • save_filename (str, default='original_data') – File name (path) of the saved figure.

  • save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.

  • dpi (float or None, default=None) – Dots per inch.

plot_procedures(titles=('Original data', 'Normalized data', 'Projected data', 'Variance-modified data', 'Denoised data'), figsize=(7, 5), save=False, save_filename='RECODE_procedures', save_filename_foots=('1_Original', '2_Normalized', '3_Projected', '4_Variance-modified', '5_Denoised'), save_format='png', dpi=None, show=True)

Plot the procedures of RECODE. Features along the vertical axis are sorted by mean.

Parameters:
  • titles (5-tuple of str, default=('Original data','Normalized data','Projected data','Variance-modified data','Denoised data')) – Figure titles.

  • figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.

  • save (bool, default=False) – If True, save the figure.

  • save_filename (str, default='RECODE_procedures') – File name (path) of the saved figures (head).

  • save_filename_foots (5-tuple of str, default=('1_Original','2_Normalized','3_Projected','4_Variance-modified','5_Denoised')) – File name (path) of the saved figures (foot).

  • save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figures.

  • dpi (float or None, default=None) – Dots per inch.

plot_projected_data(title='Projected data', figsize=(7, 5), save=False, save_filename='noise_variance', save_format='png', dpi=None, show=True)

Plot projected data.

Parameters:
  • title (str, default='Projected data') – Figure title.

  • figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.

  • ps (float, default=10) – Point size.

  • save (bool, default=False) – If True, save the figure.

  • save_filename (str, default='noise_variance') – File name (path) of the saved figure.

  • save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.

  • dpi (float or None, default=None) – Dots per inch.

plot_variance_modified_data(title='Variance-modified data', figsize=(7, 5), save=False, save_filename='noise_variance', save_format='png', dpi=None, show=True)

Plot variances (eigenvalues) of the variance-modified data.

Parameters:
  • title (str, default='Variance-modified data') – Figure title.

  • figsize (2-tuple of floats, default=(7,5)) – Figure dimension (width, height) in inches.

  • ps (float, default=10) – Point size.

  • save (bool, default=False) – If True, save the figure.

  • save_filename (str, default='noise_variance') – File name (path) of the saved figure.

  • save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.

  • dpi (float or None, default=None) – Dots per inch.

report(figsize=(8.27, 11.69), save=False, save_filename='report', save_format='png', dpi=None, show=True)

Check the applicability of RECODE. Before calling this function, run fit(X) or fit_transform(X) on the target data matrix X.

Parameters:
  • title (str, default='') – Figure title.

  • figsize (2-tuple of floats, default=(8.27,11.69)) – Figure dimension (width, height) in inches.

  • save (bool, default=False) – If True, save the figure.

  • save_filename (str, default='report') – File name (path) of the saved figure.

  • save_format ({'png', 'pdf', 'svg'}, default='png') – File format of the saved figure.

  • dpi (float or None, default=None) – Dots per inch.

transform(X)

Transform X into RECODE-denoised data.

Parameters:

X (ndarray or anndata of shape (n_samples, n_features)) – Single-cell sequencing data matrix (row: cell, column: gene/peak).

Returns:

X_new – RECODE-denoised data matrix.

Return type:

ndarray of shape (n_samples, n_features)

transform_integration(X, meta_data=None, batch_key='batch', integration_method='harmony', integration_method_params={})

Transform X into RECODE-denoised data, integrating batches with integration_method.

Parameters:
  • X (ndarray or anndata of shape (n_samples, n_features)) – Single-cell sequencing data matrix (row: cell, column: gene/peak).

  • meta_data (ndarray (n_samples, 1) or DataFrame (n_samples, *))

  • batch_key (string, default='batch') – Key name in meta_data denoting batch.

Returns:

X_new – RECODE-denoised data matrix.

Return type:

ndarray of shape (n_samples, n_features)

class screcode.RECODE_core(method='variance', variance_estimate=True, fast_algorithm=True, fast_algorithm_ell_ub=1000, ell_manual=10, ell_min=3, version=1, random_state=0, verbose=True)

Bases: object

The core part of RECODE (for non-random sampling data).

Parameters:
  • method ({'variance','manual'}) – If ‘variance’, regular variance-based algorithm. If ‘manual’, parameter \(\ell\), which identifies essential and noise parts in the PCA space, is manually set. The manual parameter is given by ell_manual.

  • variance_estimate (boolean, default=True) – If True and method='variance', parameter estimation is performed.

  • fast_algorithm (boolean, default=True) – If True, the fast algorithm is used. The upper bound of essential dimension \(\ell\) is set by fast_algorithm_ell_ub.

  • fast_algorithm_ell_ub (int, default=1000) – Upper bound of parameter \(\ell\) for the fast algorithm. Must be in the range \([1, \infty)\).

  • ell_manual (int, default=10) – Manual essential dimension \(\ell\), used when method='manual'. Must be in the range \([1, \infty)\).

  • ell_min (int, default=3) – Minimum value of essential dimension \(\ell\).

  • version (int, default=1) – Version of RECODE.
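The role of \(\ell\) and its bounds can be pictured with the sketch below. It is an illustration under stated assumptions, not the package's algorithm: `split_spectrum` is a hypothetical helper, and clamping \(\ell\) between ell_min and the upper bound before splitting the sorted PCA variances into essential and noise parts is one plausible reading of the parameter descriptions.

```python
def split_spectrum(eigenvalues, ell, ell_min=3, ell_ub=1000):
    """Split PCA variances into (essential, noise) parts at a clamped ell."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    # Assumption: ell is kept within [ell_min, ell_ub] and never exceeds
    # the number of available components.
    ell = max(ell_min, min(ell, ell_ub))
    ell = min(ell, len(ordered))
    return ordered[:ell], ordered[ell:]
```

With method='manual', the value passed as `ell` would correspond to ell_manual. With method='variance', it would be whatever the variance-based estimate produces.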

fit(X)

Create the transformation using X.

Parameters:

X (ndarray of shape (n_samples, n_features).) – Training data matrix, where n_samples is the number of samples and n_features is the number of features.

Returns:

self – Returns the instance itself.

Return type:

object

fit_transform(X)

Fit RECODE to X and apply it to X.

Parameters:

X (ndarray of shape (n_samples, n_features)) – Data matrix to transform, where n_samples is the number of samples and n_features is the number of features.

Returns:

X_new – Denoised data matrix.

Return type:

ndarray of shape (n_samples, n_components)

transform(X, return_ess=False)

Apply RECODE to X.

Parameters:

X (ndarray of shape (n_samples, n_features)) – Data matrix to transform, where n_samples is the number of samples and n_features is the number of features.

Returns:

X_new – Denoised data matrix.

Return type:

ndarray of shape (n_samples, n_components)