sccoral.model.SCCORAL#
- class sccoral.model.SCCORAL(adata, n_latent=10, alpha_l1=1000, n_hidden=128, n_layers=1, dropout_rate=0.1, dispersion='gene', log_variational=True, latent_distribution='ln', gene_likelihood='nb', use_batch_norm='both', use_layer_norm=False, use_observed_lib_size=True, **vae_kwargs)#
Single-cell COvariate-informed Regularized variational Autoencoder with Linear Decoder
- Parameters:
adata (AnnData) – Registered AnnData object
n_latent (int (default: 10)) – Number of latent dimensions; approximates the dimensionality of the dataset
alpha_l1 (Tunable_[float] (default: 1000)) – L1 regularization strength in the decoder
n_hidden (Tunable_[int] (default: 128)) – Number of nodes per hidden layer in the encoder
n_layers (Tunable_[int] (default: 1)) – Number of hidden layers in the encoder neural network (see LSCVI)
dropout_rate (Tunable_[float] (default: 0.1)) – Dropout rate for neural networks (see LSCVI)
dispersion (Literal['gene', 'gene-batch', 'gene-cell'] (default: 'gene')) – Whether the dispersion parameter of each gene is fit 1) once per dataset, 2) per batch, or 3) per cell (not implemented: labels)
log_variational (bool (default: True)) – Whether to apply log(x+1) to the counts x before encoding
latent_distribution (Literal['normal', 'ln'] (default: 'ln')) – Prior on latent space ('ln' is the logistic normal)
gene_likelihood (Tunable_[Literal['nb', 'zinb', 'poisson']] (default: 'nb')) – One of (see scVI/LSCVI):
  nb - Negative binomial distribution
  zinb - Zero-inflated negative binomial distribution
  poisson - Poisson distribution
use_batch_norm (Literal['encoder', 'decoder', 'both', 'none'] (default: 'both')) – Where to apply batch norm: encoder, decoder, both, or none
use_layer_norm (bool (default: False)) – Layer norm in encoder
**model_kwargs – Keyword arguments for _module
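To illustrate what a regularization strength such as alpha_l1 controls, the sketch below adds an L1 penalty on the weights of a linear decoder to a loss value. It is a minimal illustration under stated assumptions, not sccoral's implementation: the helper name `l1_penalty`, the toy weight matrix, and the way the penalty is combined with the loss are all hypothetical.

```python
def l1_penalty(weights, alpha_l1):
    """Return alpha_l1 times the sum of absolute weights (hypothetical helper)."""
    return alpha_l1 * sum(abs(w) for row in weights for w in row)


# Toy linear decoder weights: 3 genes x 2 latent factors (made-up values).
decoder_weights = [
    [0.5, 0.0],
    [-0.25, 0.0],
    [0.0, 1.0],
]

reconstruction_loss = 10.0  # placeholder value, for illustration only
penalty = l1_penalty(decoder_weights, alpha_l1=2.0)
total_loss = reconstruction_loss + penalty
print(penalty, total_loss)
```

A larger alpha_l1 makes non-zero weights more expensive, which pushes the decoder towards sparse loadings.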
Examples
>>> import sccoral
>>> adata = sccoral.data.simulation_dataset()
>>> sccoral.model.SCCORAL.setup_anndata(
...     adata,
...     categorical_covariates='categorical_covariate',
...     continuous_covariates='continuous_covariate',
... )
>>> m = sccoral.model.SCCORAL(adata, n_latent=7)
>>> m.train()
>>> representation = m.get_latent_representation()  # pd.DataFrame cells x n_latent
>>> loadings = m.get_loadings()  # pd.DataFrame genes x n_latent
>>> r2 = m.get_explained_variance_per_factor(adata)  # pd.DataFrame 1 x n_latent
Attributes table#
| adata | Data attached to model instance. |
| adata_manager | Manager instance associated with self.adata. |
| device | The current device that the module's params are on. |
| history | Returns computed metrics during training. |
| is_trained | Whether the model has been trained. |
| summary_string | Summary string of the model. |
| test_indices | Observations that are in test set. |
| train_indices | Observations that are in train set. |
| validation_indices | Observations that are in validation set. |
Methods table#
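| convert_legacy_save | Converts a legacy saved model (<v0.15.0) to the updated save format. |
| deregister_manager | Deregisters the AnnDataManager instance associated with adata. |
| get_anndata_manager | Retrieves the AnnDataManager for a given AnnData object. |
| get_elbo | Return the ELBO for the data. |
| get_explained_variance_per_factor | Compute explained variance per factor |
| get_from_registry | Returns the object in AnnData associated with the key in the data registry. |
| get_latent_representation | Get latent representation of cells in anndata object |
| get_loadings | Extract linear weights of decoder |
| get_marginal_ll | Return the marginal LL for the data. |
| get_reconstruction_error | Return the reconstruction error for the data. |
| load | Instantiate a model from the saved output. |
| load_registry | Return the full registry saved with the model. |
| register_manager | Registers an AnnDataManager instance with this model class. |
| save | Save the state of the model. |
| setup_anndata | Sets up the AnnData object for this model. |
| to_device | Move model to device. |
| train | Train sccoral model |
| view_anndata_setup | Print summary of the setup for the initial AnnData or a given AnnData object. |
| view_setup_args | Print args used to setup a saved model. |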
Attributes#
- SCCORAL.adata#
Data attached to model instance.
- SCCORAL.adata_manager#
Manager instance associated with self.adata.
- SCCORAL.device#
The current device that the module’s params are on.
- SCCORAL.history#
Returns computed metrics during training.
- SCCORAL.is_trained#
Whether the model has been trained.
- SCCORAL.summary_string#
Summary string of the model.
- SCCORAL.test_indices#
Observations that are in test set.
- SCCORAL.train_indices#
Observations that are in train set.
- SCCORAL.validation_indices#
Observations that are in validation set.
Methods#
- classmethod SCCORAL.convert_legacy_save(dir_path, output_dir_path, overwrite=False, prefix=None, **save_kwargs)#
Converts a legacy saved model (<v0.15.0) to the updated save format.
- Parameters:
dir_path (str) – Path to directory where legacy model is saved.
output_dir_path (str) – Path to save converted save files.
overwrite (bool (default: False)) – Overwrite existing data or not. If False and a directory already exists at output_dir_path, an error will be raised.
prefix (str | None (default: None)) – Prefix of saved file names.
**save_kwargs – Keyword arguments passed into torch.save().
- Return type:
- SCCORAL.deregister_manager(adata=None)#
Deregisters the AnnDataManager instance associated with adata.
If adata is None, deregisters all AnnDataManager instances in both the class and instance-specific manager stores, except for the one associated with this model instance.
- SCCORAL.get_anndata_manager(adata, required=False)#
Retrieves the AnnDataManager for a given AnnData object.
Requires that self.id has been set. Checks for an AnnDataManager specific to this model instance.
- SCCORAL.get_elbo(adata=None, indices=None, batch_size=None)#
Return the ELBO for the data.
The ELBO is a lower bound on the log likelihood of the data and is the objective used to optimize VAEs. Note that this is not the negative ELBO: higher is better.
- Parameters:
adata (Optional[AnnData] (default: None)) – AnnData object with equivalent structure to initial AnnData. If None, defaults to the AnnData object used to initialize the model.
indices (Optional[Sequence[int]] (default: None)) – Indices of cells in adata to use. If None, all cells are used.
batch_size (Optional[int] (default: None)) – Minibatch size for data loading into model. Defaults to scvi.settings.batch_size.
- Return type:
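As a worked illustration of the quantity described above, the sketch below computes an ELBO as a log-likelihood term minus a KL term. It assumes a diagonal Gaussian posterior and a standard normal prior, for which the KL divergence has a closed form; the function names and the numbers are hypothetical and this is not the code path used by get_elbo.

```python
import math


def kl_diag_normal_to_standard(mu, sigma):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return sum(
        0.5 * (m * m + s * s - 1.0 - math.log(s * s))
        for m, s in zip(mu, sigma)
    )


def elbo(log_likelihood, mu, sigma):
    """ELBO = log-likelihood term minus KL term (higher is better)."""
    return log_likelihood - kl_diag_normal_to_standard(mu, sigma)


# A posterior equal to the prior has zero KL.
kl_zero = kl_diag_normal_to_standard([0.0, 0.0], [1.0, 1.0])

# Made-up log-likelihood value for one cell, with the posterior mean shifted by 1.
value = elbo(log_likelihood=-120.0, mu=[1.0, 0.0], sigma=[1.0, 1.0])
print(kl_zero, value)
```

Because the KL term is never negative, the ELBO can never exceed the log-likelihood term it is built from.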
- SCCORAL.get_explained_variance_per_factor(adata, set_column_names=False)#
Compute explained variance per factor
- SCCORAL.get_from_registry(adata, registry_key)#
Returns the object in AnnData associated with the key in the data registry.
AnnData object should be registered with the model prior to calling this function via the self._validate_anndata method.
- SCCORAL.get_latent_representation(adata=None, indices=None, give_mean=True, mc_samples=5000, batch_size=None, return_dist=False, set_column_names=True, suffix=None)#
Get latent representation of cells in anndata object
- Parameters:
adata (AnnData | None (default: None)) – AnnData object to embed. If None, the stored anndata.AnnData is used.
indices (Iterable[int] | None (default: None)) – Indices of cells to retrieve (see scvi-tools)
give_mean (bool (default: True)) – Whether to give the mean of the distribution (True, the default) or the full distribution (False); see scvi-tools
mc_samples (int (default: 5000)) – Number of samples to draw for distributions that have no closed-form analytical solution (see scvi-tools)
batch_size (int | None (default: None)) – Batch size during inference.
return_dist (bool (default: False)) – Whether to return single-measurement values (False) or the parameters of the distribution (True); see scvi-tools
set_column_names (bool (default: True)) – Whether to set the column names to covariate names
suffix (str | None (default: None)) – Suffix to append to the column names (e.g. __factor) so that columns in the DataFrame are easier to distinguish from metadata columns. By default, no suffix is added.
- Return type:
- Returns:
Pandas DataFrame of shape n_cells x n_latent
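The mc_samples parameter matters for latent distributions whose mean has no closed form, such as the logistic normal ('ln'). The sketch below shows one way such a mean can be approximated: draw Gaussian samples, push each through a softmax, and average. It is an illustration under assumptions, not the library's code; the helper names, the seed, and the toy parameters are hypothetical.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax of a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def logistic_normal_mean(mu, sigma, mc_samples=5000, seed=0):
    """Monte Carlo estimate of E[softmax(z)] for z ~ N(mu, diag(sigma^2))."""
    rng = random.Random(seed)
    running = [0.0] * len(mu)
    for _ in range(mc_samples):
        sample = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
        for i, p in enumerate(softmax(sample)):
            running[i] += p
    return [r / mc_samples for r in running]


# Toy posterior parameters for a single cell with three latent dimensions.
mean_z = logistic_normal_mean(mu=[0.0, 1.0, -1.0], sigma=[0.5, 0.5, 0.5])
print(mean_z)
```

Each softmax sample sums to one, so the averaged estimate does too; more samples reduce the Monte Carlo noise at the cost of runtime.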
- SCCORAL.get_loadings(set_column_names=True)#
Extract linear weights of decoder
- Parameters:
set_column_names (bool (default: True)) – Whether to set the column names to covariate names
- Return type:
DataFrame
- Returns:
Pandas DataFrame of shape n_genes x n_latent
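A common use of a genes x factors loadings matrix is to rank genes by the magnitude of their weight on each factor. The sketch below does this on plain Python dictionaries so that it runs without extra dependencies; with the DataFrame returned by get_loadings the same idea would apply column by column. The helper name `top_genes_per_factor`, the gene names, and the weights are hypothetical.

```python
def top_genes_per_factor(loadings, n_top=2):
    """Return, for each factor, the n_top genes with the largest absolute weight."""
    ranked = {}
    for factor, weights in loadings.items():
        by_magnitude = sorted(
            weights, key=lambda gene: abs(weights[gene]), reverse=True
        )
        ranked[factor] = by_magnitude[:n_top]
    return ranked


# Toy loadings: factor name -> {gene name: decoder weight} (made-up values).
loadings = {
    "factor_0": {"GeneA": 0.9, "GeneB": -1.4, "GeneC": 0.1},
    "factor_1": {"GeneA": 0.0, "GeneB": 0.2, "GeneC": -0.7},
}

top = top_genes_per_factor(loadings, n_top=2)
print(top)
```

Ranking by absolute value keeps strongly negative weights, which are as informative as strongly positive ones in a linear decoder.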
- SCCORAL.get_marginal_ll(adata=None, indices=None, n_mc_samples=1000, batch_size=None, return_mean=True, **kwargs)#
Return the marginal LL for the data.
The computation here is a biased estimator of the marginal log likelihood of the data. Note that this is not the negative log likelihood: higher is better.
- Parameters:
adata (Optional[AnnData] (default: None)) – AnnData object with equivalent structure to initial AnnData. If None, defaults to the AnnData object used to initialize the model.
indices (Optional[Sequence[int]] (default: None)) – Indices of cells in adata to use. If None, all cells are used.
n_mc_samples (int (default: 1000)) – Number of Monte Carlo samples to use for marginal LL estimation.
batch_size (Optional[int] (default: None)) – Minibatch size for data loading into model. Defaults to scvi.settings.batch_size.
return_mean (Optional[bool] (default: True)) – If False, return the marginal log likelihood for each observation. Otherwise, return the mean marginal log likelihood.
- Return type:
Union[Tensor,float]
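Monte Carlo estimates of a marginal log likelihood are often computed as the log of an average of importance weights. The sketch below shows that log-mean-exp step in a numerically stable form. It assumes the per-sample log weights (log joint minus log posterior density) are already available; the function name and the example values are hypothetical, and this is not necessarily how get_marginal_ll is implemented.

```python
import math


def log_mean_exp(log_weights):
    """Stable log(mean(exp(log_weights))) via the max-shift trick."""
    shift = max(log_weights)
    total = sum(math.exp(lw - shift) for lw in log_weights)
    return shift + math.log(total / len(log_weights))


# Made-up log importance weights for one observation.
log_weights = [-1000.0, -1001.0, -1002.0]
estimate = log_mean_exp(log_weights)
print(estimate)
```

Exponentiating values like -1000 directly would underflow to zero, which is why the maximum is subtracted first. Taking the log of a finite-sample average is also the reason such an estimator is biased.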
- SCCORAL.get_reconstruction_error(adata=None, indices=None, batch_size=None)#
Return the reconstruction error for the data.
This is typically written as \(p(x \mid z)\), the likelihood term given one posterior sample. Note that this is not the negative likelihood: higher is better.
- Parameters:
adata (Optional[AnnData] (default: None)) – AnnData object with equivalent structure to initial AnnData. If None, defaults to the AnnData object used to initialize the model.
indices (Optional[Sequence[int]] (default: None)) – Indices of cells in adata to use. If None, all cells are used.
batch_size (Optional[int] (default: None)) – Minibatch size for data loading into model. Defaults to scvi.settings.batch_size.
- Return type:
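To make the likelihood term concrete, the sketch below evaluates a negative binomial log-probability in the mean and inverse-dispersion parameterization and sums it over genes for one cell. It assumes the 'nb' gene likelihood with a single shared inverse dispersion; the function names, counts, and parameter values are hypothetical and not taken from sccoral.

```python
import math


def nb_log_prob(x, mu, theta):
    """Log NB pmf for count x with mean mu and inverse dispersion theta."""
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def reconstruction_log_likelihood(counts, means, theta):
    """Sum of per-gene log-probabilities for one cell (higher is better)."""
    return sum(nb_log_prob(x, mu, theta) for x, mu in zip(counts, means))


# Sanity check: the pmf sums to (almost exactly) one over a wide count range.
total_mass = sum(math.exp(nb_log_prob(x, mu=3.0, theta=2.0)) for x in range(200))

# Made-up observed counts and decoder means for three genes.
ll = reconstruction_log_likelihood(counts=[0, 2, 5], means=[0.5, 2.0, 4.0], theta=2.0)
print(total_mass, ll)
```

The value is a log-probability of discrete counts, so it is at most zero and moves towards zero as the decoder means fit the observed counts better.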
- classmethod SCCORAL.load(dir_path, adata=None, accelerator='auto', device='auto', prefix=None, backup_url=None)#
Instantiate a model from the saved output.
- Parameters:
dir_path (str) – Path to saved outputs.
adata (Union[AnnData, MuData, None] (default: None)) – AnnData organized in the same way as data used to train model. It is not necessary to run setup_anndata, as AnnData is validated against the saved scvi setup dictionary. If None, will check for and load anndata saved with the model.
accelerator (str (default: 'auto')) – Supports passing different accelerator types ("cpu", "gpu", "tpu", "ipu", "hpu", "mps", "auto") as well as custom accelerator instances.
device (int | str (default: 'auto')) – The device to use. Can be set to a non-negative index (int or str) or "auto" for automatic selection based on the chosen accelerator. If set to "auto" and accelerator is not determined to be "cpu", then device will be set to the first available device.
prefix (str | None (default: None)) – Prefix of saved file names.
backup_url (str | None (default: None)) – URL to retrieve saved outputs from if not present on disk.
- Returns:
Model with loaded state dictionaries.
Examples
>>> model = ModelClass.load(save_path, adata)
>>> model.get_....
- static SCCORAL.load_registry(dir_path, prefix=None)#
Return the full registry saved with the model.
- classmethod SCCORAL.register_manager(adata_manager)#
Registers an AnnDataManager instance with this model class.
Stores the AnnDataManager reference in a class-specific manager store. Intended for use in the setup_anndata() class method followed up by retrieval of the AnnDataManager via the _get_most_recent_anndata_manager() method in the model init method.
Notes
Subsequent calls to this method with an AnnDataManager instance referring to the same underlying AnnData object will overwrite the reference to the previous AnnDataManager.
- SCCORAL.save(dir_path, prefix=None, overwrite=False, save_anndata=False, save_kwargs=None, **anndata_write_kwargs)#
Save the state of the model.
Neither the trainer optimizer state nor the trainer history are saved. Model files are not expected to be reproducibly saved and loaded across versions until we reach version 1.0.
- Parameters:
dir_path (str) – Path to a directory.
prefix (str | None (default: None)) – Prefix to prepend to saved file names.
overwrite (bool (default: False)) – Overwrite existing data or not. If False and a directory already exists at dir_path, an error will be raised.
save_anndata (bool (default: False)) – If True, also saves the anndata
save_kwargs (dict | None (default: None)) – Keyword arguments passed into torch.save().
anndata_write_kwargs – Kwargs for anndata.AnnData.write()
- classmethod SCCORAL.setup_anndata(adata, batch_key=None, categorical_covariates=None, continuous_covariates=None, layer=None, **kwargs)#
Sets up the AnnData object for this model.
A mapping will be created between data fields used by this model to their respective locations in adata. None of the data in adata are modified. Only adds fields to adata.
Each model class deriving from this class provides parameters to this method according to its needs. To operate correctly with the model initialization, the implementation must call register_manager() on a model-specific instance of AnnDataManager.
- SCCORAL.to_device(device)#
Move model to device.
- Parameters:
device (str | int) – Device to move model to. Options: ‘cpu’ for CPU, integer GPU index (e.g. 0), or ‘cuda:X’ where X is the GPU index (e.g. ‘cuda:0’). See torch.device for more info.
Examples
>>> adata = scvi.data.synthetic_iid()
>>> model = scvi.model.SCVI(adata)
>>> model.to_device('cpu')  # moves model to CPU
>>> model.to_device('cuda:0')  # moves model to GPU 0
>>> model.to_device(0)  # also moves model to GPU 0
- SCCORAL.train(max_epochs=500, pretraining=True, use_gpu=True, accelerator='auto', devices='auto', validation_size=0.1, batch_size=128, early_stopping=True, pretraining_max_epochs=500, pretraining_early_stopping=True, pretraining_early_stopping_metric='reconstruction_loss_train', pretraining_min_delta=0.0, pretraining_early_stopping_patience=5, plan_kwargs=None, trainer_kwargs=None, **kwargs)#
Train sccoral model
Training is split into pretraining (only training on covariates, frozen z_encoder weights) and training (unfrozen weights). Same training procedure as for scVI/LSCVI except for pretraining.
- Parameters:
max_epochs (int (default: 500)) – Maximum number of epochs during training
max_pretraining_epochs – Maximum number of epochs during pretraining. If None, same as max_epochs. This name is not in the signature, which lists pretraining_max_epochs instead.
accelerator (Optional[Literal['cpu', 'gpu', 'auto']] (default: 'auto')) – One of cpu, gpu, or auto; auto detects available devices automatically
devices (default: 'auto') – If auto, automatically detects available devices
validation_size (None | float (default: 0.1)) – Fraction of the data (0-1) used for the validation split. The rest is the train split
batch_size (int (default: 128)) – Size of minibatches during training
early_stopping (Tunable_[bool] (default: True)) – Enable early stopping during training
pretraining (Tunable_[bool] (default: True)) – Whether to conduct pretraining
pretraining_max_epochs (Tunable_[int] (default: 500)) – Maximum number of epochs for pretraining to continue.
pretraining_early_stopping (Tunable_[bool] (default: True)) – Enable early stopping during pretraining
plan_kwargs (None | dict[str, Any] (default: None)) – Training keyword arguments passed to sccoral.train.TrainingPlan
trainer_kwargs (None | dict[str, Any] (default: None)) – Additional keyword arguments passed to scvi.train.TrainRunner
kwargs – Not passed.
- Return type:
- Returns:
Training runner (scvi-tools wrapper of pytorch lightning trainer.)
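The pretraining_early_stopping_patience and pretraining_min_delta arguments in the signature suggest a patience-based stopping rule on a monitored loss. The sketch below shows a generic version of such a rule: training stops once the loss has failed to improve by more than min_delta for patience consecutive epochs. It is an illustration only; the function name and the loss values are hypothetical, and the exact rule applied by the underlying trainer may differ.

```python
def early_stopping_epoch(losses, patience=5, min_delta=0.0):
    """Return the index of the epoch at which a patience rule would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if best - loss > min_delta:
            # Improvement larger than min_delta: record it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Patience never ran out: training runs through the last epoch.
    return len(losses) - 1


# Made-up loss curve that plateaus around 7.5.
losses = [10.0, 8.0, 7.5, 7.5, 7.6, 7.5, 7.7, 7.5, 7.4]
stop_epoch = early_stopping_epoch(losses, patience=5, min_delta=0.0)
print(stop_epoch)
```

With min_delta=0.0 any strict decrease counts as an improvement; a larger min_delta ignores small fluctuations and tends to stop earlier.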
- SCCORAL.view_anndata_setup(adata=None, hide_state_registries=False)#
Print summary of the setup for the initial AnnData or a given AnnData object.