mridc.core.classes package

Submodules

mridc.core.classes.common module

class mridc.core.classes.common.FileIO[source]

Bases: ABC

Base class for file IO.

classmethod from_config_file(path2yaml_file: str)[source]

Instantiates an instance of mridc Model from YAML config file. Weights will be initialized randomly.

Parameters: path2yaml_file (path to yaml file with model configuration) –
Return type: Model instance.

classmethod restore_from(restore_path: str, override_config_path: Optional[str] = None, map_location: Optional[device] = None, strict: bool = True, return_config: bool = False, trainer: Optional[Trainer] = None, save_restore_connector: Optional[SaveRestoreConnector] = None)[source]

Restores model instance (weights and configuration) from a .mridc file.

Parameters

restore_path (Path to .mridc file from which model should be instantiated.) – str
override_config_path (Path to .yaml file containing the configuration to override the one in the .mridc file.) – str
map_location (Device to map the instantiated model to. By default (None), it will select a GPU if available, falling back to CPU otherwise.) – torch.device
strict (Passed to load_state_dict. By default True.) – bool
return_config (If True, returns the underlying config of the restored model as an OmegaConf DictConfig object without instantiating the model.) – bool
trainer (If provided, will be used to instantiate the model.) – Trainer
save_restore_connector (An optional SaveRestoreConnector object that defines the implementation of the restore_from() method.) – SaveRestoreConnector

save_to(save_path: str)[source]

Standardized method to save a tarfile containing the checkpoint, config, and any additional artifacts. Implemented via mridc.core.connectors.save_restore_connector.SaveRestoreConnector.save_to().

Parameters: save_path (Path to save the checkpoint to.) – str

to_config_file(path2yaml_file: str)[source]

Saves current instance’s configuration to YAML config file. Weights will not be saved.

Parameters: path2yaml_file (path to yaml file where model configuration will be saved.) –

class mridc.core.classes.common.Model[source]

Bases: Typing, Serialization, FileIO, ABC

Abstract class offering interface which should be implemented by all mridc models.

classmethod from_pretrained(model_name: str, refresh_cache: bool = False, override_config_path: Optional[str] = None, map_location: Optional[device] = None, strict: bool = True, return_config: bool = False, trainer: Optional[Trainer] = None, save_restore_connector: Optional[SaveRestoreConnector] = None)[source]

Instantiates a pretrained mridc model by name, fetching the checkpoint from the cloud or from a local cache. Use restore_from() to instantiate from a local .mridc file.

Parameters

model_name (String key which will be used to find the module.) –
refresh_cache (If set to True, then when fetching from cloud, this will re-fetch the file from cloud even if it is already found in a cache locally.) –
override_config_path (Path to a yaml config that will override the internal config file.) –
map_location (Optional torch.device() to map the instantiated model to a device. By default (None), it will select a GPU if available, falling back to CPU otherwise.) –
strict (Passed to load_state_dict. By default, True.) –
return_config (If set to true, will return just the underlying config of the restored model as an OmegaConf/DictConfig object without instantiating the model.) –
trainer (Optional Trainer objects to use for restoring the model.) –
save_restore_connector (Optional SaveRestoreConnector object to use for restoring the model.) –

Return type

A model instance of a particular model class or its underlying config (if return_config is set).

classmethod get_available_model_names() → List[str][source]

Returns the list of model names available. To get the complete model description use list_available_models().

Return type: A list of model names.

classmethod list_available_models() → Optional[PretrainedModelInfo][source]

Should list all pre-trained models available. Note: There is no check that requires model names and aliases to be unique. In the case of a collision, whatever model (or alias) is listed first in the returned list will be instantiated.

Return type: A list of PretrainedModelInfo entries.
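
The first-match behaviour noted for name and alias collisions can be illustrated with a small stand-alone sketch. ModelInfo, resolve, the model names, and the URLs below are hypothetical and not part of mridc; the sketch only assumes that lookup walks the returned list in order and stops at the first entry whose name or alias matches.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelInfo:
    """Hypothetical stand-in mirroring the fields of PretrainedModelInfo."""

    pretrained_model_name: str
    description: str
    location: str
    aliases: Optional[List[str]] = None


def resolve(model_name: str, available: List[ModelInfo]) -> ModelInfo:
    """Return the first entry whose name or alias equals model_name."""
    for info in available:
        if model_name == info.pretrained_model_name or model_name in (info.aliases or []):
            return info
    raise FileNotFoundError(f"Model {model_name!r} was not found.")


# Two entries collide on the alias "baseline"; the one listed first wins.
available = [
    ModelInfo("unet_small", "Small example model", "https://example.invalid/a", aliases=["baseline"]),
    ModelInfo("unet_large", "Large example model", "https://example.invalid/b", aliases=["baseline"]),
]
chosen = resolve("baseline", available)
```

Keeping names and aliases unique across the list avoids relying on this ordering.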

class mridc.core.classes.common.PretrainedModelInfo(pretrained_model_name: str, description: str, location: str, class_: Optional[Model] = None, aliases: Optional[List[str]] = None)[source]

Bases: object

Class to store information about a pretrained model.

aliases: Optional[List[str]] = None

class_: Optional[Model] = None

description: str

location: str

pretrained_model_name: str

class mridc.core.classes.common.Serialization[source]

Bases: ABC

Base class for serialization.

classmethod from_config_dict(config: DictConfig, trainer: Optional[Trainer] = None)[source]: Instantiates object using DictConfig-based configuration

to_config_dict() → DictConfig[source]: Returns the object’s configuration as a config dictionary.

class mridc.core.classes.common.Typing[source]

Bases: ABC

An interface which endows a module with neural types.

property input_types: Optional[Dict[str, NeuralType]]: Define these to enable input neural type checks

property output_types: Optional[Dict[str, NeuralType]]: Define these to enable output neural type checks

mridc.core.classes.common.is_typecheck_enabled()[source]: Getter method for typechecking state.

class mridc.core.classes.common.typecheck(input_types: Optional[Union[TypeState, Dict[str, NeuralType]]] = TypeState.UNINITIALIZED, output_types: Optional[Union[TypeState, Dict[str, NeuralType]]] = TypeState.UNINITIALIZED, ignore_collections: bool = False)[source]

Bases: object

A decorator which performs input-output neural type checks, and attaches neural types to the output of the function that it wraps. Requires that the class inherit from mridc.core.Typing in order to perform type checking, and will raise an error if that is not the case.

# Usage (class-level type support)

@typecheck()
def fn(self, arg1, arg2, …):

# Usage (function-level type support)

@typecheck(input_types=…, output_types=…)
def fn(self, arg1, arg2, …):

Points to be noted:

The brackets () in @typecheck() are necessary. You will encounter a TypeError: __init__() takes 1 positional argument but X were given without those brackets.
The function can take any number of positional arguments during definition. When you call this function, all arguments must be passed using kwargs only.
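
Both points can be reproduced with a simplified decorator class. The sketch below is not mridc's typecheck: simple_typecheck, Scaler, and raises_type_error are hypothetical names, and the check is a plain isinstance test rather than a neural type comparison. It only shows why a decorator that takes its options in the constructor needs the brackets, and how a wrapper can insist on keyword arguments.

```python
import functools


class simple_typecheck:
    """Hypothetical, simplified decorator; it is not mridc's typecheck."""

    def __init__(self, *, input_types=None):
        # Keyword-only arguments: a bare function passed positionally is rejected.
        self.input_types = input_types

    def __call__(self, fn):
        @functools.wraps(fn)
        def wrapper(instance, *args, **kwargs):
            if args:
                raise TypeError("All arguments must be passed as keyword arguments.")
            for name, value in kwargs.items():
                expected = (self.input_types or {}).get(name)
                if expected is not None and not isinstance(value, expected):
                    raise TypeError(f"{name} must be of type {expected.__name__}.")
            return fn(instance, **kwargs)

        return wrapper


class Scaler:
    @simple_typecheck(input_types={"x": float, "factor": float})
    def forward(self, x, factor):
        return x * factor


scaler = Scaler()
result = scaler.forward(x=2.0, factor=3.0)  # keyword arguments are accepted


def raises_type_error(call):
    """Return True if calling `call` raises TypeError."""
    try:
        call()
    except TypeError:
        return True
    return False


# A positional call and decorator use without brackets both fail with TypeError.
positional_rejected = raises_type_error(lambda: scaler.forward(2.0, 3.0))
missing_brackets_rejected = raises_type_error(lambda: simple_typecheck(lambda self: None))
```

Without the brackets, Python passes the decorated function straight to the constructor as a positional argument, which is what triggers the TypeError.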

class TypeState(value)[source]

Bases: Enum

Placeholder to denote the default value of type information provided. If the constructor of this decorator is used to override the class level type definition, this enum value indicates that types will be overridden.

UNINITIALIZED = 0

__call__(enabled=None, adapter=None, proxy=<class 'FunctionWrapper'>)[source]: Wrapper method that can be used on any function of a class that implements Typing. By default, it will utilize the input_types and output_types properties of the class inheriting Typing. Local function level overrides can be provided by supplying dictionaries as arguments to the decorator.

static disable_checks()[source]: Temporarily disable type checks.

static set_typecheck_enabled(enabled: bool = True)[source]: Set the global typecheck flag.
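
A common way to build a "temporarily disable" helper on top of a global flag is a context manager that flips the flag and restores it on exit. The following is a minimal sketch of that pattern with hypothetical names (TypecheckState, disable_checks_sketch); whether mridc implements disable_checks() exactly this way is an assumption.

```python
from contextlib import contextmanager


class TypecheckState:
    """Hypothetical holder for a process-wide flag."""

    enabled = True

    @staticmethod
    def set_enabled(enabled: bool = True) -> None:
        TypecheckState.enabled = enabled


@contextmanager
def disable_checks_sketch():
    """Switch the flag off inside the with-block, then switch it back on."""
    TypecheckState.set_enabled(False)
    try:
        yield
    finally:
        TypecheckState.set_enabled(True)  # runs even if the body raises


states = [TypecheckState.enabled]
with disable_checks_sketch():
    states.append(TypecheckState.enabled)
states.append(TypecheckState.enabled)
```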

mridc.core.classes.dataset module

class mridc.core.classes.dataset.Dataset[source]

Bases: Dataset, Typing, Serialization, ABC

Dataset with output ports. Please note: subclasses of Dataset should not implement input_types.

collate_fn(batch)[source]

This is the method that the user passes as a functor to DataLoader. The method optionally performs neural type checking and adds types to the outputs.

Please note, subclasses of Dataset should not implement input_types.

# Usage:

dataloader = torch.utils.data.DataLoader(
        ....,
        collate_fn=dataset.collate_fn,
        ....
)

Return type: Collated batch, with or without types.

class mridc.core.classes.dataset.DatasetConfig(batch_size: int = 32, drop_last: bool = False, shuffle: bool = False, num_workers: Optional[int] = 0, pin_memory: bool = True)[source]

Bases: object

Dataset configuration.

batch_size: int = 32

drop_last: bool = False

num_workers: Optional[int] = 0

pin_memory: bool = True

shuffle: bool = False
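
Because every field has a default, a configuration can be created by overriding only the values that differ. The sketch below uses a hypothetical mirror of the class (DatasetConfigSketch) so that it runs without mridc installed; the field names and defaults are copied from the signature above.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DatasetConfigSketch:
    """Hypothetical mirror of DatasetConfig, with the defaults listed above."""

    batch_size: int = 32
    drop_last: bool = False
    shuffle: bool = False
    num_workers: Optional[int] = 0
    pin_memory: bool = True


# Override only the fields that differ from the defaults.
train_cfg = DatasetConfigSketch(batch_size=8, shuffle=True)

# Every field name matches a keyword argument of torch.utils.data.DataLoader,
# so the dictionary could be unpacked as DataLoader(dataset, **loader_kwargs).
loader_kwargs = asdict(train_cfg)
```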

class mridc.core.classes.dataset.IterableDataset[source]

Bases: IterableDataset, Typing, Serialization, ABC

Iterable Dataset with output ports. Please Note: Subclasses of IterableDataset should not implement input_types.

collate_fn(batch)[source]

This is the method that the user passes as a functor to DataLoader. The method optionally performs neural type checking and adds types to the outputs.

# Usage:

dataloader = torch.utils.data.DataLoader(
        ....,
        collate_fn=dataset.collate_fn,
        ....
)

Return type: Collated batch, with or without types.

mridc.core.classes.export module

class mridc.core.classes.export.ExportFormat(value)[source]

Bases: Enum

Which format to use when exporting a Neural Module for deployment

ONNX = (1,)

TORCHSCRIPT = (2,)
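
Note that the member values listed here are one-element tuples, not integers, which is what a trailing comma after the number produces in Python. The stand-in enum below (ExportFormatSketch, a hypothetical name) reproduces those values to show the practical consequence: lookup by value needs the tuple, while lookup by name is unaffected.

```python
from enum import Enum


class ExportFormatSketch(Enum):
    """Stand-in reproducing the member values listed above."""

    ONNX = (1,)
    TORCHSCRIPT = (2,)


onnx_value = ExportFormatSketch.ONNX.value
by_value = ExportFormatSketch((1,))   # lookup by value needs the tuple, not the int
by_name = ExportFormatSketch["ONNX"]  # lookup by name avoids the issue
```

Comparing members directly (for example, fmt is ExportFormatSketch.ONNX) sidesteps the value representation altogether.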

class mridc.core.classes.export.Exportable[source]

Bases: ABC

This interface should be implemented by particular classes derived from mridc.core.NeuralModule or mridc.core.ModelPT. It gives these entities the ability to be exported for deployment to formats such as ONNX.

property disabled_deployment_input_names: Implement this method to return a set of input names disabled for export

property disabled_deployment_output_names: Implement this method to return a set of output names disabled for export

export(output: str, input_example=None, verbose=False, do_constant_folding=True, onnx_opset_version=None, training=<TrainingMode.EVAL: 0>, check_trace: ~typing.Union[bool, ~typing.List[~torch.Tensor]] = False, dynamic_axes=None, check_tolerance=0.01, export_modules_as_functions: bool = False)[source]

Export the module to a file.

Parameters

output (The output file path.) –
input_example (A dictionary of input names and values.) –
verbose (If True, print out the export process.) –
do_constant_folding (If True, do constant folding.) –
onnx_opset_version (The ONNX opset version to use.) –
training (Training mode for the export.) –
check_trace (If True, check the trace of the exported model.) –
dynamic_axes (A dictionary of input names and dynamic axes.) –
check_tolerance (The tolerance for the check_trace.) –
export_modules_as_functions (If True, export modules as functions.) –

get_export_subnet(subnet=None)[source]: Returns Exportable subnet model/module to export

property input_module

property input_names: Implement this method to return a list of input names

list_export_subnets()[source]: Returns default set of subnet names exported for this model. First goes the one receiving input (input_example).

property output_module

property output_names: Override this method to return a set of output names disabled for export

property supported_export_formats: Implement this method to return a set of export formats supported. Default is all types.

mridc.core.classes.loss module

class mridc.core.classes.loss.Loss(size_average=None, reduce=None, reduction: str = 'mean')[source]

Bases: _Loss, Typing, Serialization

Inherit this class to implement custom loss.

reduction: str

mridc.core.classes.modelPT module

class mridc.core.classes.modelPT.ModelPT(cfg: DictConfig, trainer: Optional[Trainer] = None)[source]

Bases: LightningModule, Model

Interface for PyTorch Lightning based mridc models.

classmethod __init_subclass__() → None[source]: This method is called when a subclass is created.

property cfg: Property that holds the finalized internal config of the model.

Note

Changes to this config are not reflected in the state of the model. Please create a new model using an updated config to properly update the model.

configure_optimizers()[source]: Configure optimizers and schedulers for training.

classmethod extract_state_dict_from(restore_path: str, save_dir: str, split_by_module: bool = False, save_restore_connector: Optional[SaveRestoreConnector] = None)[source]

Extract the state dict(s) from a provided .mridc tarfile and save it to a directory.

Parameters

restore_path (path to .mridc file from which state dict(s) should be extracted) –
save_dir (directory in which the saved state dict(s) should be stored) –
split_by_module (bool flag, which determines whether the output checkpoint should be for the entire Model, or the individual modules that comprise the Model.) –
save_restore_connector (Can be overridden to add custom save and restore logic.) –

Example

To convert the .mridc tarfile into a single Model level PyTorch checkpoint

state_dict = mridc.collections.asr.models.EncDecCTCModel.extract_state_dict_from('asr.mridc', './asr_ckpts')

To restore a model from a Model level checkpoint

model = mridc.collections.asr.models.EncDecCTCModel(cfg)  # or any other method of restoration
model.load_state_dict(torch.load("./asr_ckpts/model_weights.ckpt"))

To convert the .mridc tarfile into multiple Module level PyTorch checkpoints

state_dict = mridc.collections.asr.models.EncDecCTCModel.extract_state_dict_from('asr.mridc', './asr_ckpts', split_by_module=True)

To restore a module from a Module level checkpoint

model = mridc.collections.asr.models.EncDecCTCModel(cfg)  # or any other method of restoration
# load the individual components
model.preprocessor.load_state_dict(torch.load("./asr_ckpts/preprocessor.ckpt"))
model.encoder.load_state_dict(torch.load("./asr_ckpts/encoder.ckpt"))
model.decoder.load_state_dict(torch.load("./asr_ckpts/decoder.ckpt"))

Return type: The state dict that was loaded from the original .mridc checkpoint.
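
The difference between a Model level checkpoint and Module level checkpoints can be pictured as grouping a flat state dict by its top-level submodule. The sketch below is only an illustration of that idea, not the library's implementation: split_by_module is a hypothetical helper, the key names are invented, and plain lists stand in for tensors so that it runs without PyTorch.

```python
from collections import defaultdict
from typing import Any, Dict


def split_by_module(state_dict: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Group a flat state dict by the first component of each key."""
    per_module: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for key, value in state_dict.items():
        module_name, _, param_name = key.partition(".")
        per_module[module_name][param_name] = value
    return dict(per_module)


# Plain lists stand in for tensors so the sketch runs without PyTorch.
flat = {
    "encoder.conv.weight": [0.1, 0.2],
    "encoder.conv.bias": [0.0],
    "decoder.proj.weight": [0.3],
}
parts = split_by_module(flat)
```

Each entry of parts corresponds to what one per-module checkpoint file would hold, with keys relative to that submodule.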

get_test_dataloader_prefix(dataloader_idx: int = 0) → str[source]: Get the name of one or more data loaders, which will be prepended to all logs.

get_validation_dataloader_prefix(dataloader_idx: int = 0) → str[source]: Get the name of one or more data loaders, which will be prepended to all logs.

classmethod load_from_checkpoint(checkpoint_path: str, *args, map_location: Optional[Union[Dict[str, str], str, device, int, Callable]] = None, hparams_file: Optional[str] = None, strict: bool = True, **kwargs)[source]: Loads ModelPT from checkpoint, with some maintenance of restoration. For documentation, please refer to LightningModule.load_from_checkpoint() documentation.

load_part_of_state_dict(state_dict, include, exclude, load_from_string)[source]: Load part of the state dict.

maybe_init_from_pretrained_checkpoint(cfg: OmegaConf, map_location: str = 'cpu')[source]

Initializes a given model with the parameters obtained via specific config arguments. The state dict of the provided model will be updated with strict=False setting to prevent requirement of exact model parameters matching.

Initializations

init_from_mridc_model: Str path to a .mridc model, which will be instantiated in order to extract the state dict.

init_from_pretrained_model: Str name of a pretrained model checkpoint (obtained via cloud). The model will be downloaded (or a cached copy will be used), instantiated and then its state dict will be extracted.

init_from_ptl_ckpt: Str name of a PyTorch Lightning checkpoint file. It will be loaded and the state dict will be extracted.

Parameters

cfg (The config used to instantiate the model. It needs to contain only one of the above keys.) –
map_location (str or torch.device() which represents where the intermediate state dict (from the pretrained model or checkpoint) will be loaded.) –
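
Since the config is expected to carry only one of the three initialization keys, a caller may want to validate it before building the model. The helper below is a hypothetical sketch (select_init_source is not an mridc function), and raising an error when several keys are set is an assumption made for the example.

```python
from typing import Any, Mapping, Optional, Tuple

INIT_KEYS = ("init_from_mridc_model", "init_from_pretrained_model", "init_from_ptl_ckpt")


def select_init_source(cfg: Mapping[str, Any]) -> Optional[Tuple[str, Any]]:
    """Return the (key, value) pair of the single init key set in cfg, if any."""
    present = [(key, cfg[key]) for key in INIT_KEYS if cfg.get(key) is not None]
    if len(present) > 1:
        # Assumption for this sketch: conflicting sources are treated as an error.
        raise ValueError(f"Set only one of {INIT_KEYS}, got {[k for k, _ in present]}.")
    return present[0] if present else None


source = select_init_source({"init_from_ptl_ckpt": "last.ckpt"})
no_source = select_init_source({"lr": 1e-3})
```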

static multi_test_epoch_end(outputs: Union[object, List[Dict[str, Tensor]]], dataloader_idx: int = 0) → None[source]

Adds support for multiple test datasets. Should be overridden by subclass, to obtain appropriate logs for each of the dataloaders.

Parameters

outputs (Same as that provided by LightningModule.test_epoch_end() for a single dataloader.) –
dataloader_idx (int representing the index of the dataloader.) –

Returns

A dictionary of values, optionally containing a sub-dict log, such that the values in the log will be prepended by the dataloader prefix.

static multi_validation_epoch_end(outputs: Optional[Union[object, List[Dict[str, Tensor]]]], dataloader_idx: int = 0) → None[source]

Adds support for multiple validation datasets. Should be overridden by subclass, to obtain appropriate logs for each of the dataloaders.

Parameters

outputs (Same as that provided by LightningModule.validation_epoch_end() for a single dataloader.) –
dataloader_idx (int representing the index of the dataloader.) –

Returns

A dictionary of values, optionally containing a sub-dict log, such that the values in the log will be prepended by the dataloader prefix.

property num_weights: Utility property that returns the total number of parameters of the Model.

prepare_test(trainer: Trainer) → bool[source]

Helper method to check whether the model can safely be tested on a dataset after training (or loading a checkpoint).

trainer = Trainer()
if model.prepare_test(trainer):
    trainer.test(model)

Return type: Bool which declares the model safe to test. Provides warnings if it has to return False to guide the user.

register_artifact(config_path: str, src: str, verify_src_exists: bool = True)[source]

Register model artifacts with this function. These artifacts (files) will be included inside .mridc file when model.save_to(“model.mridc”) is called.

How it works:

It always returns an existing absolute path, which can be used during the Model constructor call. EXCEPTION: if src is None or “”, nothing is done and src is returned.
It adds a (config_path, model_utils.ArtifactItem()) pair to self.artifacts.

If “src” is an existing local path, it is returned in absolute path form.
Else if “src” starts with “mridc_file:unique_artifact_name”, the .mridc file is untarred to a temporary folder location and an actual existing path is returned.
Otherwise, an error is raised.

WARNING: use .register_artifact calls in your models’ constructors. The returned path is not guaranteed to exist after you have exited your model’s constructor.

Parameters

config_path (Artifact key. Usually corresponds to the model config.) –
src (Path to artifact.) –
verify_src_exists (If set to False, then the artifact is optional and register_artifact will return None even if src is not found. Defaults to True.) –

Return type

If src is neither None nor empty, it always returns an absolute path which is guaranteed to exist during the model instance's life.

classmethod restore_from(restore_path: str, override_config_path: Optional[Union[OmegaConf, str]] = None, map_location: Optional[device] = None, strict: bool = True, return_config: bool = False, save_restore_connector: Optional[SaveRestoreConnector] = None, trainer: Optional[Trainer] = None)[source]

Restores model instance (weights and configuration) from .mridc file.

Parameters

restore_path (Path to .mridc file from which model should be instantiated.) –
override_config_path (Path to a yaml config that will override the internal config file, or an OmegaConf/DictConfig object representing the model config.) –
map_location (Optional torch.device() to map the instantiated model to a device. By default (None), it will select a GPU if available, falling back to CPU otherwise.) –
strict (Passed to load_state_dict. By default, True.) –
return_config (If set to true, will return just the underlying config of the restored model as an OmegaConf/DictConfig object without instantiating the model.) –
trainer (Optional, a pytorch lightning Trainer object that will be forwarded to the instantiated model's constructor.) –
save_restore_connector (Can be overridden to add custom save and restore logic.) –

Example

model = mridc.collections.asr.models.EncDecCTCModel.restore_from('asr.mridc')
assert isinstance(model, mridc.collections.asr.models.EncDecCTCModel)

Return type: An instance of type cls or its underlying config (if return_config is set).

save_to(save_path: str)[source]

Saves model instance (weights and configuration) into a .mridc file. Use restore_from() to fully restore the instance from the .mridc file. A .mridc file is an archive (tar.gz) with the following contents:

model_config.yaml: model configuration in .yaml format. You can deserialize this into the cfg argument of the model’s constructor.
model_weights.ckpt: model checkpoint.

Parameters: save_path (Path to .mridc file where model instance should be saved.) –
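
The archive layout can be explored with the standard library alone. The sketch below builds and reads back a tar.gz holding a YAML configuration and a checkpoint, under the assumption that a .mridc file is a plain gzip-compressed tar archive; pack and read_config are hypothetical helpers, the payloads are dummies, and the real SaveRestoreConnector may store additional artifacts.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def pack(save_path: Path, config_yaml: str, weights: bytes) -> None:
    """Write a gzip-compressed tar archive with a config and a checkpoint member."""
    members = {"model_config.yaml": config_yaml.encode("utf-8"), "model_weights.ckpt": weights}
    with tarfile.open(save_path, "w:gz") as archive:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))


def read_config(restore_path: Path) -> str:
    """Read the YAML configuration back without extracting the weights."""
    with tarfile.open(restore_path, "r:gz") as archive:
        return archive.extractfile("model_config.yaml").read().decode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.mridc"
    pack(path, "name: example\n", b"\x00\x01")  # dummy config and weights
    with tarfile.open(path, "r:gz") as archive:
        member_names = sorted(archive.getnames())
    restored_config = read_config(path)
```

Reading only the configuration member is roughly the kind of access that return_config=True on restore_from() provides, without instantiating the model.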

set_trainer(trainer: Trainer)[source]: Set an instance of Trainer object.

set_world_size(trainer: Trainer)[source]: Determines the world size from the PyTorch Lightning Trainer and then updates AppState.

setup_multiple_test_data(test_data_config: Union[DictConfig, Dict])[source]: (Optionally) Sets up the data loader to be used in test, with support for multiple data loaders.

setup_multiple_validation_data(val_data_config: Union[DictConfig, Dict])[source]: (Optionally) Sets up the data loader to be used in validation.

setup_optimization(optim_config: Optional[Union[DictConfig, Dict]] = None)[source]

Prepares an optimizer from a string name and its optional config parameters.

Parameters

optim_config (A dictionary containing the following keys:) –

lr: mandatory key for learning rate. Will raise ValueError if not provided.
optimizer: string name pointing to one of the available optimizers in the registry. If not provided, defaults to “adam”.
opt_args: Optional list of strings, in the format “arg_name=arg_value”. The list of “arg_value” will be parsed and a dictionary of optimizer kwargs will be built and supplied to instantiate the optimizer.

Return type

An instance of an optimizer.
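
To make the opt_args format concrete, the sketch below converts a list of “arg_name=arg_value” strings into keyword arguments. parse_opt_args is a hypothetical helper written for illustration; evaluating values with ast.literal_eval and falling back to the raw string are assumptions, and the parser used by mridc may behave differently.

```python
import ast
from typing import Any, Dict, List


def parse_opt_args(opt_args: List[str]) -> Dict[str, Any]:
    """Turn ["name=value", ...] into optimizer keyword arguments."""
    kwargs: Dict[str, Any] = {}
    for item in opt_args:
        name, sep, raw = item.partition("=")
        if not sep or not name:
            raise ValueError(f"Expected 'arg_name=arg_value', got {item!r}.")
        try:
            kwargs[name.strip()] = ast.literal_eval(raw.strip())
        except (ValueError, SyntaxError):
            kwargs[name.strip()] = raw.strip()  # keep non-literal values as strings
    return kwargs


optim_config = {"lr": 1e-3, "optimizer": "adam", "opt_args": ["weight_decay=0.01", "amsgrad=True"]}
if "lr" not in optim_config:
    raise ValueError("`lr` is a mandatory key.")
optimizer_kwargs = {"lr": optim_config["lr"], **parse_opt_args(optim_config["opt_args"])}
```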

setup_optimizer_param_groups()[source]

Used to create param groups for the optimizer. As an example, this can be used to specify per-layer learning rates:

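from torch import optim

# Groups without their own 'lr' fall back to the default lr=1e-2.
optimizer = optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3},
], lr=1e-2, momentum=0.9)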

See https://pytorch.org/docs/stable/optim.html for more information. By default, ModelPT will use self.parameters(). Override this method to add custom param groups.

setup_test_data(test_data_config: Union[DictConfig, Dict])[source]: (Optionally) Sets up the data loader to be used in test.

abstract setup_training_data(train_data_config: Union[DictConfig, Dict])[source]: Sets up the data loader to be used in training.

abstract setup_validation_data(val_data_config: Union[DictConfig, Dict])[source]: Sets up the data loader to be used in validation.

summarize(max_depth: int = 1) → ModelSummary[source]: Summarize this LightningModule.

teardown(stage: str)[source]: Called at the end of fit and test.

test_dataloader()[source]: Return the test dataloader.

test_epoch_end(outputs: Union[List[Dict[str, Tensor]], List[List[Dict[str, Tensor]]]]) → Optional[Dict[str, Dict[str, Tensor]]][source]

Default DataLoader for Test set which automatically supports multiple data loaders via multi_test_epoch_end. If multi dataset support is not required, override this method entirely in base class. In such a case, there is no need to implement multi_test_epoch_end either.

Note

If more than one data loader exists, and they all provide test_loss, only the test_loss of the first data loader will be used by default. This default can be changed by passing the special key _test_dl_idx: int inside the test_ds config.

Parameters

outputs (Single or nested list of tensor outputs from one or more data loaders.) –

Returns

A dictionary containing the union of all items from individual data_loaders, along with merged logs from all data loaders.

train_dataloader()[source]: Return the training dataloader.

training: bool

classmethod update_save_restore_connector(save_restore_connector)[source]: Update the save_restore_connector of the model.

val_dataloader()[source]: Return the validation dataloader.

validation_epoch_end(outputs: Union[List[Dict[str, Tensor]], List[List[Dict[str, Tensor]]]]) → Optional[Dict[str, Dict[str, Tensor]]][source]

Default DataLoader for Validation set which automatically supports multiple data loaders via multi_validation_epoch_end. If multi dataset support is not required, override this method entirely in base class. In such a case, there is no need to implement multi_validation_epoch_end either.

Note

If more than one data loader exists, and they all provide val_loss, only the val_loss of the first data loader will be used by default. This default can be changed by passing the special key val_dl_idx: int inside the validation_ds config.

Parameters

outputs (Single or nested list of tensor outputs from one or more data loaders.) –

Returns

A dictionary containing the union of all items from individual data_loaders, along with merged logs from all data loaders.
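
The merging behaviour can be pictured with plain dictionaries. The sketch below is an assumption-laden illustration, not the library's code: merge_dataloader_logs, the prefixes, the ssim metric, and all numbers are invented for the example. It prefixes each loader's metrics and keeps a single unprefixed val_loss taken from the loader selected by val_dl_idx.

```python
from typing import Dict, List


def merge_dataloader_logs(
    per_loader: List[Dict[str, float]], prefixes: List[str], val_dl_idx: int = 0
) -> Dict[str, float]:
    """Prefix each loader's metrics and keep one unprefixed val_loss."""
    merged: Dict[str, float] = {}
    for idx, (logs, prefix) in enumerate(zip(per_loader, prefixes)):
        for name, value in logs.items():
            merged[prefix + name] = value
        if idx == val_dl_idx and "val_loss" in logs:
            merged["val_loss"] = logs["val_loss"]  # the loss used for monitoring
    return merged


per_loader = [{"val_loss": 0.30, "ssim": 0.91}, {"val_loss": 0.45, "ssim": 0.88}]
prefixes = ["val_dl0_", "val_dl1_"]
logs = merge_dataloader_logs(per_loader, prefixes)
logs_second = merge_dataloader_logs(per_loader, prefixes, val_dl_idx=1)
```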

mridc.core.classes.module module

class mridc.core.classes.module.NeuralModule[source]

Bases: Module, Typing, Serialization, FileIO, ABC

Abstract class offering interface shared between all PyTorch Neural Modules.

as_frozen()[source]: Context manager which temporarily freezes a module, yields control and finally unfreezes the module.

freeze() → None[source]: Freeze all params for inference.
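
The freeze-then-restore pattern behind as_frozen() can be sketched without PyTorch. Param and TinyModule below are hypothetical stand-ins that do not inherit from NeuralModule, and the sketch assumes freezing amounts to switching off a requires_grad flag on every parameter; the real methods may do more, such as changing the training mode.

```python
from contextlib import contextmanager


class Param:
    """Minimal stand-in for a parameter with a requires_grad flag."""

    def __init__(self):
        self.requires_grad = True


class TinyModule:
    """Hypothetical module; it does not inherit from NeuralModule."""

    def __init__(self):
        self.params = [Param(), Param()]

    def freeze(self):
        for p in self.params:
            p.requires_grad = False

    def unfreeze(self):
        for p in self.params:
            p.requires_grad = True

    @contextmanager
    def as_frozen(self):
        self.freeze()
        try:
            yield
        finally:
            self.unfreeze()  # runs even if the body raises


module = TinyModule()
with module.as_frozen():
    inside = [p.requires_grad for p in module.params]
after = [p.requires_grad for p in module.params]
```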

static input_example(max_batch=None, max_dim=None)[source]

Override this method if random inputs won’t work

Parameters

max_batch (Maximum batch size to generate) –
max_dim (Maximum dimension to generate) –

Return type

A tuple sample of valid input data.

property num_weights: Utility property that returns the total number of parameters of NeuralModule.

training: bool

unfreeze() → None[source]: Unfreeze all parameters for training.
