mridc.collections.common.data package

Submodules

mridc.collections.common.data.dataset module

class mridc.collections.common.data.dataset.ConcatDataset(datasets: List[Any], shuffle: bool = True, sampling_technique: str = 'random', sampling_probabilities: Optional[List[float]] = None, global_rank: int = 0, world_size: int = 1)

Bases: IterableDataset, ABC

A dataset that accepts as argument multiple datasets and then samples from them based on the specified sampling technique.

Parameters

datasets – A list of datasets to sample from.
shuffle – Whether to shuffle individual datasets. Only works with non-iterable datasets. Defaults to True.
sampling_technique – Sampling technique used to choose which dataset to draw a sample from. Currently supports 'random' and 'round-robin'. Defaults to 'random'.
sampling_probabilities – Probability values for sampling. Only used when sampling_technique = 'random'.
global_rank – Worker rank, used for partitioning map-style datasets. Defaults to 0.
world_size – Total number of processes, used for partitioning map-style datasets. Defaults to 1.
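The global_rank and world_size parameters indicate that each process reads only its own share of a map-style dataset. The sketch below shows one common way to do this, strided index partitioning. It is an assumption made for illustration, not the library's confirmed implementation, and partition_indices is a hypothetical helper that does not exist in mridc.

```python
from typing import List


def partition_indices(num_samples: int, global_rank: int = 0, world_size: int = 1) -> List[int]:
    """Return the sample indices assigned to one rank (hypothetical helper)."""
    if world_size < 1 or not 0 <= global_rank < world_size:
        raise ValueError("global_rank must satisfy 0 <= global_rank < world_size")
    # Strided split (assumed scheme): rank r takes r, r + world_size, r + 2 * world_size, ...
    return list(range(global_rank, num_samples, world_size))


if __name__ == "__main__":
    # Show how 7 samples would be divided between 2 processes.
    for rank in range(2):
        print(rank, partition_indices(7, global_rank=rank, world_size=2))
```

With the defaults (global_rank = 0, world_size = 1) every index is kept, which matches single-process use.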

__iter__()[source]: Returns an iterator over the dataset.

__len__()[source]: Returns the number of elements in the dataset.

get_iterable(dataset)[source]: Returns an iterable dataset.

static random_generator(datasets, **kwargs)[source]: Generates random indices.

static round_robin_generator(datasets, **kwargs)[source]: Generates indices in a round-robin fashion.
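To make the two index generators concrete, here is a minimal pure-Python sketch of the idea: each generator yields the position of the dataset to draw the next sample from. The function names, the p and seed arguments, and the endless-stream behaviour are assumptions for illustration. The actual random_generator and round_robin_generator take **kwargs whose contents are not documented here.

```python
import random
from itertools import islice
from typing import Iterator, Optional, Sequence


def round_robin_indices(datasets: Sequence) -> Iterator[int]:
    """Yield dataset positions 0, 1, ..., n - 1 in a repeating cycle (illustrative)."""
    while True:
        for i in range(len(datasets)):
            yield i


def random_indices(
    datasets: Sequence,
    p: Optional[Sequence[float]] = None,
    seed: Optional[int] = None,
) -> Iterator[int]:
    """Yield dataset positions drawn at random, weighted by p if given (illustrative)."""
    rng = random.Random(seed)
    positions = range(len(datasets))
    while True:
        # weights=None makes random.choices sample uniformly.
        yield rng.choices(positions, weights=p, k=1)[0]


if __name__ == "__main__":
    data = [["a1", "a2"], ["b1"], ["c1", "c2", "c3"]]
    print(list(islice(round_robin_indices(data), 7)))
    print(list(islice(random_indices(data, p=[0.7, 0.2, 0.1], seed=0), 7)))
```

Both sketches are infinite streams, so a caller would stop after a fixed number of samples, for example with itertools.islice as shown.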

class mridc.collections.common.data.dataset.ConcatMapDataset(datasets: List[Any], sampling_technique: str = 'temperature', sampling_temperature: int = 5, sampling_probabilities: Optional[List[float]] = None, consumed_samples: int = 0)

Bases: Dataset

A dataset that accepts as argument multiple datasets and then samples from them based on the specified sampling technique.

Parameters

datasets – A list of datasets to sample from.
sampling_technique – Sampling technique used to choose which dataset to draw a sample from. Defaults to 'temperature'. 'random' and 'round-robin' are also supported.
sampling_temperature – Temperature value for sampling. Defaults to 5.
sampling_probabilities – Probability values for sampling. Only used when sampling_technique = 'random'.
consumed_samples – Number of samples already consumed. Defaults to 0.
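The signature defaults sampling_technique to 'temperature' with sampling_temperature = 5, but this page does not explain how that technique works. A widely used formulation raises each dataset's share of the total samples to the power 1/T and renormalises, which flattens the distribution towards uniform as T grows. The sketch below assumes that formulation. It is not confirmed to be what ConcatMapDataset does, and temperature_probabilities is a hypothetical helper.

```python
from typing import List, Sequence


def temperature_probabilities(lengths: Sequence[int], temperature: float = 5.0) -> List[float]:
    """Turn dataset sizes into sampling probabilities (assumed formulation)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = sum(lengths)
    # Size-proportional share of each dataset, raised to the power 1 / T.
    scaled = [(n / total) ** (1.0 / temperature) for n in lengths]
    norm = sum(scaled)
    # Renormalise so the probabilities sum to 1.
    return [s / norm for s in scaled]


if __name__ == "__main__":
    sizes = [100, 900]
    print(temperature_probabilities(sizes, temperature=1.0))  # size-proportional
    print(temperature_probabilities(sizes, temperature=5.0))  # closer to uniform
```

Under this formulation, T = 1 reproduces size-proportional sampling, while larger values give small datasets a larger share of the draws.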
