mridc.collections.common.data package

Submodules

mridc.collections.common.data.dataset module

class mridc.collections.common.data.dataset.ConcatDataset(datasets: List[Any], shuffle: bool = True, sampling_technique: str = 'random', sampling_probabilities: Optional[List[float]] = None, global_rank: int = 0, world_size: int = 1)

Bases: IterableDataset, ABC

An iterable dataset that takes multiple datasets as input and samples from them according to the specified sampling technique.

Parameters
  • datasets (List[Any]) – A list of datasets to sample from.

  • shuffle (bool) – Whether to shuffle the individual datasets. Only works with non-iterable (map-style) datasets. Defaults to True.

  • sampling_technique (str) – Sampling technique used to choose which dataset to draw a sample from. Currently supports 'random' and 'round-robin'. Defaults to 'random'.

  • sampling_probabilities (Optional[List[float]]) – Probability values for sampling. Only used when sampling_technique = 'random'.

  • global_rank (int) – Worker rank, used for partitioning map-style datasets. Defaults to 0.

  • world_size (int) – Total number of processes, used for partitioning map-style datasets. Defaults to 1.
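
global_rank and world_size are described as partitioning map-style datasets across processes. The following is a minimal sketch of one way such a split can work, assuming a contiguous partition in which every rank receives an equal slice and the last rank also keeps the remainder. shard_range is a hypothetical helper, not part of mridc.

```python
def shard_range(length: int, global_rank: int, world_size: int) -> range:
    """Return the contiguous range of indices owned by one process.

    Hypothetical helper (assumed partitioning scheme): each rank gets
    length // world_size items and the last rank also takes any remainder.
    """
    if not 0 <= global_rank < world_size:
        raise ValueError("global_rank must be in [0, world_size)")
    per_rank = length // world_size
    start = per_rank * global_rank
    end = length if global_rank == world_size - 1 else start + per_rank
    return range(start, end)


# Ten samples split across three processes.
for rank in range(3):
    print(rank, list(shard_range(10, rank, 3)))
# 0 [0, 1, 2]
# 1 [3, 4, 5]
# 2 [6, 7, 8, 9]
```

Giving the remainder to the last rank keeps every sample assigned exactly once, at the cost of that rank seeing slightly more data.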

__iter__()

Returns an iterator over the dataset.
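
As a rough picture of how sampling across datasets can be driven by an index generator, the sketch below picks a dataset index, pulls the next sample from that dataset, and restarts any dataset that runs out. The function iterate_concat and its restart-on-exhaustion behaviour are assumptions for illustration, not mridc's implementation.

```python
import itertools
from typing import Any, Iterator, List, Sequence


def iterate_concat(
    datasets: List[Sequence[Any]],
    index_gen: Iterator[int],
    max_elements: int,
) -> Iterator[Any]:
    """Yield up to max_elements samples, picking the source dataset from index_gen.

    Hypothetical sketch: an exhausted dataset is restarted, so smaller
    datasets are repeated instead of ending iteration early.
    """
    if any(len(d) == 0 for d in datasets):
        raise ValueError("all datasets must be non-empty")
    iterators = [iter(d) for d in datasets]
    for _ in range(max_elements):
        idx = next(index_gen, None)  # which dataset to draw from next
        if idx is None:
            return  # the index generator ran out
        try:
            sample = next(iterators[idx])
        except StopIteration:
            # Restart the exhausted dataset and draw its first sample again.
            iterators[idx] = iter(datasets[idx])
            sample = next(iterators[idx])
        yield sample


# Alternate between two datasets of different sizes.
a = ["a0", "a1"]
b = ["b0", "b1", "b2"]
print(list(iterate_concat([a, b], itertools.cycle(range(2)), 6)))
# ['a0', 'b0', 'a1', 'b1', 'a0', 'b2']
```

Restarting exhausted datasets means the shorter dataset is repeated rather than cutting the stream short.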

__len__()

Returns the number of elements in the dataset.

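get_iterable(dataset)

Returns an iterator over a single constituent dataset. For map-style datasets this iterates over the dataset's indices, which are shuffled when shuffle is True.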
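
The shuffle option only applies to non-iterable datasets. A minimal sketch of that distinction, assuming map-style datasets are modelled as Python sequences whose indices are handed out (optionally shuffled) while any other iterable is simply iterated; get_index_iterator is a hypothetical name. For map-style data the caller then looks up dataset[index].

```python
import random
from collections.abc import Sequence
from typing import Any, Iterator, Optional


def get_index_iterator(
    dataset: Any, shuffle: bool, rng: Optional[random.Random] = None
) -> Iterator[Any]:
    """Return an iterator for one constituent dataset.

    Hypothetical sketch: a map-style dataset (modelled as a Sequence) yields
    its indices, shuffled if requested; any other iterable is iterated
    directly, so shuffle has no effect on it.
    """
    if isinstance(dataset, Sequence):
        indices = list(range(len(dataset)))
        if shuffle:
            (rng or random.Random()).shuffle(indices)
        return iter(indices)
    return iter(dataset)


data = ["x", "y", "z"]
print(list(get_index_iterator(data, shuffle=False)))  # [0, 1, 2]
shuffled = list(get_index_iterator(data, shuffle=True, rng=random.Random(0)))
print([data[i] for i in shuffled])  # the same three items, in some order
print(list(get_index_iterator(iter("ab"), shuffle=True)))  # ['a', 'b']
```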

static random_generator(datasets, **kwargs)

Generates dataset indices at random, according to the sampling probabilities.
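
A minimal sketch of weighted random dataset selection, assuming one probability per dataset that sums to 1 (as sampling_probabilities suggests). random_index_generator is a hypothetical name; it uses the standard-library random.choices rather than anything from mridc.

```python
import random
from typing import Iterator, Optional, Sequence


def random_index_generator(
    num_datasets: int,
    probabilities: Sequence[float],
    rng: Optional[random.Random] = None,
) -> Iterator[int]:
    """Yield dataset indices forever, drawn with the given probabilities.

    Hypothetical sketch of what sampling_probabilities controls.
    """
    if len(probabilities) != num_datasets:
        raise ValueError("need one probability per dataset")
    if abs(sum(probabilities) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    rng = rng or random.Random()
    population = range(num_datasets)
    while True:
        yield rng.choices(population, weights=probabilities, k=1)[0]


gen = random_index_generator(2, [0.8, 0.2], rng=random.Random(42))
draws = [next(gen) for _ in range(1000)]
print(draws.count(0) / len(draws))  # should be close to 0.8
```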

static round_robin_generator(datasets, **kwargs)

Generates dataset indices in round-robin order.
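
Round-robin selection visits each dataset in turn and then starts over. A minimal sketch using itertools.cycle; round_robin_index_generator is a hypothetical name, not part of mridc.

```python
import itertools
from typing import Iterator


def round_robin_index_generator(num_datasets: int) -> Iterator[int]:
    """Yield 0, 1, ..., num_datasets - 1 and then start over, forever.

    Hypothetical sketch of round-robin dataset selection.
    """
    if num_datasets <= 0:
        raise ValueError("num_datasets must be positive")
    return itertools.cycle(range(num_datasets))


print(list(itertools.islice(round_robin_index_generator(3), 7)))
# [0, 1, 2, 0, 1, 2, 0]
```

Unlike weighted random sampling, round-robin gives every dataset the same share of draws regardless of its size.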

class mridc.collections.common.data.dataset.ConcatMapDataset(datasets: List[Any], sampling_technique: str = 'temperature', sampling_temperature: int = 5, sampling_probabilities: Optional[List[float]] = None, consumed_samples: int = 0)

Bases: Dataset

A map-style dataset that takes multiple datasets as input and samples from them according to the specified sampling technique.

Parameters
  • datasets (List[Any]) – A list of datasets to sample from.

  • sampling_technique (str) – Sampling technique used to choose which dataset to draw a sample from. Currently supports 'temperature', 'random', and 'round-robin'. Defaults to 'temperature'.

  • sampling_temperature (int) – Temperature used for sampling. Only used when sampling_technique = 'temperature'. Defaults to 5.

  • sampling_probabilities (Optional[List[float]]) – Probability values for sampling. Only used when sampling_technique = 'random'.

  • consumed_samples (int) – Number of samples already consumed. Defaults to 0.
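
The signature defaults to sampling_technique='temperature' with sampling_temperature=5. One common formulation of temperature-based sampling, assumed here rather than taken from mridc, sets each dataset's probability proportional to its share of the data raised to the power 1/T: T = 1 reproduces size-proportional sampling, and larger T moves toward uniform sampling. temperature_probabilities is a hypothetical helper.

```python
from typing import List, Sequence


def temperature_probabilities(sizes: Sequence[int], temperature: float) -> List[float]:
    """Turn dataset sizes into sampling probabilities p_i proportional to (n_i / N) ** (1 / T).

    Hypothetical sketch of temperature-based sampling (assumed formulation).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = sum(sizes)
    if total <= 0:
        raise ValueError("at least one dataset must be non-empty")
    weights = [(n / total) ** (1.0 / temperature) for n in sizes]
    norm = sum(weights)
    return [w / norm for w in weights]


print(temperature_probabilities([900, 100], temperature=1))  # proportional to size
print(temperature_probabilities([900, 100], temperature=5))  # roughly [0.61, 0.39]
```

Raising the temperature up-weights the smaller dataset, which helps keep it from being drowned out by a much larger one.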

Module contents