mridc.collections.common.data package
Submodules
mridc.collections.common.data.dataset module
- class mridc.collections.common.data.dataset.ConcatDataset(datasets: List[Any], shuffle: bool = True, sampling_technique: str = 'random', sampling_probabilities: Optional[List[float]] = None, global_rank: int = 0, world_size: int = 1)
Bases: IterableDataset, ABC
A dataset that accepts as argument multiple datasets and then samples from them based on the specified sampling technique.
- Parameters
datasets – A list of datasets to sample from.
shuffle – Whether to shuffle individual datasets. Only works with non-iterable datasets. Defaults to True.
sampling_technique – Sampling technique to choose which dataset to draw a sample from. Defaults to 'random'. Currently supports 'random' and 'round-robin'.
sampling_probabilities – Probability values for sampling. Only used when sampling_technique = 'random'.
global_rank – Worker rank, used for partitioning map style datasets. Defaults to 0.
world_size – Total number of processes, used for partitioning map style datasets. Defaults to 1.
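The two sampling techniques and the rank-based partitioning described above can be sketched in plain Python. This is an illustrative sketch only, not the mridc implementation: `sample_concat`, `partition_indices`, the `seed` argument, and the rule that an exhausted dataset is simply dropped are all assumptions made for the example. The weights passed as `sampling_probabilities` are assumed to be positive.

```python
import random
from typing import Any, Iterator, List, Optional, Sequence


def partition_indices(length: int, global_rank: int = 0, world_size: int = 1) -> List[int]:
    """Hypothetical helper: give each rank a strided share of a map-style dataset."""
    return list(range(length))[global_rank::world_size]


def sample_concat(
    datasets: Sequence[Sequence[Any]],
    sampling_technique: str = "random",
    sampling_probabilities: Optional[List[float]] = None,
    seed: int = 0,
) -> Iterator[Any]:
    """Yield every item exactly once, choosing the source dataset at each step."""
    iterators = [iter(d) for d in datasets]
    active = list(range(len(datasets)))  # indices of datasets that still have items
    weights = sampling_probabilities or [1.0] * len(datasets)
    rng = random.Random(seed)
    position = 0  # round-robin cursor into `active`
    while active:
        if sampling_technique == "round-robin":
            position %= len(active)
            idx = active[position]
        elif sampling_technique == "random":
            # Draw a dataset index in proportion to its weight.
            idx = rng.choices(active, weights=[weights[i] for i in active], k=1)[0]
        else:
            raise ValueError(f"Unsupported sampling technique: {sampling_technique}")
        try:
            yield next(iterators[idx])
            if sampling_technique == "round-robin":
                position += 1  # move on to the next dataset
        except StopIteration:
            active.remove(idx)  # assumption: exhausted datasets are dropped


if __name__ == "__main__":
    print(list(sample_concat([["a1", "a2", "a3"], ["b1"]], "round-robin")))
    print(partition_indices(10, global_rank=1, world_size=4))
```

With 'round-robin' the datasets are visited in turn; with 'random' the order depends on the weights and the seed, so only the set of yielded items is fixed.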
- class mridc.collections.common.data.dataset.ConcatMapDataset(datasets: List[Any], sampling_technique: str = 'temperature', sampling_temperature: int = 5, sampling_probabilities: Optional[List[float]] = None, consumed_samples: int = 0)
Bases: Dataset
A dataset that accepts as argument multiple datasets and then samples from them based on the specified sampling technique.
- Parameters
datasets – A list of datasets to sample from.
sampling_technique – Sampling technique to choose which dataset to draw a sample from. The signature default is 'temperature'; the docstring names 'random' and 'round-robin' as supported.
sampling_temperature – Integer temperature value; 5 by default according to the signature. Presumably only used when sampling_technique = 'temperature'.
sampling_probabilities – Probability values for sampling. Only used when sampling_technique = 'random'.
consumed_samples – Integer; 0 by default according to the signature.
shuffle, global_rank, world_size – Listed in the docstring with the ConcatDataset descriptions (defaults True, 0, and 1), but not accepted by the signature above.
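The signature's default technique, 'temperature', is not explained in the docstring. A common formulation of temperature sampling, shown here as an assumption rather than as the mridc code, raises each dataset's share of the total size to the power 1 / temperature and renormalises. The helper name `temperature_probabilities` is hypothetical.

```python
from typing import List


def temperature_probabilities(sizes: List[int], temperature: float = 5) -> List[float]:
    """Turn dataset sizes into sampling probabilities (assumed formulation)."""
    total = sum(sizes)
    # Size-proportional shares, flattened by the exponent 1 / temperature.
    scaled = [(n / total) ** (1.0 / temperature) for n in sizes]
    norm = sum(scaled)
    return [s / norm for s in scaled]


if __name__ == "__main__":
    # Temperature 1 keeps size-proportional sampling; higher values even it out.
    print(temperature_probabilities([900, 100], temperature=1))
    print(temperature_probabilities([900, 100], temperature=5))
```

Under this formulation, a temperature of 1 samples in proportion to dataset size, and larger temperatures move the probabilities toward uniform, so small datasets are drawn more often than their size alone would suggest.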