Generating data set splits¶
To create torch.utils.data.Dataset objects for FSDD with TorchFSDD, you first need to create a data set generator using the torchfsdd.TorchFSDDGenerator class. This class first downloads the data set from the GitHub repository (if not already downloaded), then lets you generate data set splits (full, train/test, or train/validation/test) by automatically selecting which files belong to which partition.

Each data set split is represented by a torchfsdd.TorchFSDD object, which is a wrapper around torch.utils.data.Dataset. For each split, the generator initializes one of these data sets and passes on the files that make up that split, along with any transformations that should be applied to the recordings.
- class torchfsdd.TorchFSDDGenerator(version='master', path=None, transforms=None, load_all=False, **args)[source]¶
  A torch.utils.data.Dataset generator for splits of the Free Spoken Digit Dataset.
  - Parameters:
    - version: str
      The version of FSDD to download from the GitHub repository, specified as a branch name (defaults to ‘master’) or Git version tag, e.g. ‘v1.0.6’. Alternatively, if you already have a local copy of the data set that you would like to use, you can set this argument to ‘local’ and provide the path to the folder containing the WAV files as the path argument.
    - path: str, optional
      If version is a Git branch name or version tag, this is the path where the Git repository will be cloned (a new folder will be created at the specified path). If none is specified, os.getcwd() is used. If version is set to ‘local’, this is the path to the folder containing the WAV audio recordings.
    - transforms: callable, optional
      A callable transformation to apply to a 1D torch.Tensor of audio samples.
    - load_all: bool
      Whether or not to load the entire data set into memory.
    - **args: optional
      Arbitrary keyword arguments passed on to torchaudio.load().
  - full()[source]¶
    Generates a data set wrapper for the entire data set.
    - Returns:
      - full_set: TorchFSDD
        The torch.utils.data.Dataset wrapper for the full data set.
  - train_test_split(test_size=0.1)[source]¶
    Generates training and test data set wrappers.
    - Parameters:
      - test_size: 0 < float < 1
        Size of the test data set (as a proportion).
    - Returns:
      - train_set: TorchFSDD
        The training set torch.utils.data.Dataset wrapper.
      - test_set: TorchFSDD
        The test set torch.utils.data.Dataset wrapper.
  - train_val_test_split(test_size=0.1, val_size=0.1)[source]¶
    Generates training, validation and test data set wrappers.
    - Parameters:
      - test_size: 0 < float < 1
        Size of the test data set (as a proportion).
      - val_size: 0 < float < 1
        Size of the validation data set (as a proportion).
    - Returns:
      - train_set: TorchFSDD
        The training set torch.utils.data.Dataset wrapper.
      - val_set: TorchFSDD
        The validation set torch.utils.data.Dataset wrapper.
      - test_set: TorchFSDD
        The test set torch.utils.data.Dataset wrapper.
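The split sizes are proportions of the whole data set. As a rough illustration of what a proportional partition means (a hypothetical helper, not the package's internal file-selection logic):

```python
def partition_counts(n_files, test_size=0.1, val_size=0.1):
    """Hypothetical illustration: how many files land in each split
    when partitioning n_files by proportion (not torchfsdd's actual logic)."""
    n_test = round(n_files * test_size)
    n_val = round(n_files * val_size)
    n_train = n_files - n_val - n_test
    return n_train, n_val, n_test

# FSDD currently contains 3,000 recordings; with the default 10%/10% split:
print(partition_counts(3000))  # (2400, 300, 300)
```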
- class torchfsdd.TorchFSDD(files, transforms=None, load_all=False, **args)[source]¶
  A torch.utils.data.Dataset wrapper for specified WAV audio recordings of the Free Spoken Digit Dataset.

  Tip: There should rarely be a situation where you have to initialize this class manually, unless you are experimenting with specific subsets of the FSDD. You should use TorchFSDDGenerator to either load the full data set or generate splits for training/validation/testing.

  - Parameters:
    - files: list of str
      List of file paths to the WAV audio recordings for the data set.
    - transforms: callable, optional
      A callable transformation to apply to a 1D torch.Tensor of audio samples.

      This can be a single transformation, such as the TrimSilence transformation included in this package.

      ```python
      from torchfsdd import TorchFSDDGenerator, TrimSilence

      fsdd = TorchFSDDGenerator(transforms=TrimSilence(threshold=150))
      ```

      It could also be a series of transformations composed together with torchvision.transforms.Compose.

      ```python
      from torchfsdd import TorchFSDDGenerator, TrimSilence
      from torchaudio.transforms import MFCC
      from torchvision.transforms import Compose

      fsdd = TorchFSDDGenerator(transforms=Compose([
          TrimSilence(threshold=100),
          MFCC(sample_rate=8e3, n_mfcc=13)
      ]))
      ```

      There are many useful audio transformations in torchaudio.transforms, such as torchaudio.transforms.MFCC.
    - load_all: bool
      Whether or not to load the entire data set into memory. This essentially defeats the point of batching, but the data set is small enough to fit comfortably into memory and possibly provide some speed-up. If this is set to True, the complete set of raw audio recordings and labels (for the specified files) can be accessed via self.recordings and self.labels.
    - **args: optional
      Arbitrary keyword arguments passed on to torchaudio.load().
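Since this class only has to satisfy the torch.utils.data.Dataset contract (__len__, and __getitem__ returning a (recording, label) pair), its shape can be sketched in plain Python. The class below is a hypothetical stand-in, not the real implementation, and returns the file path where the real class would return an audio tensor; FSDD file names encode the spoken digit as the leading number, e.g. 7_jackson_32.wav:

```python
import os

class FakeFSDD:
    """Hypothetical sketch of the Dataset contract TorchFSDD fulfils.
    The real class loads audio with torchaudio.load(); here we just
    return the file path in place of a waveform tensor."""

    def __init__(self, files, transforms=None):
        self.files = files
        self.transforms = transforms

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        path = self.files[index]
        # '7_jackson_32.wav' -> label 7
        label = int(os.path.basename(path).split('_')[0])
        recording = path  # real implementation: a 1D torch.Tensor of samples
        if self.transforms is not None:
            recording = self.transforms(recording)
        return recording, label

dataset = FakeFSDD(['recordings/7_jackson_32.wav', 'recordings/0_theo_5.wav'])
print(len(dataset))   # 2
print(dataset[0][1])  # 7
```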
Transformations¶
While many transformations can be applied to audio data, this package only includes a transformation for trimming silence from the start and end of each audio recording. The implementation of this transformation is exactly the same as the trimming utility in the FSDD repository, but for PyTorch tensors, and assuming a normalized signal (since torchaudio.load() normalizes automatically).
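A minimal pure-Python sketch of this kind of trimming, operating on a list of normalized floats (the packaged TrimSilence works on torch.Tensor, and its exact threshold semantics may differ):

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose absolute amplitude is
    below `threshold` (a sketch, not the exact torchfsdd implementation)."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

print(trim_silence([0.0, 0.005, 0.4, -0.2, 0.003, 0.0]))  # [0.4, -0.2]
```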