Generating data set splits

To use TorchFSDD to create torch.utils.data.Dataset objects for FSDD, you first need to create a data set generator using the torchfsdd.TorchFSDDGenerator class.

This class first downloads the data set from the GitHub repository (if not already downloaded), then allows you to generate data splits (full, train/test, or train/validation/test) by automatically selecting which files belong to which partition.

Each data set split is represented by a torchfsdd.TorchFSDD object, which is a wrapper for torch.utils.data.Dataset. For each split, the data set generator initializes one of these data sets and passes on the files that compose that split, along with any transformations that should be applied to the recordings.
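
As a minimal sketch of this workflow (using the methods documented below):

from torchfsdd import TorchFSDDGenerator

# Create a generator that downloads FSDD into the current working directory
fsdd = TorchFSDDGenerator(version='master')

# Generate data set wrappers for a 90/10 train/test split
train_set, test_set = fsdd.train_test_split(test_size=0.1)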

class torchfsdd.TorchFSDDGenerator(version='master', path=None, transforms=None, load_all=False, **args)[source]

A torch.utils.data.Dataset generator for splits of the Free Spoken Digit Dataset.

Parameters:
version: str

The version of FSDD to download from the GitHub repository, specified as a branch name (defaults to ‘master’) or Git version tag, e.g. ‘v1.0.6’.

Alternatively, if you already have a local copy of the dataset that you would like to use, you can set this argument to ‘local’ and provide a path to the folder containing the WAV files, as the path argument.

path: str, optional

If version is a Git branch name or version tag, then this is the path where the Git repository will be cloned to (a new folder will be created at the specified path). If none is specified, then os.getcwd() is used.

If version is set to ‘local’, then this is the path to the folder containing the WAV audio recordings.
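
For example, to clone a tagged release to a chosen folder, or to reuse an existing local copy (the paths here are hypothetical):

# Clone the v1.0.6 release of FSDD into ./data
fsdd = TorchFSDDGenerator(version='v1.0.6', path='./data')

# Use an existing local folder of WAV recordings instead
fsdd = TorchFSDDGenerator(version='local', path='./recordings')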

transforms: callable, optional

A callable transformation to apply to a 1D torch.Tensor of audio samples.

See also

TorchFSDD

load_all: bool

Whether or not to load the entire dataset into memory.

See also

TorchFSDD

**args: optional

Arbitrary keyword arguments passed on to torchaudio.load().
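
For example, recent torchaudio versions accept a normalize keyword on torchaudio.load(), which could be forwarded like this (whether this keyword is available depends on your torchaudio version):

# Keep raw integer amplitudes instead of normalized floats
fsdd = TorchFSDDGenerator(normalize=False)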

full()[source]

Generates a data set wrapper for the entire data set.

Returns:
full_set: TorchFSDD

The torch.utils.data.Dataset wrapper for the full data set.
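
For example, given a generator fsdd:

# A single wrapper over every recording in the data set
full_set = fsdd.full()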

train_test_split(test_size=0.1)[source]

Generates training and test data set wrappers.

Parameters:
test_size: 0 < float < 1

Size of the test data set (as a proportion).

Returns:
train_set: TorchFSDD

The training set torch.utils.data.Dataset wrapper.

test_set: TorchFSDD

The test set torch.utils.data.Dataset wrapper.
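
For example, reserving 20% of the recordings for testing:

train_set, test_set = fsdd.train_test_split(test_size=0.2)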

train_val_test_split(test_size=0.1, val_size=0.1)[source]

Generates training, validation and test data set wrappers.

Parameters:
test_size: 0 < float < 1

Size of the test data set (as a proportion).

val_size: 0 < float < 1

Size of the validation data set (as a proportion).

Returns:
train_set: TorchFSDD

The training set torch.utils.data.Dataset wrapper.

val_set: TorchFSDD

The validation set torch.utils.data.Dataset wrapper.

test_set: TorchFSDD

The test set torch.utils.data.Dataset wrapper.
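
For example, an 80/10/10 split:

train_set, val_set, test_set = fsdd.train_val_test_split(test_size=0.1, val_size=0.1)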

class torchfsdd.TorchFSDD(files, transforms=None, load_all=False, **args)[source]

A torch.utils.data.Dataset wrapper for specified WAV audio recordings of the Free Spoken Digit Dataset.

Tip

There should rarely be a situation where you have to initialize this class manually, unless you are experimenting with specific subsets of the FSDD. You should use TorchFSDDGenerator to either load the full data set or generate splits for training/validation/testing.

Parameters:
files: list of str

List of file paths to the WAV audio recordings for the dataset.

transforms: callable, optional

A callable transformation to apply to a 1D torch.Tensor of audio samples.

This can be a single transformation, such as the TrimSilence transformation included in this package.

from torchfsdd import TorchFSDDGenerator, TrimSilence

fsdd = TorchFSDDGenerator(transforms=TrimSilence(threshold=1e-6))

It could also be a series of transformations composed together with torchvision.transforms.Compose.

from torchfsdd import TorchFSDDGenerator, TrimSilence
from torchaudio.transforms import MFCC
from torchvision.transforms import Compose

fsdd = TorchFSDDGenerator(transforms=Compose([
    TrimSilence(threshold=1e-6),
    MFCC(sample_rate=8e3, n_mfcc=13)
]))

There are many useful audio transformations in torchaudio.transforms such as torchaudio.transforms.MFCC.

load_all: bool

Whether or not to load the entire dataset into memory.

This essentially defeats the point of lazy loading, but the dataset is small enough to fit comfortably into memory, which may provide some speed-up.

If this is set to True, then the complete set of raw audio recordings and labels (for the specified files) can be accessed with self.recordings and self.labels.

**args: optional

Arbitrary keyword arguments passed on to torchaudio.load().
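
As a sketch of how a generated split can be consumed (this assumes each indexed item is a (recording, label) pair; since the recordings vary in length, batching more than one item at a time would require a custom collate_fn):

import torch
from torchfsdd import TorchFSDDGenerator

fsdd = TorchFSDDGenerator(version='master')
train_set, test_set = fsdd.train_test_split(test_size=0.1)

# Iterate over one variable-length recording at a time
train_loader = torch.utils.data.DataLoader(train_set, batch_size=1, shuffle=True)
for recording, label in train_loader:
    pass  # e.g. run a forward pass through a model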

Transformations

While many transformations can be applied to audio data, this package only includes a transformation for trimming silence from the start and end of each audio recording. The implementation of this transformation is exactly the same as the trimming utility in the FSDD repository, but it operates on PyTorch tensors and assumes a normalized signal (since torchaudio.load() normalizes by default).

Trimming silence

class torchfsdd.TrimSilence(threshold)[source]

Removes the silence at the beginning and end of the passed audio data.

Warning

This transformation assumes that the audio is normalized.

Parameters:
threshold: float

The maximum absolute amplitude of the (normalized) signal that is considered silence.
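
As a rough illustration (a minimal sketch under the normalization assumption above, not the package's exact implementation), amplitude-based trimming of a 1D tensor could look like this:

import torch

def trim_silence(signal, threshold):
    # Indices of samples whose absolute amplitude exceeds the threshold
    loud = torch.nonzero(signal.abs() > threshold).flatten()
    if loud.numel() == 0:
        # Entirely silent signal: returning it unchanged is an assumption,
        # not necessarily what TrimSilence does
        return signal
    # Keep everything between the first and last non-silent sample
    return signal[loud[0]:loud[-1] + 1]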