About
The Free Spoken Digit Dataset is an open data set consisting of audio recordings of various individuals speaking the digits from 0-9, with 50 recordings of each digit per individual.
The data set can be thought of as an audio version of the popular MNIST data set, which consists of hand-written digits. However, because the recordings differ in length, the data is more challenging to deal with than the fixed-size images of MNIST.
Recurrent neural networks, which can process sequences of varying length and can be implemented in PyTorch, are a common approach for this task. TorchFSDD aims to provide an interface to FSDD for such models by providing a torch.utils.data.Dataset wrapper that is ready to be used with a torch.utils.data.DataLoader.
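As a rough illustration of why variable-length recordings need extra care when batching, the sketch below builds a toy torch.utils.data.Dataset and pads each batch in a custom collate function. ToySpokenDigits and collate_padded are hypothetical names invented for this example, and the random tensors stand in for real audio features; none of this is TorchFSDD's own API.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class ToySpokenDigits(Dataset):
    """Hypothetical stand-in for a spoken digit dataset (not TorchFSDD's API)."""

    def __init__(self, lengths, n_features=13):
        generator = torch.Generator().manual_seed(0)
        # One (time, features) tensor per recording; random values replace real audio features.
        self.sequences = [
            torch.randn(t, n_features, generator=generator) for t in lengths
        ]
        # Digit labels 0-9, assigned cyclically for illustration only.
        self.labels = [i % 10 for i in range(len(lengths))]

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, index):
        return self.sequences[index], self.labels[index]


def collate_padded(batch):
    """Zero-pad the sequences in a batch to a common length and keep the true lengths."""
    sequences, labels = zip(*batch)
    lengths = torch.tensor([len(s) for s in sequences])
    padded = pad_sequence(sequences, batch_first=True)  # (batch, max_time, features)
    return padded, lengths, torch.tensor(labels)


dataset = ToySpokenDigits(lengths=[20, 35, 28, 41])
loader = DataLoader(dataset, batch_size=2, collate_fn=collate_padded)
padded, lengths, labels = next(iter(loader))
print(padded.shape, lengths.tolist(), labels.tolist())
```

The true lengths are returned alongside the padded batch so that a recurrent model can ignore the padding, for example by passing them to torch.nn.utils.rnn.pack_padded_sequence with enforce_sorted=False.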
Using TorchFSDD