Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
2,018 datasets
A 119-hour corpus of English-language earnings calls collected from global companies. It serves as a benchmark for automatic speech recognition models on real-world accented speech. The author is anton-l, and it was last updated in June 2022.
A test subset of Bengali audio clips from Mozilla's Common Voice project, processed for machine learning tasks. The dataset was uploaded by Lancelot53 in July 2022 and likely contains validated speech recordings for evaluation.
A preprocessed subset of the Common Voice dataset containing Bengali speech audio, uploaded by user Lancelot53 to Hugging Face in July 2022. The data has undergone trimming and other preprocessing steps. It is part of the Common Voice initiative, a global project for collecting open-source speech data.
A validation subset of the Common Voice dataset containing preprocessed audio recordings and transcripts in the Bengali language. The dataset was created by contributor Lancelot53 and last updated on the Hugging Face platform in July 2022. It is part of the Common Voice project, a Mozilla initiative for open-source speech technology.
This database was created by Nordic Language Technology for developing automatic speech recognition and dictation in Swedish. The files have been renamed to be unique and meaningful, and metadata has been converted to anonymized JSON format with UTF-8 encoding.
Bengali-language speech dataset from the Common Voice project, contributed by the author bengaliAI. The dataset was last updated on July 1, 2022. The number of rows and columns and the total size are unknown.
Voxceleb1 Too Short Utts contains audio segments from the original VoxCeleb1 dataset. The dataset was created by s3prl and last updated on Hugging Face in July 2022. It focuses on utterances below a certain duration threshold.
Segmented audio files and their transcriptions sourced from Emirati TV shows, podcasts, and YouTube channels. It is designed as a benchmark for automatic speech recognition models for the Emirati dialect, covering categories such as traditions, cars, health, games, sports, and police. The dataset was created by eabayed and last updated in May 2022.
35.9 hours of Vietnamese audio generated for text-to-speech applications. The text source is a collection of public domain novels and short stories by author Vu Trong Phung. The audio was synthesized using the Google Text-to-Speech offline engine on Android.
A 119-hour corpus of English-language earnings calls collected from global companies. It serves as a benchmark for automatic speech recognition models on real-world accented speech.
Geospatial perimeters affected by the noise classification of national and departmental roads in the French department of Haute-Saône. The dataset was created by the Bureau de Recherches Géologiques et Minières (BRGM) following the decree DDT 70 of 04 May 2022. It was last updated on 13 May 2022.
A 2022 subset of the VoxCeleb1 dataset curated by s3prl for speaker recognition tasks. It contains a reduced number of audio clips from celebrity interviews sourced from YouTube videos. The dataset is hosted on Hugging Face.
A 2022 collection of Russian radio broadcast audio data uploaded by user mh53 to Hugging Face. The dataset is intended for automatic speech recognition tasks, as suggested by its title and platform tags.
A 2022 dataset containing Vietnamese text and corresponding audio samples for speech synthesis tasks. Created by author 'duongmle' and hosted on Hugging Face, it is categorized as containing approximately 1,000 samples based on platform size tags.
A geospatial dataset containing the noise classification of railway lines in the French department of Haute-Saône. The classification was established by Prefectural Order No. 70-2019-07-03-002 on July 3, 2019. The data is served via a Web Map Service (WMS) and was last updated on May 13, 2022.
Librispeech Metadata provides descriptive information for the LibriSpeech audio corpus, a widely used benchmark in speech recognition. The metadata was uploaded by s3prl to the Hugging Face platform in June 2022. It serves as a companion to the primary audio dataset.
A collection of approximately 241 hours of high-quality Malay speech audio synthesized by the ms-MY-YasminNeural voice. The audio is split into two subsets: 99.4 hours from Malay Wikipedia and News texts, and 142 hours from Malaysian Parliament transcripts. All audio is sampled at 24,000 Hz, and the source sentences are between 2 and 20 words long.
A collection of audio waveforms and corresponding transcriptions for Fongbe speech recognition. The audio data is sampled at 16,000 Hz. The dataset was created by godwinh and was last updated in May 2022.
Librispeech is a large-scale corpus of read English speech derived from audiobooks. The metadata for this dataset was uploaded by user leo19941227 to the Hugging Face platform in June 2022.
Synthesized voices derived from the Skyrim voice datasets. The dataset was created by Etephyr and last updated in June 2022. The number of audio files, the features, and the data size are unknown.