DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,018 datasets

English Earnings Calls Corpus With Global Accents

A 119-hour corpus of English-language earnings calls collected from global companies. It serves as a benchmark for automatic speech recognition models on real-world accented speech. It was created by anton-l and last updated in June 2022.

Modality: audio, Size Categories: n<1K, Modality: text, Library: mlcroissant, License: CC BY-SA 4.0, Library: datasets, Region: us, +1

Bengali Speech Test Set from Mozilla Common Voice

A test subset of Bengali audio clips from Mozilla's Common Voice project, processed for machine learning tasks. The dataset was uploaded by Lancelot53 in July 2022 and likely contains validated speech recordings for evaluation.

Audio, Time Series, Parquet, Size Categories: 1K<n<10K, Library: polars, Library: dask, Audio Classification, Modality: timeseries, Library: mlcroissant, Language Testing, Library: datasets, Region: us, Speech Recognition, +1

Bengali Speech Audio from Common Voice with Preprocessing

A preprocessed subset of the Common Voice dataset containing Bengali speech audio, uploaded by user Lancelot53 to Hugging Face in July 2022. The data has undergone trimming and other preprocessing steps. It is part of the Common Voice initiative, a global project for collecting open-source speech data.

Audio, Parquet, Size Categories: 10K<n<100K, Library: polars, Library: dask, Bengali Language, Modality: timeseries, Audio Preprocessing, Library: mlcroissant, Library: datasets, Region: us, Open Data, Speech Recognition, +1

Bengali Speech Validation Data from Common Voice

A validation subset of the Common Voice dataset containing preprocessed audio recordings and transcripts in the Bengali language. The dataset was created by contributor Lancelot53 and last updated on the Hugging Face platform in July 2022. It is part of the Common Voice project, a Mozilla initiative for open-source speech technology.

Tabular, Audio, Parquet, Size Categories: 1K<n<10K, Library: polars, Library: dask, Bengali Language, Modality: timeseries, Audio Validation, Library: mlcroissant, Library: datasets, Region: us, Voice Data, Speech Recognition, +1

Swedish Speech Recognition Database with Renamed Files

This database was created by Nordic Language Technology for developing automatic speech recognition and dictation in Swedish. The files have been renamed to be unique and meaningful, and metadata has been converted to anonymized JSON format with UTF-8 encoding.

Region: us, +1

Bengali Speech Data from Common Voice Project

A Bengali-language speech dataset from the Common Voice project, contributed by the author bengaliAI. The dataset was last updated on July 1, 2022. The number of rows and columns and the total size are unknown.

License: CC0 1.0, Region: us, +1

VoxCeleb1 Short Utterances for Speaker Recognition

Voxceleb1 Too Short Utts contains audio segments from the original VoxCeleb1 dataset. The dataset was created by s3prl and last updated on Hugging Face in July 2022. It focuses on utterances below a certain duration threshold.

Audio, Speaker Verification, Audio Classification, Speech Processing, Region: us, Celebrity Voices, +1

Emirati Dialect Audio Transcriptions from TV Shows and Podcasts

Segmented audio files and their transcriptions sourced from Emirati TV shows, podcasts, and YouTube channels. It is designed as a benchmark for automatic speech recognition models for the Emirati dialect, covering categories like traditions, cars, health, games, sports, and police. The dataset was created by eabayed and last updated in May 2022.

AUDIOFOLDER, Modality: audio, Size Categories: n<1K, Library: mlcroissant, Library: datasets, Region: us, License: AFL 3.0, +1

Vietnamese Text-To-Speech Audio From Literary Works

35.9 hours of Vietnamese audio generated for text-to-speech applications. The text source is a collection of public domain novels and short stories by author Vu Trong Phung. The audio was synthesized using the Google Text-to-Speech offline engine on Android.

AUDIOFOLDER, Modality: audio, Size Categories: n<1K, Modality: text, Library: mlcroissant, Library: datasets, License: CC BY-NC 4.0, Region: us, +1

English Earnings Calls Corpus for ASR Benchmarking

A 119-hour corpus of English-language earnings calls collected from global companies. It serves as a benchmark for automatic speech recognition models on real-world accented speech.

Modality: audio, Size Categories: n<1K, Modality: text, Library: mlcroissant, License: CC BY-SA 4.0, Library: datasets, Region: us, +1

DDT 70: Noise Classification Perimeters for National and Departmental Roads in Haute-Saône

Geospatial perimeters impacted by the sound classification of national and departmental roads in the French department of Haute-Saône. The dataset was created by the Bureau de Recherches Géologiques et Minières (BRGM) following the decree DDT 70 of 04 May 2022. It was last updated on 13 May 2022.

Audio, Geospatial, 🇫🇷 France, Road Traffic, Environmental Regulation, Noise Pollution, +1

Mini VoxCeleb1 Speaker Recognition Dataset

A 2022 subset of the VoxCeleb1 dataset curated by s3prl for speaker recognition tasks. It contains a reduced number of audio clips from celebrity interviews sourced from YouTube videos. The dataset is hosted on Hugging Face.

Audio, AUDIOFOLDER, Voice Biometrics, Modality: audio, Speaker Verification, Audio Classification, Size Categories: n<1K, Speaker Identification, Library: mlcroissant, Library: datasets, Region: us, +1

Russian Radio Broadcasts for Speech Recognition

A 2022 collection of Russian radio broadcast audio data uploaded by user mh53 to Hugging Face. The dataset is intended for automatic speech recognition tasks, as suggested by its title and platform tags.

Audio, Russian Language, Region: us, License: cc, Speech Recognition, Radio Broadcast, +1

Vietnamese Text-To-Speech Audio Dataset

A 2022 dataset containing Vietnamese text and corresponding audio samples for speech synthesis tasks. Created by author 'duongmle' and hosted on Hugging Face, it is categorized as containing fewer than 1,000 samples based on platform size tags.

Text, Audio, WEBDATASET, Text To Speech, Modality: audio, Library: webdataset, Speech Synthesis, Audio Samples, Size Categories: n<1K, Modality: text, Library: mlcroissant, Library: datasets, Region: us, Vietnamese Language, +1

DDT 70: Railway Sound Classification for Haute-Saône Department

A geospatial dataset containing the sound classification of railway lines in the French department of Haute-Saône. The classification was established by Prefectural Order No 70-2019-07-03-002 on July 3, 2019. The data is served via a Web Map Service (WMS) and was last updated on May 13, 2022.

Audio, Geospatial, 🇫🇷 France, Transportation, Environmental Classification, Railway Noise, +1

Librispeech Audio Metadata

Librispeech Metadata provides descriptive information for the LibriSpeech audio corpus, a widely used benchmark in speech recognition. The metadata was uploaded by s3prl to the Hugging Face platform in June 2022. It serves as a companion to the primary audio dataset.

Audio, Machine Learning, Region: us, Audio Processing, Speech Recognition, +1

Malay Speech Synthesis Dataset with 241 Hours of Audio

A collection of approximately 241 hours of high-quality Malay speech audio synthesized by the ms-MY-YasminNeural voice. The audio is split into two subsets: 99.4 hours from Malay Wikipedia and News texts, and 142 hours from Malaysian Parliament transcripts. All audio has a 24000 Hz sample rate and uses sentences between 2 and 20 words in length.

Region: us, +1

Fongbe Speech Recognition Audio and Text Dataset

A collection of audio waveforms and corresponding transcriptions for Fongbe speech recognition. The audio data is sampled at 16,000 Hz. The dataset was created by godwinh and was last updated in May 2022.

Region: us, License: Apache 2.0, +1

Librispeech Audio Transcription Metadata

Librispeech is a large-scale corpus of read English speech derived from audiobooks. The metadata for this dataset was uploaded by user leo19941227 to the Hugging Face platform in June 2022.

Text, Audio, Region: us, Speech Recognition, Spoken Language, +1

Synthesized Voices from Skyrim Voice Datasets

Synthesized voices derived from the Skyrim voice datasets. The dataset was created by author Etephyr and last updated in June 2022. The number of audio files, the features, and the data size are unknown.

Region: us, License: MIT, +1
