DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,018 datasets

Medical Asr En: Medical Speech Audio for Automatic Speech Recognition

Medical Asr En is a dataset for automatic speech recognition in a medical context, published on the Hugging Face platform by author jarvisx17. The dataset was last updated on January 30, 2023. Its specific content, size, and structure require verification after download.

Audio, Medical Speech, Healthcare, Audio Processing, Automatic Speech Recognition, Healthcare AI, +1 more

Music Dance Video Synthesis

A PyTorch implementation of self-supervised dance video synthesis across music and dance categories. The repository provides the official code for the ACM MM 2020 oral paper on generative dance video synthesis.

PyTorch, Dance, Music Video, Video Generation, Perceptual Losses, Pretrain Model, Paper, Deep Learning, Skeletons, Demo Video, Multimedia, +1 more

LibriSpeech 1000-Hour English Speech Corpus

LibriSpeech contains approximately 1,000 hours of read English speech sampled at 16 kHz. The corpus was prepared by Vassil Panayotov with assistance from Daniel Povey and is derived from audiobooks in the LibriVox project. The dataset was uploaded to Hugging Face by nguyenvulebinh in December 2022.

Audio, Speaker Identification, Audiobooks, English Audio, Natural Language Processing, Speech Recognition, +1 more

Open Slr108 Turkish 10 Hours

10 hours of Turkish media speech audio clips designed for evaluating Automated Speech Recognition (ASR) systems. This dataset is part of the MediaSpeech collection which also covers French, Arabic, and Spanish languages.

arXiv: 2103.16193, License: CC BY 4.0, Region: us, Robust Speech Event, +1 more

Datasets Emotion: Music Emotion Recognition Metadata

A repository indexing multiple datasets for Music Emotion Recognition (MER). The collection organizes metadata for various audio-based resources to facilitate research in affective musicology. It provides a centralized point of access for datasets involving musical audio and emotional labels.

Emotions, Music, +1 more

Carnegie Library of Pittsburgh Public Wifi Usage Data

2023 data from the Carnegie Library of Pittsburgh details public wifi usage across its library locations. The dataset is provided by the Allegheny County / City of Pittsburgh / Western PA Regional Data Center. Specific row and column counts are unknown.

Wifi, Internet, Libraries, Library, +1 more

SMAPVEX19-22: Plant Area Index and RGB Images from Massachusetts Forest

SMAPVEX19-22 field campaign data includes plant area index (PAI) values and the RGB images used to derive them. The data were collected between April 2019 and December 2022 near Petersham, Massachusetts. The NSIDC_CPRD organization produced this dataset to support validation of satellite-derived soil moisture estimates in forested areas.

Image, Geospatial, Computer Vision, Soil Moisture, Plant area index, Field Campaign, Forested Land Cover, +1 more

FMA: Free Music Archive

106,574 tracks from 16,341 artists across 161 genre categories. The collection includes 917 GiB of audio data, pre-computed features like MFCCs, and metadata tables linking tracks to albums and artists.

Music Information Retrieval, Open Science, Music Analysis, Open Data, Reproducible Research, Deep Learning, +1 more

Tamil Speech Recognition Corpus With 1000 Hours

Approximately 1000 hours of Tamil audio paired with transcripts. The transcripts have been de-duplicated using exact match deduplication. The dataset was created by parambharat and last updated in December 2022.

Text, Audio, Language Creators: found, Monolingual Corpus, Source Datasets: extended|openslr, Size Categories: 100K<n<1M, License: CC BY 4.0, Annotations Creators: found, Region: us, Natural Language Processing, Task Categories: automatic-speech-recognition, Tamil Language, Multilinguality: monolingual, Speech Recognition, Audio Transcripts, +1 more

Pittsburgh Political Ward Boundaries Map

A map resource for City of Pittsburgh political wards, maintained by Allegheny County and the Western PA Regional Data Center. It provides geographic boundaries for local administrative and electoral divisions. The dataset was last updated in January 2023.

Pittsburgh, Wards, Map, +1 more

Persian Emotional Speech from Radio Plays

The Sharif Emotional Speech Dataset contains 3000 semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data. It covers five basic emotions and a neutral state, labeled by 12 annotators from speech samples of 87 native-Persian speakers extracted from online radio plays.

Size Categories: 1K<n<10K, Language Creators: expert-generated, Language: fa, Region: us, Task Categories: automatic-speech-recognition, Source Datasets: radio-plays, Multilinguality: monolingual, License: Apache 2.0, Annotations Creators: expert-generated, +1 more

Romanian Speech Synthesis 0 8 1

4,000 Romanian sentences recorded across 8 sessions by a single speaker in a hemianechoic chamber. Audio was captured with a Sennheiser MKH 800 small-diaphragm condenser microphone at 96 kHz/24-bit and then downsampled to 48 kHz.

License: unknown, Region: us, Task Categories: automatic-speech-recognition, Language: ro, +1 more

NPSC: Norwegian Parliament Speech Corpus (Test Set)

Audio recordings and orthographic transcriptions from the Norwegian Parliament categorized into Norwegian Bokmål and Norwegian Nynorsk written standards. The corpus serves as a benchmark for Norwegian Automatic Speech Recognition (ASR) systems using official parliamentary proceedings.

AUDIOFOLDER, Source Datasets: original, Modality: audio, Language Creators: found, License: CC0 1.0, Language: no, Size Categories: n<1K, Annotations Creators: no-annotation, Library: mlcroissant, Task Categories: audio-classification, Library: datasets, Language: nn, Region: us, Speech Modeling, Task Categories: automatic-speech-recognition, Multilinguality: monolingual, Language: nb, +1 more

Library of Congress Music Catalog Records

Library of Congress provides music catalog data in XML format, last updated in December 2022. The dataset contains bibliographic records for musical works. Specific row counts, column features, and size details are unavailable.

Music, +1 more

Sound Classification On Raspberry Pi With Tensorflow

Audio signal recordings and MLP neural network configurations for sound classification on edge devices. It provides training components for exporting models to Raspberry Pi 2 or superior hardware using USB microphone inputs.

Machine Learning, Multilayer Perceptron Network, TensorFlow, Librosa, Raspberry, Audio Analysis, TensorFlow Models, Sound Classification, Raspberry Pi, Audio Signals, +1 more

Telugu ASR: Speech Recognition Data for the Telugu Language

A speech recognition dataset for the Telugu language, published on the Hugging Face platform. The dataset was uploaded by author 'bnriiitb' and was last updated on November 22, 2022. The specific content, size, and structure of the audio files are not detailed in the available metadata.

Audio, Telugu Language, Audio Processing, Speech Recognition, +1 more

M Ailabs Speech Dataset Fr

1,000 hours of audio recordings and transcriptions derived from LibriVox and Project Gutenberg for speech recognition and synthesis. The collection features French audio clips between 1 and 20 seconds in length paired with literary texts published from 1884 to 1964.

Region: us, Language: fr, Task Categories: automatic-speech-recognition, License: cc, +1 more

Music Genres Audio Classification Dataset

A dataset for audio classification tasks related to music genres. It was created by lewtun and last updated on November 2, 2022. The specific number of rows, columns, and audio features is unknown.

Parquet, Size Categories: 10K<n<100K, Library: polars, Library: dask, Modality: text, Library: mlcroissant, Library: datasets, Region: us, +1 more

Earnings22 Baseline 5 Gram: 119 Hours of Accented Earnings Calls

A 119-hour corpus of English-language earnings calls collected from global companies. The dataset was created by anton-l and uploaded to Hugging Face in October 2022. Its primary purpose is to serve as a benchmark for automatic speech recognition models on real-world accented speech.

Text, Audio, Benchmark, Earnings Calls, Natural Language Processing, Accented Speech, Speech Recognition, +1 more

Italian Female Speech Audio for Text-to-Speech Training

8 hours and 23 minutes of Italian speech audio from a single female speaker, recorded at a 16,000 Hz sample rate. It is derived from the female audio segments of the M-AILABS Speech Dataset and adapted as an Italian counterpart to LJSpeech for training text-to-speech models.

Language: it, Region: us, Multilinguality: monolingual, +1 more
