DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,013 datasets

Naija Stopwords: Multilingual List for Four Nigerian Languages

Naija-Stopwords is a list of collected stopwords from the four most widely spoken languages in Nigeria — Hausa, Igbo, Nigerian-Pidgin, and Yorùbá. It is part of the Naija-Senti project and was authored by HausaNLP. The dataset was last updated on June 18, 2023.

Text, Stopwords, Multilingual NLP, Nigerian Languages, Text Processing, +1

ATCOSIM: Air Traffic Control Simulation Speech Corpus

10 hours of speech recordings and transcriptions from the ATCOSIM project for Air Traffic Management. The data captures interactions between controllers and pilots during real-time simulations to support automatic speech recognition research.

Parquet, Size Categories: 1K<n<10K, Library: polars, Library: dask, Modality: audio, Language: en, Modality: text, ATM, Library: mlcroissant, Library: datasets, Air Traffic Management, Region: us, Natural Language Processing, DOI: 10.57967/hf/1378, Atcosim, Speech Recognition, Automatic Speech Recognition, +1

Sovits4.0 768Vec Layer12

6 pre-trained base models for SoVITS 4.0 voice conversion, featuring 768-dimensional vectors and layer 12 configurations. These models were trained on the m4singer and vctk datasets, reaching up to 320,000 training steps with loss values as low as 14.1.

Region: us, +1

Hebrew Speech Audio Dataset For Automatic Speech Recognition

A dataset for Automatic Speech Recognition (ASR) containing Hebrew speech audio files. The dataset was created by author 'imvladikon' and was last updated in May 2023.

Parquet, Size Categories: 10K<n<100K, Library: polars, Library: dask, Modality: audio, Modality: text, Library: mlcroissant, Library: datasets, Language: he, Region: us, Task Categories: Automatic Speech Recognition, +1

Audio Event Recordings For DCASE 2022 Task 3

Audio files for DCASE 2022 Task 3, sourced from the AudioSet ontology. The included labels are limited to a subset of sound events, such as female speech, male speech, clapping, and telephone sounds.

Language Creators: unknown, Source Datasets: unknown, Annotations Creators: unknown, Task Categories: Audio Classification, License: CC BY-SA 4.0, Region: us, Audio Slot Filling, +1

Pittsburgh Police Arrest Records with Geographic Detail

2023 data from the City of Pittsburgh Police documents arrests for offenses including felonies, parole violations, and failures to appear for trial. Information is reported at the block or intersection level, except for sex crimes which are aggregated to the police zone level. The dataset excludes incidents handled solely by other police departments operating within the city.

Police, Arrest, Public Safety, ETL, Felony, Parole Violation, Failure To Appear For Trial, Custody, Offenses, +1

Khmer Speech Corpus With 10.4 Hours of Audio

10.4 hours of Khmer speech audio with a mean duration of 2.5 seconds per sample, compiled by author seanghay and last updated in May 2023. It contains audio clips ranging from 0.45 to 19.39 seconds, sampled at 16 kHz. The dataset is hosted on Hugging Face and is associated with text-to-speech and automatic speech recognition tasks.

Text, Audio, Parquet, Size Categories: 10K<n<100K, Task Categories: Text To Speech, Library: polars, Audio Dataset, Library: dask, Khmer Language, Speech Synthesis, Modality: text, Library: mlcroissant, Speech Corpus, Library: datasets, License: CC BY 4.0, Region: us, Natural Language Processing, Task Categories: Automatic Speech Recognition, Speech Recognition, +1

Dog Licenses in Allegheny County Excluding Pittsburgh

Allegheny County dog license records include license dates, breeds, names, and zip codes. This dataset does not contain data for dogs within the City of Pittsburgh. The row count, column count, and specific temporal coverage are not provided in the available metadata.

Dog Licenses, License, Breed, ETL, Dogs, +1

Music Berkeley Emotions: Audio Data for Affective Computing

A dataset for music emotion recognition and affective computing, sourced from the Hugging Face platform. It was created by author akhmedsakip and last updated in May 2023.

Parquet, Size Categories: 1K<n<10K, Library: polars, Library: mlcroissant, Library: datasets, Library: pandas, Region: us, +1

Building Permits Issued by Pittsburgh Department of Permits Licenses and Inspections

A summary of building permits issued by the City of Pittsburgh's Department of Permits Licenses and Inspections (PLI). The dataset was last updated in May 2023. The specific number of records and features is unknown.

Building, Department Of Permits Licenses And Inspections, Pittsburgh, PLI, Permit, +1

Music Examples with English Aspect Lists and Captions

The MusicCaps dataset contains 5,521 music examples. Each example is labeled with an English aspect list and a free-text caption written by musicians.

CSV, Size Categories: 1K<n<10K, Task Categories: Text To Speech, Library: polars, Language: en, Modality: text, Modality: tabular, Library: mlcroissant, License: CC BY-SA 4.0, Library: datasets, Library: pandas, Region: us, +1

Telugu ASR Corpus: Speech Recognition Audio Data

Telugu_ASR_corpus is a dataset for automatic speech recognition in the Telugu language, authored by eswardivi. The dataset was last updated on Hugging Face on April 10, 2023. Specific details on size, format, and collection methodology are not provided in the available metadata.

Audio, Telugu, Natural Language Processing, Audio Corpus, Speech Recognition, +1

Indoor Scene Classification

Image data categorized into over 34 indoor scene classes including specialized environments like 'studiomusic', 'hospitalroom', and 'inside_bus'. It provides labeled examples for computer vision tasks focused on identifying specific architectural and functional interior spaces.

Size Categories: 10K<n<100K, Modality: text, Pest Control, Library: mlcroissant, Modality: image, Library: datasets, Benchmark, Task Categories: Image Classification, Region: us, Roboflow2huggingface, Roboflow, Retail, +1

Multilingual Speech and Text Alignments from Bloom Library

Bloom-speech is a dataset of text-aligned speech audio sourced from bloomlibrary.org, containing over 50 languages including many low-resource ones. It is intended for training and testing speech-to-text or text-to-speech models. The dataset was created by sil-ai and was last updated in February 2023.

Source Datasets: original, Language: bi, Language Creators: expert-generated, Task Categories: Text To Speech, Language: boz, Language: ceb, Language: bam, Language: ajz, Language: chp, Language: bjn, Language: clo, Language: cak, Multilinguality: multilingual, Language: bm, Task Categories: Automatic Speech Recognition, Language: bze, Language: bzi, Language: bis, Annotations Creators: expert-generated, Language: chd, +1

Polish Parliamentary Speeches Audio Collection

97 hours of parliamentary speeches from Poland. The audio is stored in .wav format and was published on the ClarinPL website.

Source Datasets: original, Size Categories: 1K<n<10K, License: other, Task Categories: other, Region: us, Task Categories: Automatic Speech Recognition, Language: pl, Multilinguality: monolingual, Annotations Creators: expert-generated, +1

Pittsburgh 311 Service Request Records

311 Data contains service requests for the City of Pittsburgh, collected by the 311 Response Center. Requests originate from phone calls, tweets, emails, a city website form, and a mobile application. The dataset was last updated on January 24, 2023.

Permits, 3-1-1, Service Requests, Paving, Potholes, ETL, +1

Brazilian Portuguese Speech Recognition Corpus with 290 Hours

CORAA v1.1 contains 290.77 hours of Brazilian Portuguese audio with transcriptions, segmented into over 400,000 audio files. The dataset is compiled from five distinct speech projects, including academic recordings and TEDx talks, and is validated for automatic speech recognition research.

License: unknown, Region: us, arXiv: 2110.15731, +1

Audio Data Pytorch

Multiple audio datasets and signal transforms categorized for the PyTorch deep learning framework. The resource provides standardized data structures for audio files and preprocessing functions to support acoustic model development.

PyTorch, Artificial Intelligence, Audio Generation, Deep Learning, +1

Environmental Sound Classification Dataset With 50 Classes

ESC-50 is a labeled collection of 2,000 environmental audio recordings. It contains 50 distinct sound classes, each with 40 examples, created by K. J. Piczak. The dataset was published for the 23rd ACM Multimedia Conference in 2015.

Audio, Parquet, Size Categories: 1K<n<10K, Machine Learning, Library: polars, Library: dask, Environmental Sound, Audio Classification, Modality: text, Library: mlcroissant, Library: datasets, Region: us, +1

Medical Asr En: Medical Speech Audio for Automatic Speech Recognition

Medical Asr En is a dataset for automatic speech recognition in a medical context, published on the Hugging Face platform by author jarvisx17. The dataset was last updated on January 30, 2023. Its specific content, size, and structure require verification after download.

Audio, Medical Speech, Healthcare, Audio Processing, Automatic Speech Recognition, Healthcare AI, +1

