DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,013 datasets

Vocalforge Toolkit for Voice Dataset Creation

Vocalforge is a Python toolkit designed for generating synthetic voice datasets. The project, authored by rioharper on GitHub, was last updated in December 2023. It is released under the permissive MIT license.

Audio · Text to Speech · Speech Synthesis · Voice Cloning · Speech to Text · Python · Artificial Intelligence · Dataset Generation · Toolkit · Audio Processing · Audio Generation · Speech Recognition

Whisperspeech: Semantic and Acoustic Tokens for TTS Training

Semantic and acoustic tokens for the LibriLight and LibriTTS English speech corpora, formatted for training SPEAR-TTS-like models. The dataset provides 24 kHz EnCodec acoustic tokens at 6 kbps, along with semantic tokens produced by a Whisper tiny VQ bottleneck trained on LibriLight subsets.

Text · Size categories: 1K<n<10K · Task categories: text-to-speech · Language: en · Modality: text · Libraries: mlcroissant, datasets · Region: us · License: mit
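To make the 6 kbps figure concrete, the sketch below converts a target bitrate into a number of residual codebooks and acoustic tokens per second. It assumes the usual 24 kHz EnCodec configuration (a 320-sample hop, giving 75 frames per second, and 1,024-entry codebooks, i.e. 10 bits per code). Those values are not stated on this card, so treat the numbers as illustrative.

```python
import math

# Assumed 24 kHz EnCodec settings (not stated on the dataset card):
SAMPLE_RATE_HZ = 24_000
HOP_LENGTH = 320       # input samples per encoder frame
CODEBOOK_SIZE = 1024   # entries in each residual-VQ codebook


def frame_rate() -> float:
    """Encoder frames per second of audio."""
    return SAMPLE_RATE_HZ / HOP_LENGTH


def codebooks_for_bitrate(target_bps: int) -> int:
    """Residual codebooks needed for a target bitrate (each adds log2(size) bits per frame)."""
    bits_per_codebook_per_second = frame_rate() * math.log2(CODEBOOK_SIZE)
    return int(target_bps // bits_per_codebook_per_second)


def tokens_per_second(target_bps: int) -> float:
    """Acoustic tokens per second of audio: one token per codebook per frame."""
    return frame_rate() * codebooks_for_bitrate(target_bps)


print(frame_rate(), codebooks_for_bitrate(6_000), tokens_per_second(6_000))
```

Under these assumptions, 6 kbps corresponds to 8 codebooks and 600 acoustic tokens per second of audio, which gives a sense of the sequence lengths an acoustic model trained on these tokens has to handle.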

NST Danish ASR Database: 16 kHz Speech for Automatic Recognition

An upload of the NST Danish ASR Database, reorganized for use on the Hugging Face platform. The dataset is intended for training automatic speech recognition models and contains Danish speech. The training and test splits are the original ones from the source database.

Audio · Benchmark · Danish Language · Speech Recognition

Librispeech Long: Extended English Speech Audio for ASR

Librispeech Long is a speech audio dataset derived from the LibriSpeech corpus, likely containing longer-form English audio segments. The dataset was created by distil-whisper and was last updated on Hugging Face in November 2023. Its specific size, format, and license details are not provided in the available metadata.

Audio · English · Long Form · Speech Recognition

Rail Noise Affected Sectors in Maine-et-Loire, France

Sectors affected by rail noise in the French department of Maine-et-Loire, determined by the Prefect under national noise control laws. The dataset is provided by the Bureau de Recherches Géologiques et Minières and was last updated on August 18, 2023. It likely contains geographic boundaries for areas where specific acoustic requirements apply for new construction.

Audio · Geospatial · 🇫🇷 France · Environmental Planning · Transport Infrastructure · Rail Noise

Freesound Datasets

A platform hosting multiple human-labeled audio collections across various sound categories, built from content in the Freesound repository. The labels are produced collaboratively, with users annotating and verifying openly licensed audio samples.

Freesound · Crowdsourcing

Music Audio Pseudo Captions: Instructions for Music and Audio Tasks

The LP-MusicCaps, Music Negation/Temporal Ordering, and WavCaps datasets, reorganized into instruction form by seungheondoh. The dataset was last updated on August 16, 2023, and likely contains pseudo-captions for music and audio content generated with ChatGPT.

Audio · Time Series · Multimodal · Pseudo Captions

Persian Speech Dataset for Audio Processing and Recognition

A Persian-language speech dataset published on the Hugging Face platform by SeyedAli and last updated on September 15, 2023. The provided metadata does not detail its content, size, or structure. Its primary modality is listed as audio, with accompanying text.

Audio · Parquet · Size categories: 1K<n<10K · Machine Learning · Libraries: polars, dask, mlcroissant, datasets · Modality: text · Persian Language · Region: us · Audio Processing · Speech Recognition · License: mit

WavCaps: ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset

WavCaps is a dataset for audio-language multimodal research, with audio clips sourced from FreeSound, BBC Sound Effects, SoundBible, and the AudioSet Strongly-labelled Subset. The dataset was created by cvssp and last updated on Hugging Face in July 2023. It uses ChatGPT to assist in generating weakly-labelled captions for the audio content.

Audio · Multimodal · Weakly Supervised · Sound Event Detection · Multimodal Learning

Jazznet: Jazz Piano Patterns for Music Research

Jazz piano patterns for music audio machine learning research. The data focuses on transcribing and analyzing genre-specific piano performance, supporting the development of models for genre-specific transcription and pattern recognition.

Machine Learning · Data Generation · Music Information Retrieval · Machine Learning Dataset · Music Dataset · Deep Learning · Audio Synthesis

LP-MusicCaps: Large Language Model Generated Music Captions

LP-MusicCaps-MTT is a dataset of pseudo music captions generated by a large language model for text-to-music and music-to-text tasks. The LP-MusicCaps data was constructed from three existing multi-label tag datasets, with captions produced for four generation tasks. It was created by seungheondoh and last updated on August 4, 2023.

Text · Audio · Multimodal · Parquet · Size categories: 10K<n<100K · Libraries: polars, mlcroissant, datasets, pandas · Music Captioning · Language: en · Music to Text · Modality: text · Text to Music · Region: us · Art · License: mit

LP-MusicCaps-MC: LLM-based Pseudo Music Captions from Multi-label Tags

LLM-generated pseudo music captions derived from three multi-label tag datasets for audio-language tasks. The dataset provides music-caption pairs across four distinct generation tasks to support training of text-to-music and music-to-text models.

Audio · Parquet · Size categories: 1K<n<10K · Libraries: polars, mlcroissant, datasets, pandas · Language: en · Music to Text · Modalities: text, tabular · Text to Music · Region: us · Art · License: mit

ATCO2-ASR-ATCOSIM: Air Traffic Control Speech for Automatic Speech Recognition

A combined dataset from the ATCO2-ASR and ATCOSIM collections, likely containing air traffic control speech audio. The dataset was created by jlvdoorn and last updated on July 7, 2023. It is split into 80% training and 20% validation partitions, and some files contain additional metadata.

Audio · Speech Recognition
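The card does not say how the 80/20 partition was produced. One common way to build such a split reproducibly is a seeded shuffle followed by a single cut, sketched below; the function name and clip names are hypothetical placeholders, not part of the dataset.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_20(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` with a fixed seed, then cut it 80/20."""
    shuffled = list(items)                 # never mutate the caller's sequence
    random.Random(seed).shuffle(shuffled)  # local RNG leaves global random state untouched
    cut = len(shuffled) * 4 // 5           # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


# Hypothetical clip names standing in for the real audio files.
clips = [f"clip_{i:04d}.wav" for i in range(10)]
train, validation = split_80_20(clips, seed=42)
print(len(train), len(validation))  # 8 2
```

Fixing the seed keeps the partition stable across runs. For speech data, splitting by speaker or recording session rather than by clip is often preferable to avoid leakage, but the listing gives no indication of which approach was used here.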

Kinyarwanda Speech Recordings from Studio Voice Actress

A collection of 3,992 audio clips of Kinyarwanda text-to-speech recordings made by a single voice actress in a studio setting. It was collected as part of the Mbaza project and includes a CSV file linking audio file names to their corresponding written text.

Format: audiofolder · Size categories: 1K<n<10K · Language creators: Digital Umuganda · Libraries: mlcroissant, datasets · License: cc-by-4.0 · Region: us
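A CSV that pairs audio file names with transcripts is typically loaded into a lookup table before training. The sketch below assumes columns named `file_name` and `text` (the card does not name them) and reads an inline sample instead of the real file; `load_manifest` is a hypothetical helper.

```python
import csv
import io
from typing import Dict

# Hypothetical sample; the real column names are not given on the card.
SAMPLE_CSV = """file_name,text
clip_0001.wav,first example transcript
clip_0002.wav,second example transcript
"""


def load_manifest(csv_text: str) -> Dict[str, str]:
    """Map each audio file name to its transcript, rejecting duplicates."""
    manifest: Dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["file_name"].strip()
        if name in manifest:
            raise ValueError(f"duplicate manifest entry: {name}")
        manifest[name] = row["text"].strip()
    return manifest


manifest = load_manifest(SAMPLE_CSV)
print(len(manifest), manifest["clip_0001.wav"])
```

For the real manifest, open the file with `open(path, newline="", encoding="utf-8")` and pass the handle to `csv.DictReader`. Rejecting duplicate file names surfaces manifest errors early instead of silently keeping the last row.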

MASC: 1,000 Hours of Multi-Dialect Arabic Speech from YouTube

1,000 hours of speech audio sampled at 16 kHz, crawled from over 700 YouTube channels. The MASC dataset is multi-regional, multi-genre, and multi-dialect, intended to advance research and development of Arabic speech technology. It was authored by 'pain' and last updated on the Hugging Face platform in June 2023.

Audio · Multiregional · YouTube · Multidialect · Speech Recognition
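As a back-of-the-envelope check, the stated duration and sample rate fix the total number of samples; converting that to disk size needs an encoding assumption. The sketch below assumes uncompressed 16-bit mono PCM, which the card does not state, so the actual download may be much smaller if the audio is compressed. `pcm_footprint` is a hypothetical helper.

```python
from typing import Tuple


def pcm_footprint(hours: int, sample_rate_hz: int,
                  bits_per_sample: int = 16, channels: int = 1) -> Tuple[int, int]:
    """Return (total samples, total bytes) for uncompressed PCM audio."""
    seconds = hours * 3600
    samples = seconds * sample_rate_hz * channels
    num_bytes = samples * bits_per_sample // 8
    return samples, num_bytes


# 1,000 hours at 16 kHz; 16-bit mono PCM is an assumption, not from the card.
samples, size_bytes = pcm_footprint(1_000, 16_000)
print(f"{samples:,} samples, {size_bytes / 1e9:.1f} GB, {size_bytes / 2**30:.1f} GiB")
```

Under that assumption the corpus works out to 57.6 billion samples, roughly 115.2 GB (about 107.3 GiB) of raw audio.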

Vivos: 15-Hour Vietnamese Speech Corpus

15 hours of Vietnamese speech recordings specifically curated for Automatic Speech Recognition (ASR) tasks. The corpus was developed by AILAB at VNUHCM - University of Science and includes audio data paired with corresponding transcriptions for linguistic research.

Source datasets: original · Size categories: 10K<n<100K · Language creators: expert-generated, crowdsourced · License: cc-by-nc-sa-4.0 · Region: us · Task categories: automatic-speech-recognition · Multilinguality: monolingual · Annotations creators: expert-generated · Language: vi

Naija Stopwords: Multilingual List for Four Nigerian Languages

Naija-Stopwords is a list of collected stopwords from the four most widely spoken languages in Nigeria — Hausa, Igbo, Nigerian-Pidgin, and Yorùbá. It is part of the Naija-Senti project and was authored by HausaNLP. The dataset was last updated on June 18, 2023.

Text · Stopwords · Multilingual NLP · Nigerian Languages · Text Processing
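Stopword lists like this one are usually applied as a set-membership filter over tokens. Text in languages such as Yorùbá, which uses tone marks and underdots, can encode the same letter either as a precomposed character or as a base letter plus combining marks, so the sketch below normalizes to Unicode NFC before comparing. The helper names are hypothetical and the stopword entries are placeholders, not taken from Naija-Stopwords.

```python
import unicodedata
from typing import Iterable, List, Set


def normalize(token: str) -> str:
    """NFC-normalize and casefold so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", token).casefold()


def build_stopword_set(words: Iterable[str]) -> Set[str]:
    """Normalize every non-empty entry of a stopword list."""
    return {normalize(w.strip()) for w in words if w.strip()}


def remove_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Keep tokens (in their original form) whose normalized form is not a stopword."""
    return [t for t in tokens if normalize(t) not in stopwords]


# Placeholder entries, NOT from Naija-Stopwords; "o\u0300" is "ò" spelled with a combining grave.
stopwords = build_stopword_set(["the", "and", "o\u0300"])
print(remove_stopwords(["The", "cat", "and", "\u00f2", "dog"], stopwords))
```

Using a set keeps each lookup constant-time, which matters when filtering large corpora token by token.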

ATCOSIM: Air Traffic Control Simulation Speech Corpus

10 hours of speech recordings and transcriptions from the ATCOSIM project for Air Traffic Management. The data captures interactions between controllers and pilots during real-time simulations to support automatic speech recognition research.

Parquet · Size categories: 1K<n<10K · Libraries: polars, dask, mlcroissant, datasets · Modalities: audio, text · Language: en · ATM · Air Traffic Management · Region: us · Natural Language Processing · DOI: 10.57967/hf/1378 · ATCOSIM · Speech Recognition · Automatic Speech Recognition

SoVITS 4.0 768vec Layer 12

Six pre-trained base models for SoVITS 4.0 voice conversion, featuring 768-dimensional feature vectors and layer-12 configurations. The models were trained on the M4Singer and VCTK datasets, reaching up to 320,000 training steps with loss values as low as 14.1.

Region: us

Hebrew Speech Audio Dataset For Automatic Speech Recognition

A dataset for automatic speech recognition (ASR) containing Hebrew speech audio files. The dataset was created by imvladikon and last updated in May 2023.

Parquet · Size categories: 10K<n<100K · Libraries: polars, dask, mlcroissant, datasets · Modalities: audio, text · Language: he · Region: us · Task categories: automatic-speech-recognition

Page 87 of 101