DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Speech & Audio Datasets | DataSalon


🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,018 datasets


SMAPVEX19-22: Airborne Lidar Measurements of Forested Areas in Massachusetts

SMAPVEX19-22 Massachusetts Airborne Lidar V001 contains lidar measurements collected by the NSIDC_CPRD organization. The data was gathered in April and August 2022 near Petersham, Massachusetts, as part of a campaign to validate satellite-derived soil moisture estimates. The two acquisition periods were selected to capture differences in forest conditions during "leaf-off" and "leaf-on" seasons.

Tags: Geospatial, Point Cloud, Forest ecology, Satellite Validation, Soil Moisture, +1 more


SMAPVEX19-22 Lidar Elevation Model for Petersham, MA in 2022

April and August 2022 ground surface elevations derived from lidar measurements collected near Petersham, Massachusetts. These data were gathered during the SMAPVEX19-22 campaign to validate satellite-derived soil moisture estimates in forested areas. The two acquisition periods characterize differences during 'leaf-off' and 'leaf-on' conditions.

Tags: Geospatial, Point Cloud, Soil Moisture Validation, Digital Elevation Model, Forested Land Cover, +1 more


Clean Speech Corpus for Text-to-Speech Synthesis

Cv Tts Clean is a speech dataset for text-to-speech applications, created by neongeckocom and uploaded to Hugging Face in September 2022. The dataset name suggests it contains clean audio recordings, likely paired with corresponding text transcripts. It is released under the BSD 3-Clause license and carries the US region tag.

Tags: Audio, Text To Speech, Speech Synthesis, Speech Corpus, License: bsd-3-clause, Region: us, Clean Audio, +1 more


Icelandic University Lectures With Audio And Text

Kennslurómur is a collection of audio recordings and corresponding text from instructional lectures recorded in courses at the University of Reykjavík and the University of Iceland. The dataset is intended for training speech recognition models, with recordings provided by lecturers, processed by a speech recognizer, and subsequently proofread by students and a professional proofreader.

Tags: Region: us, +1 more


2021 Punctuation Restoration

2021 collection of Polish language text samples categorized for punctuation restoration tasks within Automatic Speech Recognition (ASR) workflows. The dataset provides unpunctuated transcriptions paired with their punctuated versions to facilitate the training of sequence labeling models.

Tags: Language Creators: crowdsourced, Size Categories: n<1K, arXiv: 2008.00702, Annotations Creators: crowdsourced, Region: us, Task Categories: automatic speech recognition, Language: pl, Multilinguality: monolingual, arXiv: 2004.00248, +1 more
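
As a rough illustration of how an unpunctuated/punctuated pair can be turned into sequence labeling data, the sketch below derives one label per word from the punctuated text. The label names, the function `to_sequence_labels`, and the Polish sample sentence are assumptions made for this example; they are not taken from the dataset or its documentation.

```python
import re

# Hypothetical label set: the punctuation mark that follows a word, or "O" for none.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION", "!": "EXCLAMATION"}


def to_sequence_labels(punctuated: str) -> list[tuple[str, str]]:
    """Turn punctuated text into (lowercased word, label) pairs."""
    pairs = []
    for token in punctuated.split():
        # Split each whitespace-separated token into the word and its trailing punctuation.
        match = re.match(r"^(.*?)([,.?!]*)$", token)
        word, trailing = match.group(1), match.group(2)
        if not word:
            continue  # skip tokens that consist of punctuation only
        # Use the last trailing mark as the label; "O" when there is none.
        label = PUNCT_LABELS[trailing[-1]] if trailing else "O"
        pairs.append((word.lower(), label))
    return pairs


# Made-up sample sentence, not drawn from the dataset.
pairs = to_sequence_labels("Dzień dobry, jak się masz?")
for word, label in pairs:
    print(word, label)
```

The words alone play the role of the unpunctuated ASR transcript, and the labels are what a sequence labeling model would be trained to predict.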


Western Classical Guitar Music Tokenized from MIDI

This dataset comprises solo guitar pieces from the Mutopia Project, encoded as text tokens from MIDI files. It primarily features music by Western classical composers such as Sor, Aguado, Carcassi, and Giuliani. The dataset is intended for language modeling and text generation tasks.

Tags: Text, Size Categories: 1K<n<10K, Task Categories: text generation, Multilinguality: other music, arXiv: 2008.06048, Modality: text, Library: mlcroissant, Library: datasets, Task IDs: language modeling, Region: us, License: cc, +1 more


Text Data from US Region with Multiple Library Support

A text dataset, as indicated by its text modality tag. It carries the US region tag and is tagged as compatible with several data processing libraries, including polars, dask, and datasets. The dataset was last updated on August 29, 2022.

Tags: JSON, Library: polars, Library: dask, Modality: text, Size Categories: 100K<n<1M, Library: mlcroissant, Library: datasets, Region: us, +1 more


ASRS Aviation Reports

47,723 aviation incident reports sourced from NASA's Aviation Safety Reporting System (ASRS) database. Each entry pairs a detailed narrative account of a safety event with a corresponding summary suitable for text generation tasks.

Tags: JSON, Source Datasets: original, Size Categories: 10K<n<100K, Library: polars, Language: en, Task Categories: summarization, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Region: us, Language Creators: other, Multilinguality: monolingual, License: apache-2.0, Annotations Creators: expert generated, +1 more


Vietnamese Inverse Text Normalization Data for ASR

This dataset supports Vietnamese Inverse Text Normalization (ITN), a task that transforms spoken-style text to written form, particularly for improving Automatic Speech Recognition (ASR) output readability. It was created by VietAI and last updated in July 2022.

Tags: Parquet, Library: polars, Library: dask, Modality: text, Size Categories: 100K<n<1M, Library: mlcroissant, Library: datasets, Region: us, +1 more


MTG-Jamendo: Music Auto-Tagging Dataset

55,000 full audio tracks categorized by 195 tags across genre, instrument, and mood/theme classes. The data is sourced from Jamendo under Creative Commons licenses and includes tags provided by original content creators.

Tags: Source Datasets: original, Size Categories: 10K<n<100K, Region: us, License: apache-2.0, +1 more


Multilingual Speech Corpus with 12,800 Audio Samples

This corpus features 12,800 balanced audio samples in WAV format, with related transcriptions, from 18 speakers. It is assembled from multiple sources including VCTK, LJSpeech, m-ailabs, and SIWIS, covering languages such as English, French, German, Luxembourgish, and Portuguese.

Tags: Language: en, Language: lb, License: cc-by-nc-sa-4.0, Region: us, Language: fr, Language: pt, Language: de, +1 more


MASC: Massive Arabic Speech Corpus

1,000 hours of Arabic speech audio sampled at 16 kHz, collected from over 700 YouTube channels. The data spans multiple regions, genres, and dialects to support the development of speech recognition technologies.

Tags: Parquet, Library: polars, Language: ar, Language Creators: crowdsourced, Library: dask, Modality: timeseries, Size Categories: n<1K, Library: mlcroissant, Library: datasets, License: cc-by-nc-4.0, Annotations Creators: crowdsourced, Region: us, +1 more


Malay Text-to-Speech Voice Model for Yasmin

Azure TTS Yasmin is a text-to-speech voice model for the Malay language, created by mesolitica and uploaded to Hugging Face in August 2022. It carries the US region tag. The available metadata does not specify dataset size, audio samples, or training methodology.

Tags: Audio, Text To Speech, Speech Synthesis, Voice Cloning, Malay Language, Region: us, +1 more


Vietnamese Speech Audio With Text Transcripts

This dataset features 9.5 hours of Vietnamese speech audio paired with text transcripts, totaling 1.28 GB. The audio was crawled from YouTube audiobooks, and the text was labeled by VinBrain JSC.

Tags: Region: us, License: mit, +1 more


Korean Single Speaker Speech Dataset for Text-to-Speech

The KSS Dataset is a Korean text-to-speech dataset consisting of audio files recorded by a professional voice actress, with aligned text extracted from books. It is described as the first publicly available speech dataset for Korean and was released by the copyright holder.

Tags: Parquet, Source Datasets: original, Size Categories: 10K<n<100K, Language Creators: expert generated, Task Categories: text to speech, Library: polars, Library: dask, Modality: audio, License: cc-by-nc-sa-4.0, Modality: text, Library: mlcroissant, Library: datasets, Language: ko, Region: us, Multilinguality: monolingual, Annotations Creators: expert generated, +1 more


Malay Wikipedia Text-to-Speech Audio Dataset

Azure TTS Osman Wikipedia is a text-to-speech dataset created by mesolitica, likely containing synthesized audio for Malay-language Wikipedia articles. The dataset was last updated on July 31, 2022. It is hosted on Hugging Face and carries text modality tags.

Tags: Text, Audio, JSON, Text To Speech, Library: polars, Speech Synthesis, Modality: text, Size Categories: 100K<n<1M, Library: mlcroissant, Library: datasets, Library: pandas, Malay Language, Wikipedia, Region: us, +1 more


Malay Wikipedia Audio Synthesis with Azure TTS

Azure TTS Yasmin Wikipedia contains speech audio generated from Wikipedia text using Microsoft Azure's text-to-speech technology. The dataset was created by the user 'mesolitica' and uploaded to Hugging Face in July 2022. It is categorized as containing over 100,000 rows of data.

Tags: Text, Audio, JSON, Text To Speech, Library: polars, Speech Synthesis, Modality: text, Size Categories: 100K<n<1M, Library: mlcroissant, Library: datasets, Library: pandas, Malay Language, Wikipedia, Region: us, +1 more


Malay Text-to-Speech Voice Model for Osman

Azure TTS Osman is a text-to-speech model for the Malay language, created by mesolitica and uploaded to Hugging Face in July 2022. It carries the US region tag.

Tags: Audio, Text To Speech, Speech Synthesis, Voice Cloning, Malay Language, Region: us, +1 more


TTS Recipes: Training Configurations for Text-to-Speech Models

A collection of configuration files and training scripts for developing Text-to-Speech models across various public speech datasets. It includes specific parameters for audio preprocessing, model architectures, and training schedules tailored to diverse audio-text corpora.

Tags: Text To Speech, Coqui AI, Recipe, TTS Recipes, Speech, Deep Learning, +1 more


GTZAN Audio Collection with 10 Music Genres

This collection contains 1,000 audio tracks, each 30 seconds long. It covers 10 music genres, with 100 tracks per genre, and provides both raw WAV files and 8,000 derived Mel spectrograms. The audio files are 22,050 Hz mono 16-bit WAVs.

Tags: Region: us, License: apache-2.0, +1 more
