DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,571 datasets

Kalkalpen Chamber Music Festival Program

A document about the Kalkalpen Chamber Music Festival, described as "musical fireworks". The dataset is published on the eu_open_data platform by Cooperation OGD Österreich and Wikimedia Österreich under a CC-BY-4.0 license. The file format is PDF.

Text, Audio, Festival, Cultural Events, Performing Arts, Chamber Music

Emotional Responses to Environmental Sounds and Self-Reported Well-Being

130 participants rated 120 environmental sounds on valence and arousal and completed well-being questionnaires. The study included 40 hearing aid users and 90 individuals with self-reported normal hearing, the latter split into subgroups with no-to-minimal and some hearing difficulties. Dina Lelic authored this dataset, last updated on 2026-05-25.

Tabular, Audio, Psychoacoustics, Affective Science, Hearing Difficulties, Environmental Sounds, Well-being

Egyptian Arabic Lectures Dataset with 30 Hours of Transcribed Educational Audio

Egyptian Arabic Lectures is a dataset of transcribed audio clips from educational lectures, containing around 30 hours of speech. It is designed for training and evaluating Automatic Speech Recognition models for the Egyptian dialect, particularly in academic contexts involving technical terms. The dataset was created by ismaeeelxd and was last updated on 2026-06-21.

Text, Audio, Egyptian-Arabic, Educational Lectures, Multilingual Speech, Speech Recognition

PersianDatasets2: Structured Persian Data for Instruction Tuning of Large Language Models

PersianDatasets2 is a Persian-language dataset designed for training conversational artificial intelligence models. The dataset is structured for instruction tuning and supervised fine-tuning, with examples formatted as Instruction → Input → Output. It was created by author kasranaqhdpur and last updated on Hugging Face on 2026-07-11.

Text, Conversational AI, Persian Language, LLM Training, Instruction Tuning

Kartoffelphon-2.5M-de-ger: German Speech Dataset for TTS Models

The dataset contains approximately 2.5 million audio-text snippets, an estimated 7,000 hours of speech, primarily in German, built as foundation data for Kartoffel TTS models. It is composed mostly of CC / CC-BY based podcast audio. It was created by MultiLlasa and last updated on 2026-06-16.

Audio, Multimodal, Text To Speech, German Language, Speech Synthesis, Podcast, Large Scale, Podcast Audio

Dialectra Yoruba Speech Corpus: Transcribed Recordings from Native Speakers

Dialectra Yoruba Speech Corpus v1 is an open speech dataset collected from native Yoruba speakers across multiple regions and dialect communities. The dataset was created by Dialectra to support the development of speech technologies for Yoruba, including Automatic Speech Recognition and Text-to-Speech. The dataset page was last updated on 2026-06-25.

Audio, Dialect Research, Language Technology, Speech Corpus, Yoruba Language, Natural Language Processing, Automatic Speech Recognition

Tajik ASR Corpus V3: 1,071 Hours of Tajik Speech for ASR Training

This corpus contains 1,071 hours of Tajik speech data for automatic speech recognition. It combines machine-labeled audio from 41 YouTube channels with gold-standard transcriptions from the FLEURS benchmark. The dataset was created by Peacockery and was last updated on June 12, 2026.

Audio, Machine Learning, Natural Language Processing, Tajik Language, Audio Corpus, Speech Recognition

ArA-DF-2026: Arabic Speech Deepfake Detection Audio Dataset

ArA-DF-2026 is an Arabic speech deepfake detection dataset for binary audio classification, distinguishing between spoofed/synthetic and bona fide speech. The dataset includes labeled train and development splits, with unlabeled development-test and final-test splits scored externally via CodaBench. It was created by ArabicSpeech and last updated on June 16, 2026.

Audio, Arabic Speech, Audio Classification, Speech Deepfake Detection, Synthetic

FalAR-TTS: European Portuguese Speech Recordings from 2016-2017

FalAR-TTS is a subset of the FalAR dataset tailored for speech synthesis in European Portuguese. The subset is restricted to utterances recorded between 2016 and 2017 to minimize longitudinal acoustic variations. It was created by inesc-id and features additional filtering for highly reliable timestamped speech alignments.

Audio, Audio Dataset, Speech Synthesis, Speech Alignment, European Portuguese

Pitch Imagery Arrow Task: Experimental Data on Music Training and Mental Control

Experimental data from the Pitch Imagery Arrow Task investigating the effects of musical training, vividness, and mental control. The dataset includes R code for reproducing figures from the associated research article and three data files. The data and code were contributed by author Rebecca Gelding.

Tabular, Audio, Music Training, Mental Imagery, Cognitive Science, Experimental Data

HEAR-HSet: Hierarchical Evaluation Set for Zero-Shot TTS, 2,702 Audio Pairs

2,702 prompt-target audio pairs for evaluating zero-shot text-to-speech systems across three hierarchical dimensions: basic generalization, paralinguistic control, and hard robustness scenarios. The dataset, created by dinosaaaur, contains approximately 1 GB of audio in Chinese and English and was last updated on June 23, 2026. It is structured into three subsets, one per evaluation dimension.

Audio, Speech Synthesis, Evaluation Benchmark, Multilingual Speech, Benchmark, Zero Shot TTS

German Far-Right Activism Events at County Level, 2013–2024

A county-level panel dataset integrates three streams of German far-right activism—music events, protests, and violence—from 2013 to 2024. It harmonizes data from federal parliamentary inquiries and civil-society counseling centers to fixed 2024 administrative boundaries. The dataset provides a balanced county-year panel for 400 counties over 12 years, plus event-level and monthly files.

Tabular, Audio, Geospatial, 🇩🇪 Germany, Political Violence, Panel Data, Far Right Activism, County-Level

Neyshekar V4: 40,008 Persian Speech Clips for ASR and TTS

Neyshekar is an open Persian speech dataset containing 40,008 transcribed audio clips totaling approximately 63 hours. The data was collected from native Persian speakers through a community-driven crowdsourcing platform. This release, V4.1, was created by shekar-ai and last updated on June 15, 2026.

Audio, Audio Data, Persian Language, Speech Corpus, Crowdsourced, Speech Recognition

Human-Performed Piano Improvisation in F Sharp Dorian, 1.1 GB

Alexander Paul Burton's dataset documents a live, unedited rubato piano improvisation in F Sharp Dorian. The 1.1 GB release includes multiple file formats such as MP3, CSV, and XLSX, capturing a high-density performance across the C3 to C6 range. It is released under a CC-BY-4.0 license as open-access anti-algorithm training data.

Tabular, Audio, CSV, Excel, Audio Data, Neoclassical, Piano, Music Performance, Improvisation

Memo2496: 2,496 Instrumental Songs with Expert Valence-Arousal Annotations

Memo2496 is a music emotion recognition dataset containing 2,496 instrumental songs annotated by 30 music experts. The dataset provides valence-arousal labels and extracted acoustic features to support affective computing research. It was updated by Qilin Li in April 2026 and is hosted on figshare under a CC-BY-4.0 license.

Tabular, Audio, ZIP, CSV, Text, JSON, Affective Computing, Valence Arousal, Audio Features, Natural Language Processing, Music Emotion Recognition, Instrumental Music

Discover Piano: Pre-Tokenized Solo Piano MIDI Dataset for Symbolic Music AI

Discover Piano is a pre-tokenized solo piano MIDI dataset for symbolic music AI and music information retrieval (MIR). The dataset was created by asigalov61 and was last updated on June 23, 2026. It is hosted on the Hugging Face platform.

Audio, Music Information Retrieval, MIDI, Symbolic Music, Piano

Whiskered Auklet Ornament Symmetry Data for 721 Birds, 1992-2009

Ian L. Jones provides data and R code for a study on symmetry in facial feather ornaments of the Whiskered Auklet. The dataset covers 721 wild-caught marked individuals, with 162 of known sex and 94 of known age (1-16 years old), studied between 1992 and 2009. It was used to analyze effects of age, sex, body size, and condition on ornament asymmetries, and correlations with ocean climate and population measures.

Tabular, Ecological Data, Bird Symmetry, Fluctuating asymmetry, Auklet Biology, Ornamental Plumes

Permian Tuff and Palynology Ages from the Canning Basin, Western Australia

U–Pb dating of zircons from middle Permian tuffs in the Canning Basin reveals a conflict with established spore-pollen zonation. The dataset includes an age of 267.04 ± 0.14 Ma from the M. villosa Zone, which is 1.7 million years younger than tuffs from the D. granulata Zone. This research was published by Mory et al. in the Australian Journal of Earth Sciences in 2017.

Tabular, Canning Basin, Geochronology, Palynology, Stratigraphy, Large Scale, Permian

Bullying Among Massachusetts Students Survey Data from 2009

2009 Massachusetts Youth Health Survey data analyzed by the Massachusetts Department of Public Health and CDC shows significant associations between bullying involvement and family violence. The report details adjusted odds ratios for middle and high school students categorized as bullies, victims, or bully-victims. Analysis controlled for age group, sex, and race/ethnicity.

Tabular, Bullying, Survey Data, Healthcare, Risk Factors, Youth Health, Public Health

Khasi-English Multi-Task Speech and Text Dataset for Low-Resource NLP

A multi-task speech and text dataset for Khasi, an Austroasiatic language spoken by roughly 1.5 million people in Meghalaya, Northeast India. It was built to help bring Khasi into modern speech recognition, text-to-speech, and machine translation systems. The dataset was authored by RonitMehta260704 and last updated on Hugging Face in July 2026.

Text, Audio, Text To Speech, Machine Translation, Large Scale, Natural Language Processing, Khasi Language, Low Resource Language, Speech Recognition
