DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

Tttsophia: Text-to-Speech Audio Samples

Tttsophia is a text-to-speech audio dataset published on Kaggle. The dataset's specific content, size, and creation details are not provided in the available metadata. Further verification after download is required to confirm its exact composition and potential applications.

Tags: Audio, Text To Speech, Synthesis

YouTube and Video Data for Saint Kitts and Nevis

A Techsalerator dataset containing YouTube and video data for the Caribbean nation of Saint Kitts and Nevis. The dataset's specific content, volume, and collection methodology are not detailed in the available metadata. The original source and last update date are also unknown.

Tags: Tabular, Youtube, Saint Kitts And Nevis, Social Media, Video Data

Turkish Text-to-Speech Audio and Text Data

Turkish TTS Data is a collection for speech synthesis and automatic speech recognition tasks, created by Anilosan15. It contains audio and corresponding text data in the Turkish language. The dataset was last updated in March 2026.

Tags: Audio, Multimodal, OPTIMIZED-PARQUET, Parquet, Size Categories: 10K<n<100K, Text To Speech, Task Categories: Text To Speech, Library: polars, Audio Dataset, Library: dask, Modality: audio, Speech Synthesis, Modality: text, Turkish Language, Library: mlcroissant, Library: datasets, Language: tr, Region: us, Task Categories: Automatic Speech Recognition

ASR-Model-Offline: Data for Offline Automatic Speech Recognition Models

ASR-Model-Offline is a dataset published on Kaggle. The title suggests it contains data for training or evaluating offline automatic speech recognition models. The dataset's specific content, size, and origin require verification after download.

Tags: Audio, Machine Learning, Speech Processing, Automatic Speech Recognition

British Music Festival Impact Report by George McKay

George McKay authored a report titled 'From Glyndebourne to Glastonbury: The impact of British music festivals'. The report is based on a review of academic and grey literature and identifies eight areas of economic, social, and cultural impact. The dataset appears to be the textual content of this report or its associated data.

Tags: Text, Audio, History, Art History, Arts Research, Music Festivals, United Kingdom, Geography, Art, Cultural Impact, Visual Arts

Charles Coüasnon Thesis on the Church of the Holy Sepulchre

A thesis authored by Charles Coüasnon on the Church of the Holy Sepulchre in Jerusalem. The work was submitted to the Massachusetts Institute of Technology Department of Architecture in 1959. The content likely contains architectural and historical analysis.

Tags: Text, History, Jerusalem, Computer Science, Architecture, Law, Ancient History, Religious Studies, Scope Computer Science, Classics, Expression Computer Science, Political Science, Dignity, Emperor

Spoken Magpie Ja: Japanese Speech Synthesis Dataset for Instruction Tuning

Spoken Magpie Ja is a Japanese speech synthesis dataset created for instruction tuning of language models. It was generated using the CosyVoice2 TTS system on the llm-jp/magpie-sft-v1.0 text corpus to produce a commercially usable dataset. The dataset was last updated on 2026-01-12.

Tags: Text, Audio, Text To Speech, Speech Synthesis, Instruction Tuning, Japanese Language

Common Voice: Metadata and Versioning for Open Speech Data

Common Voice metadata and versioning details provided by the common-voice organization, last updated in March 2026. This repository tracks the evolution and release history of the global open-source speech corpus. It serves as the administrative layer for managing dataset releases across multiple languages.

Tags: Voice, Open Data, Open Datasets, Speech Recognition

Last.fm Music Tracks with Genre and Mood Tags

320,000 songs were scraped from the Last.fm API. The data includes genre tags, mood labels, and popularity information. The author, organization, and specific update date are not provided.

Tags: Tabular, Audio, Genre Tags, Mood Labels, Popularity, Music Tracks

Hindi Speech Recordings of Narendra Modi for Voice Synthesis

Hindi speech dataset of Narendra Modi for TTS and voice cloning. The dataset is hosted on Kaggle and is tagged for speech data, audio, and Hindi language processing. Its specific size, format, and creation details are not provided in the metadata.

Tags: Audio, Hindi, Speech Data, Voice Recording, Audio Processing

Prerequisite Link Validity Benchmark with Expert Ratings

A benchmark of 300 title pairs for validating prerequisite knowledge links, with each pair receiving two independent expert ratings. It accompanies a research paper on crowdsourcing prerequisite knowledge graphs at scale.

Tags: Social Sciences, Computer and Information Science

Role of Deviation and Complexity in Changing Musical Taste

Audrey M. Skaife authored a dataset on musical taste, likely containing data related to psychological and aesthetic factors. The dataset is published on paperswithcode, a platform for academic datasets. Columns suggest it may include measures of musical deviation, complexity, and associated taste ratings.

Tags: Tabular, Audio, Literature, Psychology, Aesthetics, Taste, Communication, Neuroscience, Art, Musical

Federalist Party Politics in Massachusetts, 1789-1815

Massachusetts political history data likely contains information related to the Federalist party and the Hartford Convention. The dataset is authored by James M. Banner and published on paperswithcode. Its specific content and scope must be verified after download.

Tags: Text, History, Massachusetts, Federalists, Law, Party Politics, Convention, Political Science, Politics

Classification of Musical Instruments Dataset

Bronia Kornhauser authored a dataset for classifying musical instruments, sourced from paperswithcode. The dataset likely contains audio samples or features of various instruments for classification tasks. Metadata is minimal; the specific content, size, and structure require verification after download.

Tags: Audio, Instrument Classification, Computer Science, Psychology, Art, Musical, Visual Arts

Pittsburgh Sleep Quality Index Survey Responses

Daniel J. Buysse authored the Pittsburgh Sleep Quality Index, a clinical assessment tool. The dataset likely contains survey responses or scores related to sleep quality and insomnia. It is published on the paperswithcode platform.

Tags: Tabular, Insomnia, Health Survey, Sleep Quality, Index Typography, Computer Science, Psychology, Sleep System Call, World Wide Web, Pittsburgh Sleep Quality Index, Clinical Assessment, Psychiatry

Chinese-English Dictionary of Amoy Vernacular and Dialect Variations

A historical dictionary authored by Carstairs Douglas, focusing on the vernacular or spoken language of Amoy. The work includes principal variations of the Chang-chew and Chin-chew dialects. It is published on the paperswithcode platform.

Tags: Text, Translation, History, Computer Science, Historical Language, Biology, Dictionary, Philosophy, Vernacular, Chinese Dialects, Chin, Principal Computer Security, Linguistics, Paleontology, Spoken Language

Inferior Frontal Cortex Activation in Musical Priming Experiments

A dataset from paperswithcode authored by Barbara Tillmann. It likely contains data related to brain activity, specifically in the inferior frontal cortex, during musical priming experiments. The dataset's size, temporal coverage, and specific variables are unknown from the provided metadata.

Tags: Tabular, Priming, Psychology, Biology, Music Perception, Frontal cortex, Cognitive psychology, Communication, Priming Agriculture, Neuroscience, Art, Brain Imaging, Musical, Visual Arts

Indonesian Text-to-Speech Audio Dataset with 16.38 Hours of Speech

16.38 hours of high-quality audio for Indonesian Text-to-Speech (TTS) applications. The dataset contains 4,531 audio segments with an average duration of 13.01 seconds each. It was created by Muhammad Arief from Universitas Muhammadiyah Sorong.

Tags: Audio, Text To Speech, Audio Dataset, Speech Synthesis, Bahasa Indonesia

Common Voice 24.0 Mongolian: Cleaned Audio for Text-to-Speech Training

Common Voice 24.0 Mongolian - Cleaned Dataset is a high-quality, cleaned audio collection derived from the Mozilla Common Voice version 24.0 project. The dataset, created by author 'btsee', has been processed through a quality filtering and preprocessing pipeline optimized for Text-to-Speech training. It was last updated on the Hugging Face platform in January 2026.

Tags: Audio, Text To Speech, Speech Recognition, Mongolian Language, Audio Cleaning

TTSdemo: Text-to-Speech Audio Samples

A Kaggle dataset titled 'TTSdemo'. The dataset likely contains audio files demonstrating text-to-speech synthesis. The author, organization, and specific content details are unknown.

Tags: Audio, Text To Speech, Speech Synthesis, Audio Demo
