DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

Speech & Audio

One Vision One Life: Pittsburgh Violence Prevention Program Assessment

An assessment of Pittsburgh's One Vision One Life violence prevention strategy authored by Jeremy M. Wilson. The report likely contains data on program implementation, operations, and impact, including community-building, conflict intervention, and mediation. It also includes comparisons with other cities and lessons learned.

Tabular, Medicine, Computer Science, Psychology, Violence Prevention, Computer Vision, Gerontology, Public Safety, Community Intervention, Program Assessment, +1 more

0 views

Speech & Audio

Rock Music History in Eastern Europe and the Soviet Union

The dataset likely contains historical analysis of rock music's role in Cold War geopolitics. It appears to be sourced from a research paper discussing containment policy and Western support for Yugoslavia. The specific data volume and structure are unknown.

Text, Audio, George Robot, History, Eastern Bloc, Cold War, Art History, Law, Nuclear Weapon, Empire, Political History, Communism, Subversion, Soviet Union, Government Linguistics, Rock Music, Soviet History, Economic History, Political Science, Politics, +1 more

0 views

Speech & Audio

Rock Around the Bloc: Rock Culture in Eastern Europe and the Soviet Union, 1954-Present

A historical text tracing the emergence of rock music culture in Eastern Europe and the Soviet Union from 1954 to the present. It covers the 30-year conflict between rock fans and the Communist Party, including events in Prague in 1968 and Poland in 1981. The source is a book titled 'Rock Around the Bloc', but the dataset's format and structure are unknown.

Text, Audio, History, Cold War, Law, Communism, Soviet Union, Alliance, Rock Music, Economic History, Political Science, Politics, +1 more

0 views

Speech & Audio

my_asr_dataset_v2: Speech Data for Automatic Speech Recognition

my_asr_dataset_v2 is a dataset for automatic speech recognition, published on Kaggle. The dataset's specific size, collection method, and temporal coverage are not detailed in the available metadata. Its content and structure require verification after download.

Audio, Speech Data, Audio Processing, Automatic Speech Recognition, +1 more

0 views

Speech & Audio

MoviesTextSASRec: Movie Text Data for Sequential Recommendation

A dataset titled 'MoviesTextSASRec' published on Kaggle. The title suggests it likely contains text data related to movies, potentially for use with sequential recommendation models like SASRec. The dataset's author, organization, size, and specific content are unknown.

Text, Movie Recommendation, Collaborative Filtering, Text Data, +1 more

0 views

Speech & Audio

ASRDemo: Speech Recognition Demonstration Data

ASRDemo is a dataset published on Kaggle. Its title suggests it contains audio data for speech recognition demonstration purposes. The dataset's specific size, format, and content details are unknown.

Audio, Demo, Speech Recognition, +1 more

0 views

Speech & Audio

SeniorTalk: Mandarin Chinese Speech Data for Seniors Aged 75-85

SeniorTalk provides 10,000 to 100,000 Mandarin Chinese speech records from individuals aged 75 to 85, produced by BAAI in 2025. It includes audio and text modalities to facilitate research in automatic speech recognition and speaker verification for the super-aged population.

Parquet, Size Categories: 10K<n<100K, Library: polars, Language: zh, Library: dask, License: CC BY-NC-SA 4.0, Modality: text, Library: mlcroissant, Library: datasets, Region: us, Task Categories: Automatic Speech Recognition, arXiv: 2503.16578, +1 more

0 views

Speech & Audio

my_asr_dataset: Audio Data for Speech Recognition

An audio dataset titled 'my_asr_dataset' is hosted on Kaggle. The dataset's content, size, and specific characteristics are not detailed in the provided metadata. Its creator, license, and update history are also unknown.

Audio, Speech Recognition, Automatic Speech Recognition, +1 more

0 views

Speech & Audio

Urdu Text-to-Speech Corpus Subset

A subset of a corpus for Urdu text-to-speech synthesis, published on Kaggle. The dataset likely contains audio recordings paired with corresponding text transcripts. Specific details on size, collection method, and contributors are not provided in the available metadata.

Text, Audio, Text To Speech, Speech Synthesis, Urdu Language, Natural Language Processing, Audio Corpus, +1 more

0 views

Speech & Audio

Urdu Text-to-Speech Corpus Subset Processed

A processed subset of an Urdu text-to-speech corpus, published on Kaggle. The dataset likely contains aligned audio recordings and corresponding text transcripts for speech synthesis tasks. Specific details on size, creation date, and original source are not provided in the available metadata.

Text, Audio, Text To Speech, Urdu, Speech Corpus, Natural Language Processing, Audio Processing, +1 more

0 views

Speech & Audio

Music Mel Spectrogram Dataset for Audio Feature Extraction

Mel spectrograms provide a time-frequency representation of audio signals, commonly used for machine learning tasks. This dataset, hosted on Kaggle, likely contains pre-computed mel spectrogram features derived from music audio tracks. The specific source, size, and creation details are not provided in the available metadata.

Audio, Machine Learning, Music Analysis, Spectrogram, Audio Processing, +1 more

0 views

Speech & Audio

Azerbaijani Asr Zenfira: Speech Recognition Dataset

Azerbaijani Asr Zenfira is a speech dataset hosted on HuggingFace by tahmaz. The dataset card indicates it is intended for automatic speech recognition tasks. Its last update was recorded on February 20, 2026.

Audio, Audio Data, Azerbaijani Language, Speech Recognition, +1 more

0 views

Speech & Audio

Urdu Text-to-Speech Corpus Subset from Kaggle

Urdu TTS Corpus Subset is a dataset hosted on Kaggle, likely containing audio recordings and corresponding text transcripts for speech synthesis. The dataset's author, size, and specific content details are not provided in the metadata. Users must download the dataset to verify its exact composition and suitability for their projects.

Text, Audio, Text To Speech, Speech Synthesis, Urdu Language, Natural Language Processing, Audio Corpus, +1 more

0 views

Speech & Audio

LibriSpeech Augmented with MUSAN Noise and Music Samples

A speech audio dataset combining the LibriSpeech corpus with MUSAN augmentation data. The dataset is published on Kaggle, but specific details on size, creation date, and author are not provided in the metadata. It likely contains speech recordings augmented with noise and music samples for machine learning training.

Audio, Machine Learning, Augmentation, +1 more

0 views
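
As an illustration of the kind of augmentation described for this entry, the sketch below mixes a noise clip into a speech clip at a chosen signal-to-noise ratio. It is only a minimal example under stated assumptions: the listing does not say how this dataset was built, and the function names `mean_power` and `mix_at_snr`, the noise tiling, and the list-of-floats sample format are all hypothetical choices made here for clarity.

```python
import math
from typing import List, Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average squared amplitude of a signal."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add noise to speech so the result has the requested SNR in decibels.

    Hypothetical helper: the noise clip is repeated (tiled) to cover the
    speech clip, then scaled so that speech_power / noise_power = 10^(snr_db/10).
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Repeat the noise clip so it is exactly as long as the speech clip.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = mean_power(speech)
    p_noise = mean_power(tiled)
    if p_noise == 0.0:
        return list(speech)  # silent noise: nothing to mix in
    scale = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(speech, tiled)]
```

A real pipeline would read waveforms from audio files and usually draw the SNR at random per utterance; those details are left out because the listing does not specify them.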

Speech & Audio

BACHI: Boundary-Aware Symbolic Chord Recognition Data for Pop and Classical Music

Trained model weights and datasets for the BACHI chord recognition system. The data supports the paper 'BACHI: Boundary-Aware Symbolic Chord Recognition Through Masked Iterative Decoding on Pop and Classical Music' by Mingyang Yao and Ke Chen, accepted for ICASSP 2026. The dataset page was last updated on 2026-01-17.

Audio, Classical Music, Pop Music, Chord Recognition, +1 more

0 views

Speech & Audio

Synthetic and Augmented Dysarthric Speech Corpus

Synthetic-dysarthric-speech is a dataset containing artificially generated and augmented speech samples simulating dysarthria, a motor speech disorder. It is intended for developing robust automatic speech recognition and semantic understanding systems. The dataset's creator, size, and update date are not specified.

Audio, Speech Synthesis, Medical Audio, Speech Recognition, Dysarthria, Synthetic, +1 more

0 views

Speech & Audio

Spotify Music Tracks Classified by Commercial Popularity

A classification dataset for predicting the commercial success of music tracks on Spotify. The dataset likely contains audio features and metadata to categorize songs into High, Medium, or Low popularity tiers. It was sourced from Kaggle, but details on its creator, size, and specific features are not provided.

0 views

Speech & Audio

Music Recommender System Data from Kaggle

A dataset for building music recommendation systems, sourced from the Kaggle platform. The specific content, scale, and features are not detailed in the available metadata. Further details regarding the data's origin, collection method, and temporal coverage are unknown.

Tabular, Audio, Machine Learning, Music Recommendation, Collaborative Filtering, +1 more

0 views

Speech & Audio

ACI-Bench: Clinical Dialogue to Structured Note Conversion Benchmark

ACI-Bench-MedARC evaluates model performance in converting clinical dialogue into structured clinical notes. The dataset includes the benchmark and data from ablation studies testing different transcription methods. It was uploaded by mkieffer to HuggingFace and last updated on 2026-01-18.

Text, Clinical NLP, Benchmark Evaluation, Speech To Text, Benchmark, Healthcare, Medical Transcription, +1 more

0 views

Speech & Audio

S16K: Multi-Label Music Emotion Recognition Dataset with 16k+ Songs

More than 16,000 songs from NetEase Cloud Music are included in this multi-label music emotion recognition dataset. MFCC features were extracted from the middle 30 seconds of each song using librosa. The dataset was created by joyfuljune and last updated on 2026-01-29.

Audio, Audio Features, Multi Label Classification, Music Emotion Recognition, MFCC, +1 more

0 views
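
The S16K description mentions MFCC features taken from the middle 30 seconds of each song with librosa. A rough sketch of how such an extraction could look is given below. The helper names `middle_window` and `middle_mfcc`, the `n_mfcc=13` default, and the use of the file's native sample rate are assumptions made for illustration; the listing does not state the parameters the dataset author actually used.

```python
from typing import Tuple


def middle_window(total_seconds: float, window_seconds: float = 30.0) -> Tuple[float, float]:
    """Return (offset, duration) in seconds of a window centered in a track.

    Tracks shorter than the window are used in full.
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    duration = min(window_seconds, total_seconds)
    offset = (total_seconds - duration) / 2.0
    return offset, duration


def middle_mfcc(path: str, n_mfcc: int = 13):
    """Load the centered window of an audio file and compute its MFCC matrix.

    Hypothetical helper; n_mfcc=13 is an assumed default, not the dataset's setting.
    """
    import librosa  # imported lazily so middle_window works without librosa

    total = librosa.get_duration(path=path)  # `path=` keyword needs librosa >= 0.10
    offset, duration = middle_window(total)
    # sr=None keeps the file's native sample rate instead of resampling.
    y, sr = librosa.load(path, sr=None, offset=offset, duration=duration)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
```

For a 200-second track, `middle_window(200.0)` gives an offset of 85 seconds and a duration of 30 seconds, so only that slice is decoded before the MFCCs are computed.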

Page 74 of 130