Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,924 datasets
Hindi speech dataset of Narendra Modi for TTS and voice cloning. The dataset is hosted on Kaggle and is tagged for speech data, audio, and Hindi language processing. Its specific size, format, and creation details are not provided in the metadata.
A benchmark of 300 title pairs for validating prerequisite knowledge links, with each pair receiving two independent expert ratings. It accompanies a research paper on crowdsourcing prerequisite knowledge graphs at scale.
Audrey M. Skaife authored a dataset on musical taste, likely containing data related to psychological and aesthetic factors. The dataset is published on paperswithcode, a platform for academic datasets. Columns suggest it may include measures of musical deviation, complexity, and associated taste ratings.
A dataset on Massachusetts political history that likely covers the Federalist party and the Hartford Convention. It is authored by James M. Banner and published on paperswithcode. Its specific content and scope must be verified after download.
Bronia Kornhauser authored a dataset for classifying musical instruments, sourced from paperswithcode. The dataset likely contains audio samples or features of various instruments for classification tasks. Metadata is minimal; the specific content, size, and structure require verification after download.
Daniel J. Buysse authored the Pittsburgh Sleep Quality Index, a clinical assessment tool. The dataset likely contains survey responses or scores related to sleep quality and insomnia. It is published on the paperswithcode platform.
A historical dictionary authored by Carstairs Douglas, focusing on the vernacular or spoken language of Amoy. The work includes principal variations of the Chang-chew and Chin-chew dialects. It is published on the paperswithcode platform.
A dataset from paperswithcode authored by Barbara Tillmann. It likely contains data related to brain activity, specifically in the inferior frontal cortex, during musical priming experiments. The dataset's size, temporal coverage, and specific variables are unknown from the provided metadata.
A Kaggle dataset titled 'TTSdemo'. The dataset likely contains audio files demonstrating text-to-speech synthesis. The author, organization, and specific content details are unknown.
A dataset aggregating six indices that measure the economic impact of 109 Music Zones in the United States. The indices assess venue concentration, tourism proximity, business counts, non-chain business presence, total annual economic output, and supported employment.
A dataset titled 'Tts Emotional' published on the Hugging Face platform by SeifElden2342532. The dataset was last updated on March 3, 2026. Its title suggests it likely contains audio data for text-to-speech synthesis with emotional attributes.
A sample of restaurant market data from BeamStation, focusing on technology-ready establishments within Massachusetts, United States. The dataset is a free sample, but the total number of rows, columns, and specific collection date are not provided. The original author and organization are unknown.
A large-scale dataset for deepfake speech detection, created by the CodecFake organization and released in 2025. It includes the CoRS and CoSG subsets, providing audio samples and corresponding protocol and label files for research in synthetic audio generation and detection.
India-focused conversational speech data for evaluating Automatic Speech Recognition systems on Hindi-English code-mixed utterances. The dataset was curated by soketlabs and last updated on the Hugging Face platform in January 2026. It focuses on natural bilingual contexts where Hindi in Devanagari script and English in Latin script co-occur within the same utterance.
A speaker similarity dataset created by thezholdoshbekov, hosted on Hugging Face and last updated in March 2026. The dataset is structured in a tabular format and includes text modality, as indicated by platform tags. It is designed for tasks involving the comparison and identification of speaker voices.
Over 100,000 audio samples for text-to-speech applications, hosted on Hugging Face by datadriven-company. The dataset includes text and corresponding high-fidelity speech audio. It was last updated in March 2026.
Bahraini Speech Dataset is a Bahraini Arabic speech corpus built from publicly available podcast and video content. It contains 90,421 single-speaker utterance clips with aligned transcriptions, created by Hishambarakat and last updated on January 23, 2026.
SPGISpeech is a monolingual English dataset for automatic speech recognition tasks. The dataset is categorized as containing between 1 million and 10 million data instances. It was created by the author 'kensho' and was last updated in January 2026.
An audio dataset titled 'universe-merged-withzero-noASR' is hosted on Kaggle. The dataset's specific content, scale, and creation details are unknown from the provided metadata. Its title suggests it may involve merged audio data, possibly excluding automatic speech recognition (ASR) components.
ASR Full Bundle likely contains audio data for training automatic speech recognition systems. The dataset is hosted on Kaggle, but its specific contents, size, and origin are unknown. Users must download the dataset to verify its actual scope and quality.