Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,925 datasets
ASR-50hour_chunk of lipighor is a dataset for automatic speech recognition (ASR) tasks, published on Kaggle. The title suggests it contains approximately 50 hours of audio data, likely segmented into chunks. The dataset's specific source, collection method, and detailed contents require verification after download.
Audio recordings of sung poetry from the Pamir Mountains in Tajikistan's Gorno Badakhshan Autonomous Region, collected during fieldwork in 1998. The recordings were made by Jan van Belle and are part of a larger collection spanning multiple years.
Survey responses measuring sleep quality using the Pittsburgh Sleep Quality Index (PSQI). The data is sourced from Kaggle and likely contains self-reported assessments from healthy individuals. Specific details on the number of records, collection period, and original authors are not provided in the metadata.
Muse contains 116,000 synthetic music tracks in Chinese and English, synthesized using SunoV5 and paired with automatically generated lyrics and style descriptions. Created by bolshyC and introduced in early 2026, the collection supports research into reproducible long-form song generation. The data is divided into Chinese (CN) and English (EN) subsets to facilitate multilingual audio modeling.
A release from the GAMETES repository for generating epistasis models. The specific configuration is a 2-way epistasis model with 20 attributes and a heritability of 0.4. Details on row count, columns, and sample data are unavailable.
A dataset from the GAMETES repository, which generates simulated genetic data for studying epistasis. The file name suggests it models a 3-way epistatic interaction with 20 attributes and a heritability of 0.2. No row count, column details, or sample data are available.
A GAMETES dataset for epistasis detection, focusing on 2-way interactions with 20 attributes and a heritability of 0.1. The dataset is generated using the EDM-1 model. Specific details on row count, columns, and sample data are unavailable.
A benchmark dataset for Kannada speech recognition tasks, created by thezholdoshbekov. The dataset was last updated in March 2026 and is hosted on the Hugging Face platform with a size category of 1K to 10K entries. It is associated with libraries for tabular and text data processing.
An assessment of Pittsburgh's One Vision One Life violence prevention strategy authored by Jeremy M. Wilson. The report likely contains data on program implementation, operations, and impact, including community-building, conflict intervention, and mediation. It also includes comparisons with other cities and lessons learned.
A historical text tracing the emergence of rock music culture in Eastern Europe and the Soviet Union, and the conflicts surrounding it, from 1954 to the present. It covers the 30-year conflict between rock fans and the Communist Party, including events in Prague in 1968 and Poland in 1981. The source is a book titled 'Rock Around the Bloc', but the specific dataset format and structure are unknown.
my_asr_dataset_v2 is a dataset for automatic speech recognition, published on Kaggle. The dataset's specific size, collection method, and temporal coverage are not detailed in the available metadata. Its content and structure require verification after download.
The dataset likely contains historical analysis of rock music's role in Cold War geopolitics. It appears to be sourced from a research paper discussing containment policy and Western support for Yugoslavia. The specific data volume and structure are unknown.
A dataset titled 'MoviesTextSASRec' published on Kaggle. The title suggests it likely contains text data related to movies, potentially for use with sequential recommendation models like SASRec. The dataset's author, organization, size, and specific content are unknown.
ASRDemo is a dataset published on Kaggle. Its title suggests it contains audio data for speech recognition demonstration purposes. The dataset's specific size, format, and content details are unknown.
SeniorTalk provides 10,000 to 100,000 Mandarin Chinese speech records from individuals aged 75 to 85, produced by BAAI in 2025. It includes audio and text modalities to facilitate research in automatic speech recognition and speaker verification for the super-aged population.
An audio dataset titled 'my_asr_dataset' is hosted on Kaggle. The dataset's content, size, and specific characteristics are not detailed in the provided metadata. Its creator, license, and update history are also unknown.
A processed subset of an Urdu text-to-speech corpus, published on Kaggle. The dataset likely contains aligned audio recordings and corresponding text transcripts for speech synthesis tasks. Specific details on size, creation date, and original source are not provided in the available metadata.
Mel spectrograms provide a time-frequency representation of audio signals, commonly used for machine learning tasks. This dataset, hosted on Kaggle, likely contains pre-computed mel spectrogram features derived from music audio tracks. The specific source, size, and creation details are not provided in the available metadata.
A subset of a corpus for Urdu text-to-speech synthesis, published on Kaggle. The dataset likely contains audio recordings paired with corresponding text transcripts. Specific details on size, collection method, and contributors are not provided in the available metadata.
Azerbaijani Asr Zenfira is a speech dataset hosted on Hugging Face by tahmaz. The dataset card indicates it is intended for automatic speech recognition tasks. Its last update was recorded on February 20, 2026.