Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,907 datasets
A curated collection of 1,000 Egyptian Arabic speech samples, each representing the best audio selected from four different text-to-speech models. The dataset was created by MohamedGomaa30 and includes transcription text and quality metadata for each entry. It was last updated on May 14, 2026.
A benchmark dataset for detecting AI-generated symbolic music, focusing on the MIDI format. The dataset was created by dhlee3000 and last updated on May 15, 2026. It addresses concerns about authenticity in digital music by providing a resource for a domain previously less explored than audio deepfake detection.
A manuscript from the University of Pittsburgh Medical Center describes a surgical approach for head and neck cancer patients during the COVID-19 pandemic. Authored by Mark Kubik, it details methods for providing timely reconstructive care while minimizing infectious risk to providers, patients, and families.
815,171 audio clips totaling over 2,264 hours of speech, compiled by agarwalayushi and last updated in April 2026. This dataset covers Hindi, Hinglish (Hindi-English code-switching), and Indian English, sourced from 14 public corpora and custom recordings, unified into a single Parquet file.
A sampled subset of Amazon Reviews from the Musical Instruments category, filtered for a recommendation system project. The data covers reviews from January 2018 to September 2023 and was processed with iterative 5-core filtering to ensure users and items have at least five interactions. It was created by oyku-tugana and includes a held-out test set of 5000 users for cold-start evaluation.
LibriHeavy TTS 3 is an improved version of the LibriHeavy dataset, designed specifically for text-to-speech training. It is built on a 50,000-hour labeled ASR corpus derived from LibriLight, with audio encoded using the Opus codec at 68 kbps. The dataset, authored by brthor and last updated in April 2026, focuses on better audio and text supervision quality.
Armenian speech audio and caption files collected from the Azatutyun YouTube channel. The dataset includes a 'train' split with clean audio and captions and a 'bad_subtitles' split with known noisy captions. It was created by Arthuryann and last updated on May 5, 2026.
Oceanographic data from a NOAA expedition mapping the Musicians Seamounts chain up to 650 nautical miles north of Hawaii. The dataset includes shipboard sensor measurements for navigation, meteorology, and oceanography collected from August 8 to August 31, 2017. It is produced by the National Oceanic and Atmospheric Administration and is available on multiple government data platforms.
PianoCoRe is a large-scale piano MIDI dataset that unifies and refines major open-source piano corpora. It contains 250,046 performances of 5,625 pieces written by 483 composers, totaling 21,763 hours of performed music. The dataset was created by SyMuPe and was last updated on 2026-04-27.
A Persian speech dataset containing audio files resampled to 16000 Hz. The collection includes 134,994 samples totaling 97 hours and 20 minutes of audio, split into training and test sets. It was uploaded by user 'veziriii' to Hugging Face and last updated on 2026-05-25.
Pre-extracted audio codec tokens for TTS training, containing 6,082 samples totaling 15.6 hours of audio. The dataset was created by author somu9 and was last updated on 2026-05-18. It uses the MOSS-Audio-Tokenizer-Nano codec at a sample rate of 48,000 Hz and a frame rate of 12.5 Hz.
MEX assets include metadata and precomputed baseline MID artifacts derived from standard Music Information Retrieval datasets. The dataset is a derivative of public sources like SALAMI and session, with licenses including CC0-1.0 and MIT. It was last updated on 2026-05-21 by author muthissar.
TeraTTS provides a dataset of 9,394 high-quality audio clips paired with transcript text, extracted from the video game Slay the Princess. The collection totals approximately 13 hours of audio across three primary speakers. The dataset was last updated on Hugging Face in May 2026.
A curated evaluation set for Indic-language automatic speech recognition. It contains 6,169 audio samples across 7 dataset configurations, totaling approximately 13.3 hours of audio at 16 kHz. The dataset was created by ayush-shunyalabs and last updated on 2026-04-23.
C3 provides monthly operational funding to child care programs across Massachusetts. Each row represents the amount of C3 funds disbursed to a program by fiscal year, which runs from July 1 to June 30. The data is published by educationtocareer.data.mass.gov and was last updated on April 13, 2026.
A dataset of 19,000 open-access research papers related to COVID-19, collected from various sources between 2020 and 2021. It includes metadata such as titles, authors, abstracts, publication dates, and source repositories.
Data and code for a systematic review of exploratory factor analysis practices in music psychology and music education. The dataset includes an Excel file with a codebook for each variable and an R Markdown file. It was authored by Daniel Yeom and last updated on April 17, 2026.
2,967,779 clone utterances across 2,971 English speakers, generated by the echo-tts synthesizer. The dataset was created by SynDataLab and last updated on 2026-04-25. It contains WAV audio at 44.1 kHz, stored in Parquet files, with each speaker represented by 10 voice-clone latents and 100 synthesized texts.
A Japanese-to-Simplified Chinese pre-translation dataset extracted from the COM3D2 and CM3D2 video game series. The dataset includes text from the base games, their expansions, and nearly all DLCs up to April 4, 2026. It was created by mollyadams; translations were primarily generated by GPT-5.2 xhigh and refined by GPT-5.4 xhigh. It was last updated on April 24, 2026.
Advanced Placement exam score data for Massachusetts public and charter schools from 2007 onward. The dataset includes counts of students receiving each score (1-5) and percentages scoring in low (1-2) and high (3-5) ranges, disaggregated by student demographic groups. Data is published by the Massachusetts Department of Elementary and Secondary Education (DESE).