Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,909 datasets
NOAA's Northeast Fisheries Science Center collected standardized ichthyoplankton survey data from 1977 to 1988 along the continental shelf between Cape Hatteras, NC and Cape Sable, NS. A subset of 6,406 bongo samples from this broader collection of 25,000 samples was used to model abundance and distribution within the Gulf of Maine. The dataset supports studies on fish community structure changes and recruitment mechanisms.
KSC2 Structured is an enhanced version of the Kazakh Speech Corpus 2, providing audio recordings paired with transcripts that have restored punctuation and capitalization. Developed by Inflexion Lab, this dataset addresses the limitation of the original KSC2's plain lowercase transcripts. The dataset page was last updated in March 2026.
A public sample of a Brazilian Portuguese medical audio dataset built for ASR, TTS, and conversational AI evaluation. This repository contains 1 record, 20 aligned audio segments, 1 speaker, and about 5.26 minutes of audio, derived from deidentified clinical source material. The full dataset and commercial licensing are available from juliasdata.com.
1,059 traditional music tracks from 33 countries or areas, with geographical origin determined by the artist's main residence. Audio features were extracted from wave files using the MARSYAS program, resulting in 116 feature columns plus latitude and longitude targets. The dataset is licensed under CC-BY-4.0.
A dataset hosted by YomnaGharib on Hugging Face, last updated on 2026-05-11. The title suggests it contains audio data processed using the Demucs source separation tool, likely for text-to-speech (TTS) applications. The specific content, scale, and original source require verification after download.
Phonetically balanced sentences from reference texts were recorded in a studio environment. The dataset contains orthographic transcriptions and phonemically aligned transcriptions in TextGrid format, paired with 16 kHz, 16-bit WAV audio files. This resource is designed for speech synthesis and natural language processing research.
Lwazi Afrikaans ASR corpus provides matched audio recordings and orthographic transcriptions designed for speech recognition systems. Audio files are telephone-quality, recorded at 8 kHz, 16-bit, and single-channel, with each utterance stored in a separate text file. This dataset was created to support the development of Automatic Speech Recognition (ASR) for the Afrikaans language.
British musical theatre productions from the 2010s are documented in this dataset collated by Sarah K. Whitfield and Clare Chandler. It covers a ten-year period from 2010 to 2019. The dataset is hosted on figshare and was last updated in April 2026.
A Khmer language dataset likely containing speech or audio data for cultural applications. It is published on HuggingFace by author rinabuoy and was last updated on 2026-05-01 09:31:45.
Audio recordings collected in community settings in Senegal cover topics including family planning, healthcare access, pregnancy practices, and cultural beliefs around maternal and reproductive health. The dataset was created by YUXCulturalAILab and last updated on March 19, 2026. Recordings were captured using mobile devices or portable recorders in natural conversational conditions, and all transcriptions were manually verified.
A bilingual text-to-speech dataset containing Hebrew and English audio generated by male and female speakers. Audio files have been resampled to 44.1kHz and time-stretched to a slower speed. The dataset was created by author notmax123 and last updated on March 30, 2026.
Asru Data is a dataset uploaded to HuggingFace by author closerG. The dataset was last updated on 2026-05-14. Its specific content and scale are not detailed in the provided metadata.
Regulatory text covers structural integrity, fire safety, and energy conservation for all new construction, renovation, and demolition projects in Massachusetts. The code is written by the State Board of Regulations and Standards and administered locally by certified building inspectors. The dataset originates from the SCIOPS organization via the NASA Earthdata platform.
Irodori TTS Voice Clones is a collection of 2.99 million voice clones for text-to-speech synthesis. It was created by SynDataLab and references the SynDataLab/irodori-refs-10k dataset for source audio. The dataset was last updated on April 23, 2026.
Fall 2003 documentation details the Massachusetts air quality program's implementation of federal and state Clean Air Acts. The dataset includes regulatory procedures, application forms, fee structures, and review timelines for construction permits. It was published by SCIOPS in 2003.
Environmental Protection Agency's BEACH Program data focuses on improving public health for beachgoers through five key areas, including pollution prediction and faster water testing. The program, sponsored by the EPA and managed by SCIOPS, provides information on coastal water quality. Specific contact information is available for data related to Massachusetts beaches.
Google Search Console normalized data from the Tably.es marketplace for May 2026. The dataset likely contains aggregated search performance metrics for the platform. The author, organization, and specific data volume are unknown.
DBp is a multimodal dataset from the Music-in-Medicine program, recording a Dueling Brains performance. The data includes audio, video, and tabular file formats, totaling approximately 9.8 GB in size. It is openly licensed under CC-BY-4.0 and authored by Maxine Annel Pacheco-Ramírez.
MHp provides a 5.8 GB multimodal dataset capturing a live 'Musical Healing' performance from the Music-in-Medicine program. It likely contains synchronized electroencephalogram (EEG) brain activity recordings and audio data, such as piano music. This dataset supports research into the neurological and physiological effects of therapeutic music interventions.
Block-scale rooftop solar technical potential estimates for the city of Orlando, Florida, derived from LiDAR and national parcel data. It includes developable roof area and technical potential in kilowatts, along with the most common building use and occupancy type per block.