Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,962 datasets
A speech dataset comprising recordings of two people engaging in spontaneous conversations in English. The dataset aims to fill the gap in high-quality spontaneous speech data and was created by CASPER-SSSD, last updated on June 16, 2025. Participants joined each conversation through a custom-built web platform, each using their own device.
Approximately 170 square kilometers of seafloor data were collected for Boston Harbor and its approaches. The National Oceanic and Atmospheric Administration Ship Whiting gathered sidescan sonar and bathymetric measurements in 2000 and 2001. The Massachusetts Office of Coastal Zone Management and the U.S. Geological Survey reprocessed and gridded the data.
Lidar-derived digital surface model (DSM) data for Petersham, Massachusetts, representing surface elevations for 'leaf-on' conditions in August 2022. The data were collected by the NSIDC_CPRD organization as part of the SMAPVEX19-22 campaign to validate satellite-derived soil moisture estimates in forested areas. The DSM captures the highest elevation of features, which may include bare earth, vegetation, and human-made objects.
AVHRR satellite imagery of Eastern Antarctica was captured by the NOAA12 satellite. Data collection began in June 1996, covering specific coastal and ice shelf regions, but the archival service was discontinued in 2015. The data originates from the Antarctic Meteorology Centre's Casey HRPT receiver, managed by the Australian Antarctic Data Centre (AU_AADC).
Coastal seafloor physiographic zones between Nahant and Gloucester, Massachusetts, are characterized from NOAA nautical charts and aerial photographs. The dataset was created by SCIOPS and last updated in 2003. It focuses on inshore areas not covered by other high-resolution geophysical surveys.
Approximately 170 square kilometers of seafloor data were collected by NOAA Ship Whiting in 2000 and 2001. The Massachusetts Office of Coastal Zone Management and the U.S. Geological Survey reprocessed and gridded the sidescan sonar and bathymetric measurements. These data were converted to the Massachusetts State Plane coordinate system in 2006.
2003 data from NASA EarthData provides geospatial statistics on internal wave packets extracted from Synthetic Aperture Radar (SAR) imagery over Massachusetts Bay. The dataset, sourced from NOAA NCEI, contains polygons representing 1x1 minute latitude/longitude grid cells with calculated statistical metrics for each cell. It was created to analyze the frequency, size, and location of these oceanographic features.
2003 data from NOAA NCEI provides statistics on internal wave packets extracted from Synthetic Aperture Radar (SAR) imagery. The data is aggregated into 30x30 arc-second latitude/longitude polygon grid cells. It includes calculated metrics for each cell, such as packet frequency and area statistics.
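As a rough illustration of the gridding described in the two internal-wave entries above, the sketch below maps a latitude/longitude point to a grid cell of a given size (30 arc-seconds here, or 60 for a 1x1 minute grid). The function names and the assumption that cells are aligned to whole-degree boundaries are hypothetical; the datasets' actual grid origin and cell identifiers are not stated in the descriptions.

```python
import math


def grid_cell(lat, lon, cell_arcsec=30):
    """Return (row, col) of the cell containing a point.

    Assumption: cells are aligned to whole degrees, so a 30 arc-second
    grid has 120 cells per degree and a 1 arc-minute grid has 60.
    """
    cells_per_degree = 3600 // cell_arcsec
    row = math.floor(lat * cells_per_degree)
    col = math.floor(lon * cells_per_degree)
    return row, col


def cell_bounds(row, col, cell_arcsec=30):
    """Return (south, west, north, east) in decimal degrees for a cell."""
    n = 3600 // cell_arcsec
    return (row / n, col / n, (row + 1) / n, (col + 1) / n)


# Example: a point in Massachusetts Bay (coordinates chosen for illustration).
row, col = grid_cell(42.355, -70.755, cell_arcsec=30)
south, west, north, east = cell_bounds(row, col, cell_arcsec=30)
```

Per-cell statistics such as packet frequency could then be accumulated in a dictionary keyed by `(row, col)`.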
Reprocessed SEVIRI All-Sky Radiances product contains mean brightness temperatures from all thermal infrared and water vapor channels for 16x16 pixel areas. The product, generated by EUMETSAT using version 1.5.3 software and ECMWF ERA-interim data, includes clear and cloudy sky brightness temperatures, clear sky fraction, and solar zenith angle. Data is BUFR encoded and provided at 3-hourly intervals on every third quarter hour.
MDCC is a large-scale Cantonese automatic speech recognition dataset compiled from multiple domains. It provides .wav recordings of both spontaneous and read speech paired with UTF‑8 plain‑text transcripts and speaker metadata. The dataset was created by author 'ming030890' and was last updated on the Hugging Face platform on 2025-07-26.
Audio segments and transcriptions extracted from the NPTEL Introduction to World Literature lecture series. The dataset is intended for research and educational purposes in speech recognition and literary content analysis. It was uploaded by author swastik17 to Hugging Face and last updated on 2025-05-20.
A sample of a paid corpus containing speech recordings from 10 native British English speakers. It is designed for speech synthesis research, with balanced phoneme coverage and annotations made with the involvement of a professional phonetician.
Tejasva-Maurya's English Technical Speech Dataset contains 11,247 audio recordings of technical vocabulary. The collection includes transcriptions and speaker embeddings and was last updated on October 26, 2024. It is designed for developing speech and language models.
Nawar Halabi at the University of Southampton developed this speech corpus as part of PhD work. Recordings were made in a professional studio using the south Levantine Arabic dialect with a Damascene accent. Speech synthesized from this corpus has reportedly produced a high-quality, natural voice.
A curated subset of the MTG-Jamendo Autotagging benchmark containing tracks annotated with genre, instrument, and mood/theme tags. Audio files are preprocessed to 30-second clips at a 16kHz sampling rate for consistent music auto-tagging tasks. The dataset was uploaded by author vtsouval and last updated on 2025-05-14.
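The preprocessing mentioned above (fixed 30-second clips at 16 kHz) could look roughly like the following sketch. It is not the uploader's actual pipeline: the function names are hypothetical, the clip is assumed to be taken from the start of the track and zero-padded when too short, and linear interpolation stands in for the filtered resampling that an audio library would normally provide.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono signal by linear interpolation.

    Simplification: no anti-aliasing filter is applied, so this is only
    suitable as an illustration, not for production audio.
    """
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)     # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def to_fixed_clip(samples, rate, seconds=30):
    """Crop from the start, or zero-pad at the end, to exactly `seconds`."""
    target = rate * seconds
    clip = list(samples[:target])
    clip.extend([0.0] * (target - len(clip)))
    return clip


def preprocess(samples, src_rate, dst_rate=16_000, seconds=30):
    """Resample to `dst_rate`, then force a fixed clip length."""
    resampled = resample_linear(samples, src_rate, dst_rate)
    return to_fixed_clip(resampled, dst_rate, seconds)
```

With the defaults, every output has 480,000 samples (30 s x 16,000 Hz), which is what makes the clips uniform for auto-tagging models.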
MusicSem is a multimodal dataset containing 35,977 entries of paired text and audio. It includes a withheld test set of 480 entries for leaderboard evaluation. The dataset was curated by Rebecca Salganik, Teng Tu, Fei-Yueh Chen, Xiaohao Liu, Kaifeng Lu, Ethan Luvisia, Zhiyao Duan, Guillaume Salha-Galvan, Anson Kahng, Yunshan Ma, and Jian Kang.
A Ukrainian municipal cultural institution's list of concluded contracts, other transactions, annexes, and additional agreements for the first quarter of 2021. The dataset was published on the State site of Ukraine and last updated on June 17, 2021. It likely contains details of financial agreements and procurement activities for the Dnipro Children's Music School No. 19.
FLORAS is a 50-language benchmark for long-form recognition and summarization of spoken language. It was created by espnet and last updated on November 29, 2024. The dataset aims to provide a realistic test environment for models by using raw, long-form conversational audio with one or many speakers.
Crying sound recordings from 201 infants and young children aged 0 to 3 years, with multiple audio segments per child. It is a sample of a larger paid dataset intended to support the detection of children's crying sounds in smart home projects. The dataset was created by Nexdata and last updated in April 2025.
Approved by Prefectural Order No. 75-2019-10-03-003 on 3 October 2019, this dataset contains the sound classification of RATP railway infrastructure in the Paris department. It segments overhead railway lines into homogeneous sections and assigns them a noise category from 1 to 5, where a higher number indicates a lower assumed noise level at the infrastructure edge. The data is provided by the Bureau de Recherches Géologiques et Minières (BRGM) and was last updated on 7 October 2019.