Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,962 datasets
Xosé Afonso Álvarez Pérez coordinated this oral history dataset from the e-cienciaDatos Harvested Dataverse, which was last updated on 2024-05-05. It contains a biographical narrative from an informant, Jesús López, detailing his life, education, and language use in the San Martín de Trevellu/Trevejo area. According to the description, the data covers topics such as schooling, bilingualism in Spanish and the local 'lagarteiro' language, and cultural practices such as music and festivals.
"José Tomás Sousa (Olivenza). Folklore musical de Olivenza (I)" is a collection of folk music recordings from Olivenza, Spain. Coordinated by Xosé Afonso Álvarez Pérez and last updated on May 5, 2024, it focuses on the 'saias' genre and also includes other types such as occasional songs, gaios, vira, fados, and corridinhos.
Spk Attribute is a dataset for training the MSA-ASR model, created by nguyenvulebinh. The model performs multilingual speech recognition and extracts speaker embeddings to distinguish between speakers. The dataset was last updated on 2025-04-10.
This sample features Japanese speech recordings from 1,006 native speakers from the eastern, western, and Kyushu regions of Japan. All audio content has been manually transcribed with high accuracy.
A dataset containing Arabic audio samples paired with corresponding text transcriptions that include full diacritization. It was created by Nourhann and last updated on February 20, 2025. The dataset is designed to support research in Arabic speech processing and natural language processing tasks.
This dataset aggregates 100,000 colloquial English sentences recorded on mobile phones by 3,691 Chinese speakers. The speakers come from multiple dialect regions within China, including Jiangsu, Shandong, Beijing, and Henan, so the recordings capture the accents of Chinese speakers of English.
This dataset contains Myanmar speech recordings extracted from a larger multilingual OpenSLR dataset. It was uploaded by 'chuuhtetnaing' and last updated on March 27, 2025. The original source is OpenSLR, a site that hosts speech and language resources.
NOAA's National Status and Trends Bioeffects Program conducted a stratified probabilistic sampling study in 2004 to assess contamination and its biological effects in Massachusetts/Cape Cod Bays, Stellwagen Bank, and Boston Harbor. The survey used the sediment quality triad approach, collecting sediment and water data to characterize chemical contamination and benthic infaunal community structure. It specifically sampled areas near the new and former Boston sewage outfalls.
NOAA/WDS Paleoclimatology archives a tree-ring chronology from the Pearl site at Marconi National Seashore, Massachusetts, which can be used to reconstruct past climate conditions. The chronology covers the period from 148 to -64 calendar years before present (BP, where "present" is 1950), that is, from 1802 to 2014 CE. The NOAA National Centers for Environmental Information (NCEI) compiled and published the data, which was last updated in 2014.
This sample comprises Taiwan Mandarin speech recordings from 204 residents, each of whom read 450 sentences. The content covers topics such as the economy, entertainment, news, and spoken language, in both general and human-computer interaction scenarios. The audio is accompanied by manual transcriptions.
This placeholder dataset contains a small collection of .flac audio files formatted for the Speech processing Universal PERformance Benchmark (SUPERB). It provides a 'file' column to support building speech processing pipelines and extracting self-supervised learning representations.
This unified corpus contains cleaned and denoised audio-text pairs in the Mooré language (ISO 639-3: mos), sourced from the public domain. It is curated specifically for low-resource speech tasks, including text-to-speech (TTS) and automatic speech recognition (ASR).
CML-TTS is a multilingual text-to-speech dataset developed at the Center of Excellence in Artificial Intelligence (CEIA) of the Federal University of Goiás. It comprises audiobooks of public domain books from Project Gutenberg, read by LibriVox volunteers, and includes recordings in languages such as Dutch, German, French, Italian, and Polish. The dataset was last updated on Hugging Face on 2023-11-24.
This dataset contains Japanese speech recordings (sampled at 16 kHz) and their transcriptions, drawn from various Galgame (visual novel) titles. It is released under the GNU General Public License v3.0 and strictly forbids commercial use of the data or any resulting models.
Amphion released the INTP dataset in late 2024, providing 250,000 synthetic speech preference pairs totaling over 2,000 hours of audio. The collection spans English and Chinese across diverse scenarios, including regular speech, repeated phrases, and code-switching, to support speech intelligibility research.
This is a sample of a paid corpus containing speech recordings from 38 native Hong Kong speakers, annotated with the involvement of a professional phonetician. It is designed for speech synthesis research and development.
This sample features Korean speech data recorded by 291 local speakers on mainstream Android phones and iPhones in quiet indoor environments. Each speaker recorded 400 sentences across categories including economics, entertainment, news, oral, figure, and letter.
The Wolof Audio Dataset is a collection of audio recordings and corresponding transcriptions in Wolof, designed to support the development of automatic speech recognition (ASR) models. It was created by galsenai by combining four existing datasets, including ALFFA, FLEURS, and the Urban Bus Wolof Speech Dataset. The dataset was last updated on 2024-12-25.
This sample contains speech recordings from 290 children in the U.S.A., with a balanced male-to-female ratio. The audio content is drawn from children's books and textbooks and was recorded on mobile phones in quiet indoor environments.
Myrtle.ai provides background noise audio for augmenting training data for their CAIMAN-ASR models. The dataset's modifications are licensed under CC BY 4.0, while the original source data is under CC BY 3.0 or in the public domain. It was last updated on February 19, 2024.