Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,971 datasets
The Sharif Emotional Speech Dataset contains 3000 semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data. It covers five basic emotions and a neutral state, labeled by 12 annotators from speech samples of 87 native-Persian speakers extracted from online radio plays.
The Mill Pond elver ladder is one of only a few in Massachusetts with a trap to enumerate migrating young-of-the-year American eels (Anguilla rostrata). This dataset, collected by NOAA NCEI, records the total number of eels and water temperature from April 1 to October 15, 2015, as the start of a long-term proxy for eel status in small coastal watersheds.
Approximately 227.7 hours of high-quality Malay speech audio synthesized by the ms-MY-OsmanNeural voice. The audio is sourced from two text corpora: Malay Wikipedia and news articles (94.5 hours) and transcripts from the Malaysian Parliament (133.2 hours). All audio has a 24,000 Hz sample rate, and the source sentences range from 2 to 20 words.
3,674 denoised audio files from the Reazon Speech v2 dataset, processed using UVR to remove background music and noise. The dataset was cleaned by author Stardust-minus using eight A800 GPUs over approximately 10 days and was mirrored to Hugging Face by litagin in April 2024.
LP-MusicCaps, Music Negation/Temporal Ordering, and WavCaps datasets were re-organized into instruction form by seungheondoh. The dataset was last updated on August 16, 2023. It likely contains pseudo-captions for music and audio content generated using ChatGPT.
Subsets of the SpokesMix, SpokesBiz, and Diabiz corpora processed into the BIGOS (Benchmark Intended Grouping of Open Speech) format. The data was contributed by the organization 'pelcra' and was last updated on October 26, 2024. The corpora provide spontaneous, conversational speech and phone-based customer interactions in Polish.
A collection of approximately 241 hours of high-quality Malay speech audio synthesized by the ms-MY-YasminNeural voice. The audio is split into two subsets: 99.4 hours from Malay Wikipedia and news texts, and 142 hours from Malaysian Parliament transcripts. All audio has a 24,000 Hz sample rate, and the source sentences are between 2 and 20 words long.
An interview with Francisco and Lola in Rubiás, focusing on language similarities and differences across the border. The dataset likely contains discussions on the assessment of Galician spoken on television, dialectal variations in border villages like Montalegre, and comparisons between Galician and Portuguese. It was coordinated by Álvarez Pérez, Xosé Afonso and last updated on May 5, 2024.
Descriptive text data on folk music and dance traditions from the Olivenza region, likely documenting cultural practices. The dataset was coordinated by Álvarez Pérez, Xosé Afonso and harvested into the e-cienciaDatos Dataverse platform. It was last updated on May 5, 2024.
An interview with Sara Delgado from Piedras Albas, harvested by e-cienciaDatos. The audio recording captures personal recollections about childhood in the town, local livelihoods, life on the border, and cultural topics like music festivals and contraband. The dataset was last updated on May 5, 2024.
A Ukrainian municipal report details the equity of the Dnipro Children's Music School No. 12 for the year 2020. The data was published by the States site of Ukraine and last updated on April 30, 2021. The report is available in XLSX format, suggesting a tabular structure likely containing financial or asset information.
A financial report for the City Municipal Cultural Institution "Dnipro Children's Music School No. 12" covering the year 2020. The data originates from a public institution in Ukraine and was published on an open data platform in April 2021. The report likely contains details on the institution's income, expenditures, and financial performance.
A financial report details the consumption of communal resources for the City Municipal Institution of Culture 'Dnipro Children's Music School No. 14'. The data covers the period from January to August 2021 and was published on the States site of Ukraine. The dataset was last updated on September 9, 2021.
Financial report of the City Municipal Institution of Culture 'Dnipro Children's Music School No. 14' on communal expenditures for the first half of 2021. The dataset was published on the States site of Ukraine open data platform on 2021-07-04. It likely contains detailed records of utility payments for a municipal cultural institution.
OpenSLR 32 provides high-quality, multi-speaker transcribed audio data for Sesotho, one of the four South African languages in the collection. The dataset consists of wave files and corresponding transcriptions in a TSV file. It was uploaded to Hugging Face by voice-biomarkers in August 2024.
4,000 Romanian sentences recorded across 8 sessions by a single speaker in a hemianechoic chamber. Audio was captured with a Sennheiser MKH 800 small-diaphragm condenser microphone at 96 kHz/24-bit and then downsampled to 48 kHz.
1,000 hours of Arabic speech audio sampled at 16 kHz, sourced from over 700 YouTube channels. The collection spans multiple regions, genres, and dialects to support the development of speech recognition technologies.
400,213 audio files totaling 998 hours and 41 minutes of validated Ukrainian speech data. The dataset is a subset of the YODAS2 corpus, curated by the user 'speech-uk' and last updated on October 26, 2025. It is hosted on Hugging Face and associated with Ukrainian speech technology communities.
Crowdsourced speech recordings and transcriptions for over 20 listed languages, including Abkhaz, Basaa, and Cantonese. The dataset is an unofficial conversion of the Mozilla Common Voice Corpus 16.0, providing paired audio and text data for multilingual speech technology development.
1,000 hours of Arabic speech audio sampled at 16 kHz, collected from over 700 YouTube channels. The data spans multiple regions, genres, and dialects to support the development of speech recognition technologies.