Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
2,018 datasets
Asr Book Lm V2.0 is a text corpus for training language models in automatic speech recognition systems. The dataset was created by author Jiejie and was last updated on March 14, 2022. Its size is categorized as 1K<n<10K, indicating it contains between 1,000 and 10,000 entries.
BUREAU DE RECHERCHES GÉOLOGIQUES ET MINIÈRES provides a dataset of land transport axes classified by noise level in Finistère, France. The data applies the Prefectural Decree of Finistère Sound Classification No 2004-0101 and a Morbihan decree for Guilligomarc’h. It is intended for planning document study offices and was last updated on February 2, 2022.
13,100 short audio clips and corresponding transcriptions featuring a single speaker reading from 7 non-fiction books. The dataset totals approximately 24 hours of audio with individual clip durations ranging from 1 to 10 seconds.
A text corpus for language modeling, sourced from books and curated for automatic speech recognition tasks. The dataset was created by author Jiejie and last updated in March 2022.
The LibriSpeech ASR Test dataset is drawn from LibriSpeech, a corpus of approximately 1,000 hours of 16 kHz read English speech derived from LibriVox audiobooks. The corpus was prepared by Vassil Panayotov with assistance from Daniel Povey and is carefully segmented and aligned.
20,000+ hours of Russian speech audio paired with text transcriptions across domains like YouTube, audiobooks, and radio. The collection includes over 2 million utterances categorized by source and acoustic conditions.
A text corpus for language modeling in automatic speech recognition systems, created by Jiejie and hosted on Hugging Face. The dataset was last updated in February 2022. Its size is categorized as 1K to 10K entries.
A dataset for Quran speech recognition, published on Kaggle by author Nuwaisir. It contains text-modality data; the row and column counts are not specified.
Preprocessed text data sourced from Reddit, intended for training or evaluating Automatic Speech Recognition (ASR) systems. The dataset was created by DDSC and last updated on the Hugging Face platform in February 2022. Its size is indicated as between 1 million and 10 million entries.
Financial statements for the municipal cultural institution Kharkiv Specialized Music and Theatre Library named after K.S. Stanislavsky. The dataset was published on the Kharkiv Open Data Portal and automatically placed on the Unified State Open Data Portal of Ukraine. It was last updated on 2021-12-24.
CommonVoice Mt 8 Processed is a Maltese language audio dataset derived from Mozilla's Common Voice project. The dataset was processed and uploaded by RuudVelo in February 2022. It contains audio recordings paired with corresponding transcriptions for speech technology development.
Common Voice NL 8 Processed is a Dutch-language subset of Mozilla's crowdsourced speech corpus. The dataset was uploaded to Hugging Face by user RuudVelo in February 2022, indicating processing of the eighth version of the Dutch Common Voice data. It contains audio clips paired with text transcriptions for speech technology development.
A dataset for Vietnamese text-to-speech synthesis, processed and uploaded to Hugging Face by user geninhu in January 2022. It contains processed audio and corresponding text data, as indicated by platform tags. The specific size and number of samples are not detailed in the available metadata.
Question-answer pairs designed to model verbal predicate-argument structure. The train split originates from the QASRL Bank (QASRL-v2/LS), constructed via crowdsourcing, while the dev and test splits are from QASRL-GS (Gold Standard).
Dnipro, Ukraine, provides data on the consumption of communal resources by the city's communal institution of culture, Dnipro Children's Music School No. 10. The dataset likely contains utility usage metrics, such as water or electricity consumption, for the school. It was published on the States site of Ukraine and last updated on December 3, 2021.
Gujarati speech recordings and transcriptions categorized for Automatic Speech Recognition (ASR). This dataset provides audio-text pairs sourced from the OpenSLR repository to facilitate public access to Gujarati language resources.
Audio files for automatic speech recognition (ASR). The dataset is categorized as containing under 1,000 samples and is associated with the US region. It was last updated in January 2022.
84 hours of Sanskrit audio data for training automatic speech recognition models, uploaded by user 'addy88' to Hugging Face in December 2021. The dataset is categorized as containing 10K to 100K samples and includes text transcriptions.
Ground-based soil moisture, soil temperature, and air temperature measurements from twenty-five temporary stations in Petersham, Massachusetts. The stations were installed across an area of approximately 23 km by 36 km in May 2019 and operated through 2022. The dataset is produced by NSIDC_CPRD and was last updated in October 2021.
An evaluation dataset for Automatic Speech Recognition (ASR) systems in the Sanskrit language. The dataset was created by user 'addy88' and published on the Hugging Face platform in December 2021. Its specific size and structure are not detailed in the provided metadata.