Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
2,018 datasets
Medical Asr En is a dataset for automatic speech recognition in a medical context, published on the Hugging Face platform by author jarvisx17. The dataset was last updated on January 30, 2023. Its specific content, size, and structure require verification after download.
A PyTorch implementation of self-supervised dance video synthesis across music and dance categories. The repository provides the official code for the ACM MM 2020 oral paper on generative dance video synthesis.
LibriSpeech contains approximately 1,000 hours of 16 kHz read English speech. The corpus was prepared by Vassil Panayotov with assistance from Daniel Povey and is derived from audiobooks in the LibriVox project. The dataset was uploaded to Hugging Face by nguyenvulebinh in December 2022.
10 hours of Turkish media speech audio clips designed for evaluating Automated Speech Recognition (ASR) systems. This dataset is part of the MediaSpeech collection which also covers French, Arabic, and Spanish languages.
A repository indexing multiple datasets for Music Emotion Recognition (MER). The collection organizes metadata for various audio-based resources to facilitate research in affective musicology. It provides a centralized point of access for datasets involving musical audio and emotional labels.
2023 data from the Carnegie Library of Pittsburgh detailing public Wi-Fi usage across its library locations. The dataset is provided by the Allegheny County / City of Pittsburgh / Western PA Regional Data Center. Specific row and column counts are unknown.
SMAPVEX19-22 field campaign data includes plant area index (PAI) values and the RGB images used to derive them. The data were collected between April 2019 and December 2022 near Petersham, Massachusetts. The NSIDC_CPRD organization produced this dataset to support validation of satellite-derived soil moisture estimates in forested areas.
106,574 tracks from 16,341 artists across 161 genre categories. The collection includes 917 GiB of audio data, pre-computed features like MFCCs, and metadata tables linking tracks to albums and artists.
Approximately 1,000 hours of Tamil audio paired with transcripts. The transcripts have been de-duplicated by exact match. The dataset was created by parambharat and last updated in December 2022.
A map resource for City of Pittsburgh political wards, maintained by Allegheny County and the Western PA Regional Data Center. It provides geographic boundaries for local administrative and electoral divisions. The dataset was last updated in January 2023.
The Sharif Emotional Speech Dataset contains 3,000 semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data. It covers five basic emotions and a neutral state, labeled by 12 annotators from speech samples of 87 native Persian speakers extracted from online radio plays.
4,000 Romanian sentences recorded across 8 sessions by a single speaker in a hemianechoic chamber. Audio was captured with a Sennheiser MKH 800 small-diaphragm condenser microphone at 96 kHz/24-bit and downsampled to 48 kHz.
Audio recordings and orthographic transcriptions from the Norwegian Parliament, categorized into the Norwegian Bokmål and Norwegian Nynorsk written standards. The corpus serves as a benchmark for Norwegian Automatic Speech Recognition (ASR) systems using official parliamentary proceedings.
Music catalog data in XML format from the Library of Congress, last updated in December 2022. The dataset contains bibliographic records for musical works. Specific row counts, column features, and size details are unavailable.
Audio signal recordings and MLP neural network configurations for sound classification on edge devices. It provides training components for exporting models to a Raspberry Pi 2 or more capable hardware using USB microphone inputs.
A speech recognition dataset for the Telugu language, published on the Hugging Face platform. The dataset was uploaded by author 'bnriiitb' and was last updated on November 22, 2022. The specific content, size, and structure of the audio files are not detailed in the available metadata.
1,000 hours of audio recordings and transcriptions derived from LibriVox and Project Gutenberg for speech recognition and synthesis. The collection features French audio clips between 1 and 20 seconds in length paired with literary texts published from 1884 to 1964.
A dataset for audio classification tasks related to music genres. It was created by lewtun and last updated on November 2, 2022. The specific number of rows, columns, and audio features is unknown.
A 119-hour corpus of English-language earnings calls collected from global companies. The dataset was created by anton-l and uploaded to Hugging Face in October 2022. Its primary purpose is to serve as a benchmark for automatic speech recognition models on real-world accented speech.
8 hours and 23 minutes of Italian speech audio from a single female speaker, recorded at a 16,000 Hz sample rate. It is derived from the female-speaker audio segments of the M-AILABS Speech Dataset and adapted as an Italian version of LJSpeech for training text-to-speech models.