Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
2,009 datasets
A collection of MIDI files of classical compositions by composers such as Bach, Beethoven, Chopin, and Mozart. The collection is organized into directories by composer. It was created by user 'drengskapur' and last updated in July 2024.
A dataset named 'Khm Asr Data Test' was published on Hugging Face by author 'rinabuoy' on August 16, 2024. The title suggests it contains audio data for testing Khmer automatic speech recognition (ASR) systems. The provided metadata does not detail the dataset's content, size, or structure.
viVoice provides between 100,000 and 1,000,000 Vietnamese audio-text pairs for multi-speaker speech synthesis, released by capleaf in 2024. The dataset is specifically formatted for text-to-speech tasks and is distributed via Parquet files.
VoxLingua107 is a speech dataset for training spoken language identification models. It contains 6,628 hours of short speech segments sourced from YouTube videos, covering 107 languages. This version is published by SEACrowd and was last updated in June 2024.
MusicScore is a large-scale dataset of music score images paired with textual metadata. It was collected and processed from the International Music Score Library Project (IMSLP) by authors Yuheng Lin, Zheqi Dai, and Qiuqiang Kong. The dataset was last updated on June 20, 2024.
Libri-light is a dataset of 60,000 hours of unlabeled English speech audio from audiobooks. It serves as a benchmark for training automatic speech recognition systems with limited or no supervision.
ChartQA is a multimodal dataset hosted by ahmed-masry on Hugging Face, last updated on June 22, 2024. It likely contains chart images paired with textual questions and answers for visual question answering tasks. The dataset requires manual download of a zip file and cannot be loaded directly via the standard datasets library function.
369,510 hours of speech audio and text captions sourced from YouTube, released by the espnet team in 2024. The dataset pairs audio utterances with either user-uploaded (manual) or system-generated (automatic) captions.
Agreements concluded for KZSMO "Musical School No3" in the KMR from 2019 to the present. The dataset is sourced from the States site of Ukraine and was last updated on June 12, 2024. The specific contents and scale of the agreements are not detailed.
Additional agreements to contracts from 2019 to the present for KZSMO 'Musical School No3' KCC. The data originates from the States site of Ukraine and was last updated on June 12, 2024. The metadata does not provide the number of contracts, the row count, or the file size.
Most of the collected songs are love songs touching on nostalgia and saudade; the collection also includes lively dances. The collection process involved interviewing people and learning about their lives through songs linked to agricultural work and annual cycles. The dataset was coordinated by Álvarez Pérez, Xosé Afonso and last updated in May 2024.
LibriSpeech ASR Dummy is a small-scale dataset from Hugging Face's internal testing, containing audio-text pairs for English speech recognition. It was created by hf-internal-testing and last updated in June 2024. The dataset is tagged with the size category 'n<1K', indicating it contains fewer than 1,000 samples.
Álvarez Pérez, Xosé Afonso coordinated this oral history dataset from the e-cienciaDatos Harvested Dataverse, last updated on May 5, 2024. It contains a biographical narrative from an informant, Jesús López, detailing his life, education, and language use in the San Martín de Trevellu/Trevejo area. The description suggests the data covers topics such as schooling, bilingualism between Spanish and the local 'lagarteiro' language, and cultural practices like music and festivals.
'José Tomás Sousa (Olivenza). Folklore musical de Olivenza (I)' is a collection of folk music recordings from Olivenza, Spain. The dataset, coordinated by Álvarez Pérez, Xosé Afonso, was last updated on May 5, 2024. It focuses on the 'saias' genre and also includes occasional songs, gaios, vira, fados, and corridinhos.
Nawar Halabi at the University of Southampton developed this speech corpus as part of PhD work. Recordings were made in a professional studio using the south Levantine Arabic dialect with a Damascene accent. Speech synthesized from this corpus has reportedly produced a high-quality, natural voice.
A collection of Hindi speech audio files for text-to-speech synthesis, created by the user skywalker290 and hosted on Hugging Face. The dataset was last updated in June 2024 and is categorized as containing between 10,000 and 100,000 samples based on platform tags.
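Several entries in this listing report sample counts as ranges taken from platform size-category tags. The following is a minimal sketch of how such a tag could be converted into numeric bounds, assuming tags shaped like `n<1K` or `10K<n<100K` (the form used for Hugging Face size categories). The function name `parse_size_category` is hypothetical and not part of any library.

```python
import re

# Multipliers for the suffixes that can appear in size-category tags.
_SUFFIX = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}
_NUM = r"(\d+)([KMBT]?)"


def _to_int(digits, suffix):
    """Convert a pair such as ("10", "K") to the integer 10000."""
    return int(digits) * _SUFFIX[suffix]


def parse_size_category(tag):
    """Return (low, high) sample-count bounds for a tag (hypothetical helper).

    A high bound of None means the range is open-ended.
    """
    tag = tag.strip()
    # Two-sided range, e.g. "10K<n<100K".
    m = re.fullmatch(rf"{_NUM}<n<{_NUM}", tag)
    if m:
        return _to_int(m[1], m[2]), _to_int(m[3], m[4])
    # Upper bound only, e.g. "n<1K".
    m = re.fullmatch(rf"n<{_NUM}", tag)
    if m:
        return 0, _to_int(m[1], m[2])
    # Lower bound only, e.g. "n>1T".
    m = re.fullmatch(rf"n>{_NUM}", tag)
    if m:
        return _to_int(m[1], m[2]), None
    raise ValueError(f"unrecognized size-category tag: {tag!r}")
```

For example, `parse_size_category("10K<n<100K")` returns `(10000, 100000)`, which matches the "between 10,000 and 100,000 samples" wording used for tagged datasets.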
A collection of text-to-speech audio samples collected from the OpenAI API and app. The dataset includes samples from the Sky and Juniper voices, stored as clean lossless audio files. It was uploaded by leafspark and last updated on May 22, 2024.
Two categories of audio data, speech and music, are provided in a format compatible with the PyTorch framework. This dataset serves as a specialized loader for acoustic analysis and machine learning tasks.
NCHLT Speech Corpus Xhosa contains audio recordings of Xhosa, a major South African language. The dataset was uploaded to Hugging Face by Beijuka in June 2024. It is part of the National Centre for Human Language Technology (NCHLT) initiative.
Descriptive text data on folk music and dance traditions from the Olivenza region, likely documenting cultural practices. The dataset was coordinated by Álvarez Pérez, Xosé Afonso and harvested into the e-cienciaDatos Dataverse platform. It was last updated on May 5, 2024.