Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,924 datasets
Sarvam AI developed this synthetic benchmark in 2026 to evaluate context-aware Automatic Speech Recognition (ASR) within voice bot environments. The collection includes between 1,000 and 10,000 records covering the top 10 Indian languages, focusing on how conversation history and agent prompts influence transcription accuracy.
An audio dataset that likely contains general utterances spoken by Spanish speakers from Spain. The dataset's size, specific content, and creation details are unknown. It is hosted on Kaggle.
A high-quality, single-speaker Persian (Farsi) narration dataset intended for training text-to-speech models. The dataset was created by pymmdrza and was last updated on January 21, 2026. The description emphasizes professional narration quality for TTS applications.
Updated Hate-Speech Dataset is a text corpus likely containing social media posts or comments annotated for offensive language. The dataset is hosted on Kaggle, but its specific size, origin, and update details are not provided in the metadata. Columns and sample data are unknown, requiring verification after download to confirm content and structure.
TrainXttsV2_Audiobook is an audio dataset that likely contains speech recordings for text-to-speech model development. The dataset is hosted on Kaggle, but its size, creator, and update date are unknown. Columns and sample data are unavailable, so the exact content requires verification after download.
Tts Polish Nemo is a dataset for text-to-speech synthesis, published on HuggingFace by datadriven-company. The dataset was last updated on March 13, 2026. Its specific content and scale require verification after download.
A dataset from the GAMETES repository, which generates simulated epistasis models. The specific attributes, sample size, and data structure are unknown.
A dataset from the OpenML platform with an identifier suggesting it relates to genetic heterogeneity modeling. No concrete details on size, content, or structure are available from the provided input.
A dataset titled 'fasrtyuj' is available on the Kaggle platform. The dataset's content, structure, and origin are not described in the provided metadata. Further details about its creation, size, and specific contents require verification after download.
A dataset titled 'MusicThree' published on Kaggle. The dataset's content likely relates to music, but specific details such as size, format, and creation date are unavailable. Metadata is minimal; actual content requires verification after download.
MusicSecond is a dataset hosted on Kaggle. Its title suggests it contains audio data related to music. The dataset's specific content, size, and origin are not detailed in the available metadata.
MusicOne is a dataset hosted on Kaggle. Its title suggests a focus on music-related information. The dataset's specific content, scale, and origin require verification after download due to minimal provided metadata.
Tttsophia is a text-to-speech audio dataset published on Kaggle. The dataset's specific content, size, and creation details are not provided in the available metadata. Further verification after download is required to confirm its exact composition and potential applications.
A Techsalerator dataset containing YouTube and video data for the Caribbean nation of Saint Kitts and Nevis. The dataset's specific content, volume, and collection methodology are not detailed in the available metadata. The original source and last update date are also unknown.
Turkish TTS Data is a collection for speech synthesis and automatic speech recognition tasks, created by Anilosan15. It contains audio and corresponding text data in the Turkish language. The dataset was last updated in March 2026.
ASR-Model-Offline is a dataset published on Kaggle. The title suggests it contains data for training or evaluating offline automatic speech recognition models. The dataset's specific content, size, and origin require verification after download.
George McKay authored a report titled 'From Glyndebourne to Glastonbury: The impact of British music festivals'. The report is based on a review of academic and grey literature and identifies eight areas of economic, social, and cultural impact. The dataset appears to be the textual content of this report or its associated data.
A thesis authored by Charles Coüasnon on the Church of the Holy Sepulchre in Jerusalem. The work was submitted to the Massachusetts Institute of Technology Department of Architecture in 1959. The content likely contains architectural and historical analysis.
Common Voice metadata and versioning details provided by the common-voice organization, last updated in March 2026. This repository tracks the evolution and release history of the global open-source speech corpus. It serves as the administrative layer for managing dataset releases across multiple languages.
Metadata for 320,000 songs scraped from the Last.fm API. The data includes genre tags, mood labels, and popularity information. The author, organization, and specific update date are not provided.