Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,910 datasets
F5-TTS_Guj_Malyalam is a dataset published on Kaggle. The title suggests it contains audio data for text-to-speech synthesis in the Gujarati and Malayalam languages. The dataset's specific content, size, and collection details are unknown from the provided metadata.
Librispeech Synth 300h max 5spks is a speech audio dataset published on Kaggle. The title suggests it contains synthetic speech audio derived from the LibriSpeech corpus, likely comprising up to 300 hours of audio from a maximum of five speakers. The specific source, creation method, and exact content require verification after download.
nemo-asr-wheels is a dataset published on Kaggle. The title suggests it contains artifacts related to the NVIDIA NeMo automatic speech recognition toolkit, likely including pre-built wheels or model files. The dataset's specific content, size, and origin are not detailed in the provided metadata.
jp-asr-eval-data is a dataset for evaluating Automatic Speech Recognition (ASR) systems on Japanese language audio. Published on Kaggle, its specific size, creation date, and author are unknown. The dataset likely contains audio files paired with transcriptions for performance benchmarking.
F5-TTS_Tele_Kannada_SD is a dataset hosted on Kaggle. The title suggests it contains data for text-to-speech synthesis in the Kannada language, likely including audio recordings and corresponding text transcripts. No further metadata about its size, origin, or structure is provided.
F5-TTS_Tamil_SD is a dataset published on Kaggle. The title suggests it contains data for Tamil text-to-speech synthesis. The dataset's specific size, origin, and update date are unknown.
F5-TTS_Punjabi_SD is a dataset published on Kaggle. Its title suggests it contains audio data for Punjabi text-to-speech synthesis. The dataset's specific size, author, and creation date are unknown.
A dataset for text-to-speech synthesis in the Bengali language, published on Kaggle. The specific data volume, collection method, and temporal coverage are unknown. The dataset likely contains audio samples and corresponding text transcripts.
This dataset supports analysis of accounting practices in Massachusetts corporations from 1875 to 1895. It contains data used to estimate the prevalence of double-entry bookkeeping and depreciation, with findings showing that balancing returns increased from 60% to 96% over the period. The data was compiled by author Caitlin Rosenthal for the paper 'Balancing the Books: Convergence and Diversity in Accounting, 1875-1895'.
This dataset supports a study on the adoption of double-entry bookkeeping and depreciation accounting by Massachusetts corporations from 1875 to 1895. It contains data used to estimate that 60% of firms balanced returns in 1875, rising to over 96% by 1895. The proportion considering depreciation increased from 18% to 24% over the same period.
F5-TTS_Hindi_SD is a dataset published on Kaggle. The title suggests it contains audio data for Hindi text-to-speech synthesis. Metadata is minimal; the specific content, size, and creation details require verification after download.
Wake Word Akylai is a dataset published on Hugging Face by the-cramer-project. It likely contains audio samples for training and evaluating wake-word or keyword-spotting models. The dataset was last updated on April 8, 2026.
Darija ASR checkpoints is a dataset that likely contains model weights for a speech recognition system trained on the Moroccan Arabic dialect. The dataset is hosted on Kaggle, a platform for sharing data and machine learning models. Specific details on the data size, collection method, and creators are not provided in the available metadata.
A library of reference audio voices for text-to-speech applications, published on Kaggle. The dataset is associated with the StoryNiche platform and is intended for use in Kaggle TTS (Text-to-Speech) tasks. Specific details on the number of voices, audio characteristics, and collection methodology are not provided in the available metadata.
The TIMIT corpus provides broadband recordings of 630 speakers from eight major American English dialects, each reading ten phonetically rich sentences. It was created through a joint effort by MIT, SRI International, and Texas Instruments, with recordings made at TI and transcriptions verified at MIT and NIST. The corpus includes time-aligned orthographic, phonetic, and word transcriptions alongside 16-bit, 16 kHz speech waveform files for each utterance.
A music dataset published on Kaggle by a user named Tuananh. The dataset's specific content, size, and collection method are not detailed in the provided metadata. Its title suggests it contains audio data or related features for music analysis tasks.
Music sub-genres is a collection of audio clips from various music sub-genres. The dataset is intended for fine-tuning audio models, as described on Kaggle. Details regarding its size, creator, and update history are not provided.
A speech audio dataset with content likely related to African languages or contexts. It was published on the Hugging Face platform by the author 'amanuelbyte'. The dataset's record was last updated on March 31, 2026.
A dataset for music genre classification, likely containing audio files or features for a machine learning challenge. It was published on the Kaggle platform. The specific collection method, size, and temporal coverage are not detailed in the available metadata.
This dataset features conversational and phrasal speech training and test data for the Telugu, Tamil, and Gujarati languages. Each entry includes an audio recording and its corresponding transcript; the data is provided by Microsoft and SpeechOcean.com for research purposes.