Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,925 datasets
Tamil-language audio data for automatic speech recognition (ASR). The dataset is published on Kaggle and likely contains speech recordings with corresponding transcriptions. The source appears to be the MILE lab at the Indian Institute of Science (IISc), but specific details on size, collection method, and time range are unavailable.
dataindextts3_song is a dataset published on Kaggle. The title suggests it contains audio data related to text-to-speech synthesis, potentially for song generation. The dataset's specific content, size, and origin are not detailed in the available metadata.
Erhu Timbre Audio Dataset contains audio-based timbre records for the Chinese string instrument, the erhu. The dataset includes CSV labels, likely for categorizing or annotating the audio samples. It is hosted on Kaggle, but details about its creation, size, and update history are unavailable.
A sample of restaurant market data for the city of Quincy, Massachusetts, provided by BeamStation. The dataset contains listings for all restaurants in the area, though the exact number of records is unspecified. The original creation date and update frequency are not documented.
Raw oral transcription texts, preprocessing results, and derived linguistic feature datasets for cognitive ability research. The dataset also includes code for classification experiments. The author is listed as chen, xuanshu, and the dataset was last updated in February 2026.
An invoice dataset published on Kaggle. The dataset likely contains structured information related to business invoices, such as amounts, dates, and vendor details. Its specific content, size, and origin require verification after download.
A speech corpus for the Urdu language, published on Kaggle. The dataset likely contains audio recordings paired with corresponding text transcripts for training text-to-speech systems. Specific details on size, collection method, and contributors are not provided in the available metadata.
An audio dataset of Hindi speech, published on the Kaggle platform. The dataset likely contains audio files of spoken Hindi, which can be used for training and evaluating speech processing models. Specific details on the number of recordings, speakers, recording conditions, and collection methodology are not provided in the available metadata.
Georgian language audio recordings for text-to-speech synthesis, published on Kaggle. The dataset's size, collection method, and specific content are not detailed in the available metadata. Further details regarding the number of samples, recording quality, and speaker demographics require verification after download.
Emotion-Aware Music Sentiment Dataset provides multimodal audio features and contextual metadata for emotion-based music AI. The dataset originates from Kaggle, though specific details on volume, authorship, and recency are unavailable.
Satellite imagery data covering the Caribbean nation of Saint Kitts and Nevis. The dataset is provided by Techsalerator and hosted on Kaggle. Specific details on data volume, collection date, and resolution are not provided in the available metadata.
Archive of Our Own (AO3) data related to music and bands, collected via web scraping. The dataset's size, row count, and specific attributes are unknown. The author, organization, and last update date are also unspecified.
A curated collection of historical audio recordings sourced from the Library of Congress Citizen DJ collections. The dataset is designed for open research, audio analysis, music information retrieval, remixing, and AI/ML experimentation. This mini release is intended as a lightweight subset for testing pipelines, educational use, and small-scale experiments.
AsramaGH likely contains data related to housing or community structures. The dataset is published on Kaggle, but its creator, size, and specific content are unknown. Its last update date is also unknown.
AsramaGH is a dataset published on Kaggle. Its title suggests it may contain information related to housing or community structures. The specific content, size, and origin are unknown.
Child Trends provides data on student engagement in arts education. The dataset's specific variables, size, and temporal coverage are not detailed in the available metadata. The original source is listed as paperswithcode, a platform for machine learning resources.
A processed corpus for Urdu text-to-speech (TTS) applications, published on Kaggle. The dataset likely contains audio recordings and corresponding text transcriptions. Specific details on size, source, and processing methods are not provided in the available metadata.
DataIndexTTS4 is a dataset published on Kaggle. Its title suggests it is related to text-to-speech (TTS) technology. The dataset's specific content, size, and origin require verification after download.
A dataset titled 'dataindextts3' is hosted on Kaggle. The title suggests it contains data related to text-to-speech (TTS) synthesis. No further metadata, such as author, size, or sample details, is provided.
30 Musical Instruments is a dataset hosted on Kaggle. The title suggests it contains audio samples or information related to a collection of thirty different musical instruments. The dataset's specific content, size, and origin are not detailed in the provided metadata.