Loading...
Loading...
Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,909 datasets
TTSSDSC is a speech dataset published on Kaggle. Its title suggests a focus on text-to-speech synthesis. The dataset's specific content, size, and origin require verification after download.
EEG ADHD Nasrabadi MAT is a dataset of electroencephalogram (EEG) recordings related to Attention-Deficit/Hyperactivity Disorder. The dataset is hosted on Kaggle, but its specific scale, collection methodology, and authorship details are not provided in the available metadata. The title suggests it likely contains time-series brainwave data for analysis.
Datapointai released this dataset in March 2026 containing 1,000 text-to-speech audio pairs and 15,000 human preference annotations. Each entry consists of a single text prompt rendered by two different TTS systems, with 15 human labels indicating which version sounds more natural.
Kaggle hosts the f5tts_clouds dataset. The title suggests it contains imagery of cloud formations, likely for meteorological or computer vision analysis. The dataset's author, organization, and specific collection details are not provided in the available metadata.
Speech audio data from telemarketing calls placed by Indian agents to customers in the United States. The dataset is hosted on Kaggle, but the author, organization, and specific collection details are unknown. The size, format, and number of recordings are unspecified.
A filtered version of the Common Voice dataset for automatic speech recognition (ASR). Samples with fewer than three words, repetitive tokens, or chat token leaks have been removed. The dataset was created by OpenSpeechHub and was last updated on March 31, 2026.
Aggregating between 1,000 and 10,000 manually aligned audio-text pairs from Kazakh commercial songs, released by yeshpanovrustem in 2026. It provides line-level vocal segments designed to investigate the utility of sung speech for low-resource automatic speech recognition (ASR) systems.
French Asr Quebec Eu is a speech dataset hosted on HuggingFace by the author ele-sage. The title suggests it contains audio data for automatic speech recognition (ASR) in French, likely with a focus on the Quebec dialect. The dataset was last updated on April 5, 2026.
12 audio samples of spoken AI news content comprise this ASR evaluation benchmark created by Trelis in 2026. It provides reference transcriptions and entity annotations specifically for technical AI terminology like model names and benchmarks.
Massachusetts contains all United States Coast Guard facilities within its borders as of August 2007. The data were compiled by the Massachusetts Office of Coastal Zone Management and are provided as GIS data. The dataset shows the location of these facilities.
Five major U.S. rivers entering the Gulf of Maine are monitored, including the Penobscot, Kennebec, Androscoggin, Saco, and Merrimack. The dataset provides real-time discharge data for the past 7 days and current streamflow conditions in Maine and Massachusetts. It is sourced from the US Geological Survey Water Resources Division via a NASA Earthdata gateway.
Massachusetts Coastal Zone polygons represent the official coastal management boundary as defined by state regulation 301 CMR 21.99. The boundary layer was compiled by the Massachusetts Office of Coastal Zone Management in accordance with the federal Coastal Zone Management Act of 1972.
All extant lighthouses on the coastline of Massachusetts are mapped in this dataset. Locations reflect current positions, which may differ from original sites. The data was compiled by SCIOPS.
A digital geologic map of Cape Cod and the islands, reprojected into the Massachusetts State Plane coordinate system. The data was processed by the Massachusetts Office of Coastal Zone Management in June 2006. The original data source is the SCIOPS organization.
NOAA NCEI Accession 0000411 contains aerial photographs of aquatic vegetation captured from aircraft over Florida Bay, the Indian River in Florida, and the Coast of Massachusetts. The photographs were scanned and geo-referenced for mapping purposes. Data is stored on a DLT tape as a secure backup copy.
One million synthetic audio samples for text-to-speech applications, generated across 1000 distinct speakers. The collection was created by Aynursusuz, with each speaker contributing 1000 samples derived from 100 texts and 10 voice clones. The dataset was last updated on Hugging Face on March 11, 2026.
32,901 paired Amharic speech audio files and transcriptions processed from the BDU-speech dataset by Yohannes A. Ejigu. Updated in March 2026, the collection provides mono audio recordings specifically structured for automatic speech recognition research and model training.
Fewer than 1,000 audio recordings and text transcripts of Nepali-English code-switched speech from technical interviews. Developed by devrahulbanjara and updated in March 2026, it captures software engineering terminology embedded in Nepali conversational grammar.
Chinese speech recognition data published on Kaggle. The dataset likely contains audio recordings and corresponding transcriptions for training and evaluating automatic speech recognition (ASR) systems. Specific details on size, collection method, and contributors are not provided in the available metadata.
vi_asr_dataset is a dataset for Vietnamese automatic speech recognition, published on Kaggle. The dataset likely contains audio files and corresponding transcriptions. Its specific size, collection method, and authorship are currently unknown.