DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,589 datasets

VOXCeleb: Celebrity Speech and Audio-Visual Data

VOXCeleb is a dataset of speech and video clips featuring celebrities. It is hosted on the Kaggle platform. The specific size, collection method, and time range are not detailed in the provided metadata.

Audio, Video, Speaker Identification, Audio Processing, Celebrities, Speech Recognition, +1

TTS13G: Text-to-Speech Dataset

TTS13G is a dataset hosted on Kaggle. Its title suggests a focus on text-to-speech synthesis, likely containing audio samples and corresponding text transcripts. The dataset's specific size, origin, and detailed contents are not described in the provided metadata.

Audio, Text To Speech, Speech Synthesis, Audio Generation, +1

Marina and Yacht Club Locations Along the Massachusetts Coast, 2007

Point-based GIS data showing the locations of marinas, yacht clubs, boat yards, and related facilities along the Massachusetts coast. The data were compiled in 2007 from public lists, databases, and visual inspection of orthoimagery by the organization SCIOPS. All data are represented as points with associated attribute data and include facilities defined as catering to recreational yachtspersons.

Geospatial, Marinas, Recreation, Coastal Infrastructure, Yacht Clubs, +1

Potential Massachusetts Coastal Desalination Plant Sites, 2006

June 2006 GIS data showing potential or developing locations for desalination plants along the Massachusetts coast. The data are preliminary and speculative, compiled from public media reports, state/federal regulatory filings, local meeting proceedings, and private contractor studies. The dataset was aggregated by SCIOPS and is hosted on the nasa_earthdata platform.

Geospatial, Massachusetts, Desalination, Coastal management, Water Infrastructure, +1

Proposed 2005 Northeast Gateway Lateral Natural Gas Pipeline Route in Massachusetts Bay

A geospatial data layer detailing a proposed 16-mile, 24-inch diameter natural gas pipeline lateral in Massachusetts Bay. The dataset, created by SCIOPS, represents the project layout as of September 27, 2005, based on surveys using DGPS, multibeam, sidescan, and diver inspections. The surveys followed US Army Corps of Engineers standards (EM 1110-2-1003).

Geospatial, Natural Gas Infrastructure, Pipeline Route, Offshore Survey, Geospatial Planning, +1

SASRec V1: Sequential Recommendation Model

A model artifact for sequential recommendation, published on Kaggle. The specific data format, size, and creation details are not provided in the metadata. The content and structure require verification after download.

Tabular, Sasrec Model, Recommender Systems, Sequential Recommendation, +1

FTIR Peaks of Cigarette Butts and Beach Sand by Use Zone

A collection of characteristic FTIR peaks (cm⁻¹) for cigarette butts and beach sand, categorized by beach use zones. The data is provided in an XLSX file of 16.8 KB, authored by Claudia Díaz-Mendoza and last updated in March 2026.

Cigarette Butt, Abundant Litter Items, Pilot Tourist Site, Transform Infrared Spectroscopy, Cellulose Acetate Filters, Cigarette Butts, Colombia P, Generating Persistent Fiber, Cellulose Acetate Degradation, Evaluate Temporal Trends, Cigarette Butt Fiber, Reducing Cb Pollution, Functional Groups Identified, Identify Functional Groups, Analyzed Using Fourier, Functional Group Analysis, Policy Measures Aimed, Cigarette Butts Constitute, Microplastic Pollution Due, +1

Submarine Electric Cable Route in Nantucket Sound

A geospatial line feature representing the western portion of a 46kV electric supply cable from Hyannis, Cape Cod to Nantucket Harbor. The data was created by SCIOPS using geographic coordinates from a National Grid drawing dated February 2005, incorporating marine survey data from 1994.

Audio, Geospatial, Marine Survey, Energy Transmission, Submarine Cable, Electric Infrastructure, +1

Electric Supply Cable Route in Nantucket Sound

Geospatial data from 1994 and 1996 shows the location of a 46kV submarine electric supply cable between Harwich Port, Cape Cod and Nantucket Harbor, Massachusetts. The line feature was created by SCIOPS using coordinates from a 1996 National Grid drawing and incorporates data from a marine survey conducted in 1994.

Audio, Geospatial, Geospatial Location, Marine Survey, Submarine Cable, Electric Infrastructure, +1

Voicedesign3: Synthesized Vietnamese Text-to-Speech Dataset

Voicedesign3 is a Vietnamese text-to-speech dataset created by ShiniChien. The dataset is synthesized, meaning the audio was likely generated by a TTS model rather than recorded from human speakers. It was last updated on HuggingFace on April 24, 2026.

Audio, Text To Speech, Voice Design, Vietnamese Language, Synthetic Speech, +1

VoxCeleb1: A Large-Scale Speaker Identification Dataset

VoxCeleb1 is an audio-visual dataset sourced from YouTube videos. It is published on the Kaggle platform. The dataset's specific size, creation date, and author are not detailed in the provided metadata.

Audio, Video, Machine Learning, Speaker Verification, Audio Processing, Speech Recognition, +1

CrispASR: CUDA Build Tools for Speech Recognition

CrispASR Kaggle CUDA Build is a dataset hosted on Kaggle. Its title suggests it relates to CUDA build tools, likely for the CrispASR speech recognition project. The dataset's specific content, size, and structure are not detailed in the available metadata.

Audio, Cuda, Build Tools, Kaggle, Speech Recognition, +1

Multi-Instrumental MIDI Files with Monophonic Melodies and Segment Labels

Mono Segments contains over 310,000 multi-instrumental MIDI files selected from the Discover MIDI Dataset. The dataset is enriched with lead monophonic melodies and high-precision structural segment labels, created by author asigalov61.

Audio, Midi Segmentation, Monophonic Melody, Midi Segments, Music Segmentation, Language: en, License: CC BY-NC-SA 4.0, Size Categories: 100K<n<1M, Task Categories: audio-classification, Segments, Midi, Sota, Multi Instrumental, Music Segments, Region: us, +1

STOMA: 23 Hours of Multi-Speaker Greek Speech for TTS

STOMA is a multi-speaker Greek speech corpus containing approximately 23 hours of studio-recorded read speech. It features audio from six native speakers (three male and three female), captured under controlled studio conditions to ensure high signal quality.

OPTIMIZED-PARQUET, Parquet, Size Categories: 10K<n<100K, Text To Speech, Task Categories: text-to-speech, Library: polars, Language: el, Modality: text, Library: mlcroissant, Task Categories: audio-classification, Speech Corpus, Library: datasets, Library: pandas, License: CC BY 4.0, Neural Tts, Greek Language, Region: us, Task Categories: automatic-speech-recognition, Annotations Creators: expert-generated, +1

Moroccan Darija Speech Recognition Dataset

Moroccan Darija ASR Dataset Split is a speech corpus for Automatic Speech Recognition, published on the Hugging Face platform by mohamedmou. The dataset was last updated on May 1, 2026, but its specific size, content, and collection methodology are not detailed in the available metadata.

Audio, Darija, Moroccan Arabic, Speech Recognition, +1

Music Audio Data

I music is a dataset hosted on Kaggle. Its specific content and scope are not detailed in the available metadata. The dataset's origin, size, and creation date are unknown.

Audio, Entertainment, +1

Spotify SASRec v1 Results: Sequential Recommendation Model Outputs

Results from the SASRec v1 model applied to Spotify data. The dataset is hosted on Kaggle. The specific content, size, and creation details are not provided in the metadata.

Tabular, Sasrec, Spotify, Music Recommendation, Sequential Recommendation, +1

Uyghur Speech Data with 2,157 Audio Files

Uyghur language speech recordings for natural language processing tasks. The dataset contains 2,157 audio files in MP3 format, totaling 3.03 GB, created by user 'anke01' and last updated on February 26, 2026.

Audio, Speech Synthesis, Uyghur Language, Audio Processing, Speech Recognition, +1

CoversBR: Metadata and Features for Brazilian Cover Song Identification

CoversBR is a large audio database focused predominantly on Brazilian music for cover and live song identification tasks. It comprises metadata and extracted features from 102,298 songs, organized into 26,366 cover groups, totaling approximately 7,070 hours of audio. The dataset is provided by Dirceu G Silva via AWS Open Data, but the original audio files are not included due to copyright restrictions.

Audio, Music Information Retrieval, Copyright Monitoring, Audio Features, Cover Song Identification, Live Song Identification, Brazilian Music, Music Features Dataset, Music, Music Recognition, +1

MusicX: Audio Data Collection

MusicX is a dataset hosted on Kaggle. Its specific content and scale are not detailed in the provided metadata. The dataset likely contains audio data or features related to music, based on its title.

Audio, Audio Analysis, +1
