DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,572 datasets

FIFE Site-Averaged Flux Data: 1989

Site Averaged Flux Data contains 30-minute interval measurements from the 1987-1989 FIFE experiment, specifically for the year 1989. The dataset likely includes variables related to atmospheric radiation, land surface fluxes, and soil heat budgets. It represents a site-averaged product compiled from data collected by multiple principal investigators.

Tabular, Time Series, ZIP, Text, FIFE Experiment, Atmospheric Radiation, Land Surface Flux, Soil Heat Budget, +1


Site Averaged Neutron Soil Moisture: 1989 FIFE Experiment Data

1989 data from the FIFE experiment provides site-averaged daily neutron probe soil moisture measurements. This dataset contains processed, site-averaged values for soil moisture content collected during the 1987-1989 field campaign. It is part of a larger Earth science study focused on land surface processes.

Tabular, Time Series, ZIP, Text, FIFE Experiment, Land Surface, Soil Moisture, Neutron-probe, Earth Science, +1


Site Averaged Flux Data: 1987 (Betts)

FIFE experiment data provides a site-averaged time series of 30-minute average atmospheric and soil variables. The dataset covers the period from May 27, 1987, to October 16, 1987. It was collected by multiple principal investigators during the broader 1987-1989 FIFE campaign.

Tabular, Time Series, ZIP, Text, FIFE Experiment, Atmospheric Flux, Land Surface, Soil Heat Budget, Site Averaged Flux, +1


FIFE Soil Moisture: Site-Averaged Gravimetric Measurements

Site Averaged Gravimetric Soil Moisture Data provides daily, site-averaged soil water content measurements from the First ISLSCP Field Experiment (FIFE). This dataset covers the period from May 20, 1987, through August 12, 1989, and represents a key observational product from the FIFE land-surface experiment. The data is likely used for validating remote sensing estimates and land surface models.

Tabular, Time Series, ZIP, Text, Hydrology, Land Surface, Soil Moisture, Earth Science, Field Experiment, +1


Dialectra Hausa Speech Corpus V1: Synthetic Audio Dataset

This is a Hausa-language speech corpus created by Dialectra. The dataset is licensed under CC BY 4.0 and was last updated on June 22, 2026. It is described as a synthetic audio dataset for natural language processing.

Audio, Hausa Language, Speech Corpus, Natural Language Processing, African Languages, Synthetic, +1


Northern Kurdish Raw Audio Collection for Speech Processing Research

This collection contains Northern Kurdish (Kurmanji) speech recordings gathered from publicly available Kurdish media sources. The corpus was assembled by aranemini to support research and development in automatic speech recognition, speech translation, and other speech processing tasks. The dataset was last updated on June 20, 2026.

Audio, Kurdish, Speech Processing, Natural Language Processing, Audio Corpus, Speech Recognition, Low Resource Languages, +1


Infrasound Exposure Effects on Human Stress and Mood, 36 Participants

36 participants were exposed to music with or without infrasound (~18 Hz) in a 2x2 between-subjects design. The dataset likely contains self-reported mood measures and salivary cortisol levels collected pre- and post-exposure, showing links between infrasound and negative affective states. The data was authored by Kale R. Scatterty and last updated on 2026-04-27.

Tabular, Audio, CSV, Stress Response, Infrasound, Psychophysiology, +1


CO2 and H2O Eddy Flux Data from Amazon Tower Site

Four years of eddy flux measurements from January 2002 through January 2006 document carbon dioxide and water vapor exchange in a primary Amazon rainforest site. The data were collected by researchers from Harvard University and ORNL using tower-mounted gas analyzers and sonic anemometers at two heights. A companion Ph.D. thesis by Hutyra (2007) provides detailed analysis of carbon and water exchange processes.

Time Series, ZIP, Amazon Rainforest, Meteorological Tower, Water Vapor Exchange, Carbon Flux, +1


Filipino and Tagalog Call Center Audio Recordings, 617 Hours Processed

The dataset contains 169 hours of processed Filipino and 448 hours of processed Tagalog dual-channel audio recordings collected from real-world call center environments. Created by InfoBayAI, it supports the development of advanced speech and conversational AI systems. The dataset page was last updated on June 3, 2026.

Audio, Filipino Language, Conversational AI, Tagalog Language, Call Center Audio, Large Scale, Speech Recognition, +1


Lahgtna Levantine TTS: 50,000 Synthetic Speech Utterances

This synthetic speech dataset contains 50,000 utterances of Levantine Arabic dialect and Arabic-English code-switched speech. It was generated using the Lahgtna-OmniVoice fine-tuned TTS model and includes 10 speakers. The dataset page was last updated on 2026-05-30.

Audio, Text To Speech, Audio Dataset, Speech Synthesis, Code Switching, Levantine Arabic, Synthetic, +1


Russian Call Center Audio Dataset with Dual-Channel Recordings

1,025 hours of processed Russian dual-channel call center audio recordings form this dataset. It consists of real-world customer and agent speech collected from call center environments. The dataset was created by InfoBayAI and was last updated on 2026-06-03.

Audio, Dual Channel Audio, Russian Language, Conversational AI, Call Center, Large Scale, Speech Recognition, +1


Hindi Podcast Audio Dataset with 11,607 Hours of Processed Speech

InfoBayAI's Hindi Podcast Audio Dataset is a large-scale collection of 11,607 hours of processed Hindi podcast audio recordings. The full dataset contains 57,568 hours of processed audio across 12 languages, capturing real-world interactions across diverse topics and formats. It was last updated on June 8, 2026.

Audio, Multilingual, Hindi, Podcast, Speech AI, Large Scale, Audio Processing, Speech Recognition, +1


UK English Call Center Audio, 90,334 Hours of Dual-Channel Recordings

This dataset contains 90,334 hours of processed English (UK) dual-channel call center audio recordings and is part of a larger multilingual collection totaling 2,065,026 hours. It consists of real-world customer and agent speech from call center environments, was created by InfoBayAI, and was last updated on Hugging Face in June 2026. It is designed to support the development of advanced speech and conversational AI systems.

Audio, Conversational AI, UK English, Dual Channel, Call Center Audio, Large Scale, Speech Recognition, +1


English Podcast Audio Dataset with 1,165 Hours of Processed Speech

1,165 hours of processed English podcast audio recordings form part of a larger 57,568-hour multilingual collection. The dataset captures real-world interactions across diverse topics and formats to support speech AI development. It was created by InfoBayAI and last updated on Hugging Face in June 2026.

Audio, Conversational AI, Podcast, Large Scale, +1


Danish ASR Leaderboard: Benchmark Results Across Five Test Sets

Five independent public test sets evaluate Danish automatic speech recognition models. Each row represents one evaluated model, with scores reported as Word Error Rate (WER) and Character Error Rate (CER) percentages. The dataset, created by RyeAI, was last updated on 2026-06-19.

Tabular, Audio, Benchmark, Danish Language, Speech Recognition, Automatic Speech Recognition, +1


Coastal SAV Survey with Hyperspectral and Lidar Data

Submersed aquatic vegetation (SAV) data was collected over Buttermilk and Plymouth Bays in Massachusetts during September 2010. The dataset results from a collaborative campaign using the CHARTS airborne hyperspectral/lidar system, supported by extensive ground truth sampling including bathymetry, diver surveys, and water column measurements. The data is managed by NASA and originates from a joint operation by the U.S. Army Corps of Engineers and the U.S. Naval Oceanographic Office.

Geospatial, Multimodal, Coastal Bathymetry, Ground Truth, Ocean Optics, Hyperspectral Lidar, Submersed Aquatic Vegetation, +1


WildVid-LIP: Over 64,000 Temporal Video Segments for Lip Reading

WildVid-LIP contains over 64,000 curated temporal segments from unconstrained, real-world YouTube videos. It is a large-scale, open-source dataset providing precise timestamp anchors for training Visual Speech Recognition and multimodal models. The dataset was created by Rizul2159 and was last updated on June 16, 2026.

Audio, Time Series, Video, Multimodal, Visual Speech Recognition, Temporal Anchors, Lip Reading, Large Scale, +1


Site Averaged AMS Data: 1987 Meteorological Measurements

Site Averaged AMS Data: 1987 (Betts) contains the site-averaged product from Portable Automatic Meteorological Stations deployed during the 1987-1989 FIFE experiment. Data are provided in 30-minute intervals for the year 1987. The dataset likely contains measurements of atmospheric pressure, solar radiation, longwave radiation, reflectance, surface temperature, surface winds, precipitation rate, and soil temperature.

Tabular, Time Series, ZIP, Text, Atmospheric Measurements, Solar Radiation, Soil Temperature, Meteorological Data, Surface Winds, +1


Site Averaged AMS Data: 1989 Meteorological Measurements

Site Averaged AMS Data: 1989 (Betts) contains 30-minute interval meteorological station data from the 1987-1989 FIFE experiment. Columns suggest measurements of atmospheric pressure, solar radiation, surface temperature, wind speed, precipitation rate, and soil temperature. The dataset provides site-averaged products from Portable Automatic Meteorological Stations deployed during the field campaign.

Tabular, Time Series, ZIP, Text, FIFE Experiment, Atmospheric Science, Surface Measurements, Meteorological Station, +1


Site Averaged AMS Data: 1988 (Betts)

FIFE experiment data from 1988 provides site-averaged meteorological measurements in 30-minute intervals. The Portable Automatic Meteorological Station (AMS) data includes atmospheric pressure, solar and longwave radiation, surface temperature, wind speed, precipitation rate, and soil temperature. This dataset supports research on land-atmosphere interactions and surface energy balance.

Tabular, Time Series, ZIP, Text, Solar Radiation, Surface Atmosphere, Soil Temperature, Precipitation Rate, Meteorological Station, +1
