DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Machine Learning Datasets | DataSalon

Machine Learning

General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites

157,579 datasets

Constitutional Court Competency Conflicts in Colombia, 2006-2026

This database contains all competency conflicts presented to the Constitutional Court of Colombia from 2006 to April 2026. The data was last updated on May 4, 2026, and is provided by the platform www.datos.gov.co. It includes columns for case file number, subject matter, date, and case type.

Tabular, CSV, XML, JSON, Colombia, Legal Conflicts, Constitutional Law, Court Cases, +1 more

UN Security Council Provisions on Civilian Protection, 1999 Onwards

United Nations Security Council decisions from 1999 onward containing keywords related to the Protection of Civilians. The Security Council Affairs Division created this dashboard as an information resource for the Repertoire of the Practice of the Security Council. The data was last updated on 2026-05-20.

Tabular, CSV, Protection of Civilians, United Nations, Security Council, Armed Conflict, International Law, +1 more

CEA Academic Offerings: Aviation Training Programs with Costs and Requirements

Information on the continuing education programs offered by the Center for Aeronautical Studies (CEA) of Aerocivil. The dataset includes columns for activity name, cost, modality, duration, target audience, objectives, and study plan. It is hosted on the Colombian open data portal www.datos.gov.co and was last updated on May 18, 2026.

Tabular, CSV, XML, JSON, Aviation Education, Colombia, Course Catalog, Academic Programs, +1 more

Annotated Videos of Basket Trap Construction: Tying the First Hoop

A collection of video files with action annotations documenting the initial stage of basket trap construction. The dataset, created by Marie-Annick Moreau, includes footage from carving sticks to tying them onto the top ring. It was last updated on June 3, 2026, and is shared under a CC-BY-NC-SA 4.0 license.

Video, Multimodal, Craft Video, Action Annotation, Basket Trap Making, Ethnographic Video, +1 more

Suburban Train Crowding During 2016 PM Peak Hours in Australia

Infrastructure Australia created this geospatial dataset for the 2019 Australian Infrastructure Audit. It represents average weekday transport crowding performance during the PM peak period from 4pm to 6pm in 2016. The data models strategic transport conditions, excluding network links below daily volume thresholds.

Geospatial, Peak Hour, Crowding, Public Transport, +1 more

Semantic Harmless: Paired Harmful and Harmless Prompts for Controlled Comparison

Semantic Harmless contains one-to-one semantic matches between prompts from two source datasets. The dataset aligns prompts that are semantically closest, where one prompt is harmful and the other is harmless, creating a more controlled comparison. It was created by heretic-org and was last updated on Hugging Face in June 2026.

Text, NLP Safety, Semantic Alignment, Harmful Content, Prompt Pairs, +1 more

Top 20 Menstrual Pain Products by Sales Share from a Retailer, 2006-2015

A list of the top 20 pain products sold by a retailer, which collectively accounted for 53% of total menstrual product sales. The data covers sales between 30th April 2006 and 16th April 2015. It was authored by Victoria Sivill and published on figshare under a CC-BY-4.0 license.

Tabular, Excel, Retail Sales, Consumer Goods, Menstrual Products, Pain Products, +1 more

Standard Error of Estimate for DAMM Predictions of Fecal SCFA COD

Standard error of estimate (σ_est) for predictions made by the DAMM model regarding fecal short-chain fatty acid chemical oxygen demand. The 5.5 KB XLS file, authored by Taylor L. Davis and last updated in May 2026, quantifies the error for predictions against an identity line where predictions should equal measurements.

Tabular, Excel, Fecal Analysis, DAMM Model, Prediction Error, SCFA, +1 more
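
For readers unfamiliar with the metric named in this entry, the following is a minimal sketch of how a standard error of estimate against an identity line (predicted = measured) could be computed. The function name and the choice of N as the denominator are assumptions made for illustration; the dataset's author may use a different degrees-of-freedom correction.

```python
import math


def standard_error_vs_identity(measured, predicted):
    """Root mean squared deviation of predictions from the 1:1 line.

    Assumption: divides by N (no degrees-of-freedom correction), since
    no parameters are fitted when comparing against the identity line.
    """
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - m) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A value of zero means every prediction falls exactly on the identity line; larger values indicate a wider scatter around it.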

Changes in Interest and Access to Opioid Medication Before and After Intervention, N=117

Matthew N. Ponticiello's dataset records changes in interest, perceived difficulty in accessing, and perceived importance of initiating medications for opioid use disorder before and after a brief intervention. The data covers 117 participants on probation with opioid use disorder. It was last updated on 2026-05-27 and is shared under a CC-BY-4.0 license.

Tabular, Excel, Opioid Use Disorder, Medication Access, Probation, Clinical Research, Behavioral Intervention, +1 more

Biochar Production and CO2 Adsorption Data for Machine Learning Modeling

A dataset supporting a machine learning model for engineering porous biochar for CO2 adsorption. The gradient boosting regression model uses biomass composition, pyrolysis, activation, and adsorption conditions as inputs, achieving an R² of 0.99 and RMSE of 0.15. The dataset, created by Chengkai Cao and last updated in May 2026, is provided in an XLSX file.

Tabular, Excel, Machine Learning, Biomass Pyrolysis, CO2 Adsorption, Biochar, +1 more

SkillTrustBench Results: AI Model and Tool Safety Evaluation Leaderboard

SkillTrustBench Results stores public leaderboard records for an AI safety benchmark. The dataset tracks two comparison groups: one fixing a model and comparing tools, and another fixing an analysis tool and comparing models. Raw system outputs are normalized into safety categories of normal (safe), suspicious, or malicious.

Tabular, Leaderboard, Model Evaluation, Safety Classification, AI Benchmark, Benchmark, +1 more

Randomized Controlled Trial of Azithromycin for Perinatal Infection Prevention

In a single-center trial, 472 pregnant women undergoing labor induction were randomized to receive 2 g of oral azithromycin (n=236) or no treatment (n=236). The dataset contains primary and secondary outcomes measuring perinatal infection rates, maternal and neonatal complications, delivery mode, and safety parameters. It was authored by Huimin Cao and last updated on 2026-05-24.

Tabular, Excel, Antibiotic Prophylaxis, Perinatal Health, Obstetrics, Clinical Trials, +1 more

UNGRD Index of Classified and Reserved Information, 2023

An index of information from the Colombian National Unit for Disaster Risk Management (UNGRD) that is restricted by law or regulated for specific classes of persons, identified as classified or reserved. The dataset includes 13 columns detailing the legal basis, responsible parties, and duration of classification. It was published on datos.gov.co and last updated on 2026-05-18.

Tabular, CSV, XML, JSON, Government Transparency, Risk Management, Information Classification, Public Administration, +1 more

Snow Cover Indicators for Canada: Spring Extent, Duration, and Water Equivalent

The Canadian Environmental Sustainability Indicators program provides data tracking terrestrial snow cover over Canada. Indicators include spring snow cover extent, annual snow cover duration, and March snow water equivalent, with data presented in maps, charts, and CSV tables. The dataset is produced by Environment and Climate Change Canada and was last updated on 2026-04-23.

Tabular, Time Series, Geospatial, 🇨🇦 Canada, CSV, Environmental Indicators, Climate Change, Snow Cover, +1 more

Prox-E ShapeTalk Benchmark: 600 3D Shape Editing Samples

Prox-E ShapeTalk Benchmark is a subset of 600 random samples from the ShapeTalk dataset, curated for evaluating 3D shape editing models. It is the official benchmark for the SIGGRAPH'26 paper 'Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions'. The dataset was created by author 'haopt' and last updated on May 30, 2026.

Point Cloud, Machine Learning Benchmark, Benchmark, Shape Editing, Computer Graphics, +1 more

GOES 13 Satellite Overshooting Top Data for Hurricane Research

Produced during the GRIP Field Experiment, this dataset contains satellite-derived overshooting top magnitudes for tropical storms and hurricanes. It was created by NASA for use with the Real Time Mission Monitor tool to study storm formation and intensification. The data is visualized as color-coded overlays in Google Earth.

Geospatial, Satellite Imagery, Tropical Storms, Earth Science, Cloud Properties, Hurricane Research, +1 more

Public Wi-Fi Session and User Data for San José de Cúcuta, 2021-2022

San José de Cúcuta municipality provides data on free public Wi-Fi zone usage from 2021 to 2022. The dataset likely contains session counts and user demographics, including gender, age, device type, and operating system. It originates from the Colombian open data portal www.datos.gov.co and was last updated in May 2026.

Tabular, Time Series, CSV, XML, JSON, Public Wi-Fi, Device Usage, Demographics, Municipal Data, +1 more

Minidisk infiltrometer timeseries from nine burned areas in Northern California and Nevada

This data release covers burned areas in Northern California and Nevada. It contains raw timeseries and metadata for 827 minidisk infiltrometer measurements conducted across nine burned areas and nearby unburned areas. Scott McCoy authored the dataset, which covers measurements from 2018 to 2023.

Tabular, Time Series, Geospatial, CSV, Excel, Hydrology, Wildfire Impact, Soil Infiltration, +1 more

Global Lightning Flash Rate Monthly Climatology Time Series

Monthly gridded climatologies of total lightning flash rates for 1995 to 2014, derived from two satellite-based sensors, the Optical Transient Detector (OTD) and the Lightning Imaging Sensor (LIS). The dataset provides a merged, long-term record, with robust tropical and subtropical coverage from LIS and high-latitude data from OTD. It is produced by the National Aeronautics and Space Administration and is available in formats including BIN, ISO, HTML, and PDF.

Time Series, Geospatial, Tropical Meteorology, Lightning Climatology, Atmospheric Electricity, Satellite Observations, +1 more

Vaani Benchmark V1.0: Hindi Speech Recognition with 5,343 Audio Segments

ARTPARK-IISc's Vaani Benchmark V1.0 is a curated Hindi automatic speech recognition (ASR) evaluation set. It contains 5,343 audio segments from 1,103 speakers across 104 Indian districts, totaling approximately 11.7 hours. Each audio segment includes three independent human transcriptions.

Audio, Multilingual, Hindi, Benchmark, Audio Transcription, Speech Recognition, +1 more
