DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Machine Learning Datasets | DataSalon

Machine Learning

General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites

192,143 datasets

VAT Registered Enterprises by Business Age in England and Wales, 2008-2009

Counts of VAT-registered enterprises categorized by their age of business. The data is sourced from the UK Office for National Statistics Business Registers Unit and covers England and Wales for the years 2008 and 2009. It is published at multiple geographic levels including Middle Layer Super Output Areas, Local Authority Districts, Government Office Regions, and national totals.

Tags: Tabular, Geospatial, Business Demographics, Administrative Data, Uk Economy, Enterprise Age (+1 more)

VAT Registered Enterprises by Industry in Rural England and Wales, 2005-2007

Office for National Statistics data provides counts of VAT-registered enterprises by broad industry group in areas classified as rural. The dataset covers England and Wales from 2005 to 2007 and is available at multiple geographic levels including MSOA, Local Authority District, and Government Office Region. It originates from the ONS Business Registers Unit and is published under the OGL-UK-3.0 license.

Tags: Tabular, Geospatial, Business Registers, Administrative Data, Industry Classification, Rural Economy (+1 more)

VAT Registered Enterprises by Industry Group in Urban England and Wales, 2005-2007

Office for National Statistics (ONS) Business Registers Unit data provides counts of VAT-registered enterprises by broad industry group in areas classified as urban. The dataset covers England and Wales from 2005 to 2007 and is published by Neighbourhood Statistics. Data is available at multiple geographic levels including Middle Layer Super Output Area (MSOA), Local Authority District (LAD), Government Office Region (GOR), and National.

Tags: Tabular, Geospatial, England Wales, Business Registers, Economic Activity, Administrative Data, Urban Areas (+1 more)

GitTables Benchmark: Column Type Detection for Semantic Annotation

A benchmark subset of 1,101 tables from the GitTables corpus curated for evaluating column type detection systems. It was created by Madelon Hulsebos of the University of Amsterdam for the SemTab 2021 challenge's CTA task. The dataset provides ground truth annotations linking table columns to semantic types from the DBpedia and Schema.org ontologies.

Tags: Tabular, Column Type Detection, Semantic Annotation, Tabular Benchmark, Benchmark, Natural Language Processing, Ontology Mapping (+1 more)

MSL Curiosity Rover: Labeled Images of Martian Terrain and Features

6,820 images from the Mars Science Laboratory Curiosity Rover, captured by its Mastcam and MAHLI instruments, are labeled into 19 science and engineering classes. Each image was labeled by three volunteers, with a consensus process used to determine the final class. The dataset is pre-split for machine learning, covering Martian days (sols) 1 through 2224, with all images resized to 227x227 pixels.

Tags: Image, Planetary Imagery, Mars Curiosity Rover, Machine Learning Classification, Benchmark, Computer Vision, Astrogeology (+1 more)

Wetland Vegetation Map of Narran Lakes Using Multi-Sensor Data

A June 2024 to June 2025 wetland vegetation map for Narran Lakes produced by the NSW Department of Climate Change, Energy, the Environment and Water. It was created using a machine learning classification workflow incorporating Sentinel-1 radar, Sentinel-2 optical, LiDAR, and terrain data. The product serves as a landscape-scale baseline for environmental water planning and conservation management in the Murray-Darling Basin.

Tags: Time Series, Geospatial, ZIP, Machine Learning, Environmental monitoring, Wetland Vegetation, Benchmark, Synthetic (+1 more)

NESP TWQ Project 1.7: Alluvial Gully Erosion Management Effectiveness, 2015-2016

Alluvial gully erosion contributes 20-40% of fine sediment from the three largest sediment contributing catchments to the Great Barrier Reef. The project evaluates the effectiveness of different management strategies to reduce this erosion, collecting baseline data in the Normanby and Burdekin catchments. Data is hosted by the Australian Ocean Data Network and was last updated in July 2026.

Tags: Geospatial, Great Barrier Reef, Sediment Management, Environmental monitoring, Alluvial Gully Erosion, Benchmark, Catchment Management (+1 more)

First Nation Traditional Territories Core Area: KFN-WRFN Overlap Resolution Boundary

A geospatial boundary layer from the Government of Yukon, last updated on 2026-07-01. This dataset defines the boundary that eliminates the overlapping area between KFN and WRFN for the purposes of Settlement Agreements. It is distributed via GeoYukon, Yukon's digital map data collection.

Tags: Geospatial, ZIP, Yukon, First Nations, Traditional Territories, Boundary Resolution (+1 more)

Ipswich Business Improvement District Boundaries and Phases

This dataset defines the geographical boundaries of the Ipswich Central Business Improvement District (BID) across its operational phases from 2012 to 2022. It likely contains polygon data representing the area where businesses were subject to an additional rate levy to fund town centre improvements. The data is structured by a 'version' field indicating the active time intervals for the BID's second and third phases.

Tags: Geospatial, Economic Development, Geospatial Boundaries, Tax Levy, Local Government, Business Improvement District (+1 more)

eAtlas Legacy WMS: Great Barrier Reef Environmental Maps (2008-2011)

The eAtlas Legacy Web Mapping Service (WMS) delivered approximately 460 geospatial map layers focusing on the Great Barrier Reef. This service was operated by the Australian Institute of Marine Science and co-funded by the MTSRF program from 2008 to 2011, and was decommissioned in January 2024. The majority of layers correspond to Glenn De'ath's interpolated maps of the GBR developed under the MTSRF program.

Tags: Image, Geospatial, Environmental Research, Great Barrier Reef, Marine Science, Web Mapping Service (+1 more)

Chemprop: Benchmark Data for Molecular Property Prediction

Benchmark Data for Chemprop is a collection of datasets and pre-defined splits from the Chemprop machine learning package for chemical property prediction. It includes data for multiple benchmarking systems, such as HIV replication inhibition, biological activities from PCBA, DFT-calculated quantum properties from QM9, and reaction barrier heights. The collection provides train, validation, and test splits to facilitate the development and evaluation of predictive models in computational chemistry.

Tags: Tabular, Graph, Machine Learning, Benchmark, Chemistry, Quantum Mechanics, Molecular Property (+1 more)

UK Local Authority Planned Education Expenditure Benchmarking Tables, 2010-11

Benchmarking tables of planned expenditure are drawn from the published Children, Schools and Families Financial Data Collection budget statements. The tables give detailed information on each authority's planned expenditure on education in a form that enables comparison between authorities. The data is classed as Official Statistics but is not designated as National Statistics, and is published by the Department for Education.

Tags: Tabular, 🇬🇧 United Kingdom, Benchmarking, Finance, Local Government, Financial Data, Education Expenditure, Local Government Finance (+1 more)

OSNI 1:1 Million Raster Map of Northern Ireland County Boundaries

A 1:1,000,000 scale raster map provides a visual overview of county boundaries in Northern Ireland. This static image is published as open data by the Ordnance Survey of Northern Ireland (OSNI) and is suitable for use as background mapping in desktop and web applications. The dataset is available under the Open Government Licence and is hosted on multiple government data platforms.

Tags: Image, Geospatial, JSON, Raster Map, Counties, Northern Ireland, Administrative Boundaries, Administrative Regions, Geospatial Boundaries, Computer Vision, County Administration, Background Mapping (+1 more)

OSNI 1:1M Raster: Northern Ireland Infrastructure Overview

OSNI Open Data provides a 1:1,000,000 scale raster map depicting infrastructure across Northern Ireland. This static image is suitable for use as a background layer in desktop and web-based mapping applications. The dataset is the smallest-scale raster product from the Ordnance Survey of Northern Ireland, offering a broad geographical overview.

Tags: Image, Geospatial, JSON, Raster Map, Northern Ireland, Computer Vision, Infrastructure, Background Mapping (+1 more)

Mean hydrography on the continental shelf from 26 repeat glider deployments along Southeas

26 glider missions between 2008 and 2015 collected over 33,600 CTD casts on the continental shelf of southeastern Australia. This dataset provides gridded mean fields for temperature, salinity, density, dissolved oxygen, and chlorophyll-a fluorescence. The data is presented by the Australian Ocean Data Network, offering high-resolution observations of shelf waters adjacent to the East Australian Current.

Tags: Time Series, Geospatial, Oceanography, Glider Deployments, East Australian Current, Hydrography, Continental Shelf (+1 more)

Waste Data Interrogator 2012: UK Facility Waste Flows

The Environment Agency's Waste Data Interrogator 2012 contains annual data on waste quantities and types from around 6,000 regulated waste management facilities in the UK. Operators report waste received on-site and waste sent onward, supporting compliance monitoring and planning for new facilities. The dataset has been provided in an interrogatable format since 2006, though site details are withheld for operators claiming commercial confidentiality.

Tags: Tabular, ZIP, Excel, Regulated Facilities, Waste, Waste Management, Environmental Compliance, Waste Disposal, Environment, Facility Operators (+1 more)

Waste Data Interrogator: UK Facility Waste Flows

The Environment Agency collects annual data from around 6,000 regulated waste management facilities in the UK. This dataset details quantities and types of waste received and sent on from sites, used for compliance monitoring and planning. Data has been provided in an interrogatable format since 2006, though site details are omitted where commercial confidentiality is claimed.

Tags: Tabular, ZIP, Excel, Facility Operations, Waste, Waste Management, Environmental Compliance, Compliance Monitoring, Waste Disposal, Environment (+1 more)

Waste Data Interrogator: UK Facility Waste Flows

Around 6,000 regulated UK waste management facilities report annual data on waste received and transferred since 2006. The Environment Agency uses this data to monitor compliance and assist in planning for new facilities, with some site details withheld for commercial confidentiality. Data is public register information, aggregated for national and local authority use.

Tags: Tabular, ZIP, Excel, Facility Operations, Environment Agency, Waste, Waste Management, Waste Disposal, Environment, Regulatory Compliance (+1 more)

WFD RBMP2: Protected Area Register and Objectives for English River Basins

A register of protected areas and their environmental objectives for the English river basin districts and the Severn, covering Cycle 2 of the Water Framework Directive. The dataset includes Welsh Severn data and was published by the Environment Agency. It has been retired and superseded by a newer record.

Tags: Tabular, Geospatial, ZIP, Uk Environment, Water Framework Directive, River Basin, Environmental Regulation, Protected Areas, River Basin Management (+1 more)

WFD RBMP2 Confident Measures: Water Body Status Improvement Predictions for England

England's Environment Agency compiled a list of measures predicted to improve water body status by 2021. The dataset includes only measures with sufficient confidence in their location or scale of improvement to predict specific outcomes. This record was retired and superseded by another dataset on the uk_data platform.

Tags: Tabular, ZIP, Water management, Environmental Measures, Uk Data, Status Prediction, River Basin (+1 more)
