General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
165,940 datasets
SORCE SIM Level 3 Solar Spectral Irradiance Daily Means V027 (SOR3SIMD) provides merged daily solar spectra from the SIM instrument. The dataset covers wavelengths from 240 to 2416 nm with a spectral resolution of 1 to 27 nm and reports irradiances at 1 AU with an absolute uncertainty of about 2%. Data is structured as a tabular ASCII text file with rows for each wavelength per day.
A cleaned and restructured list of state suppliers for the Antioquia department in Colombia. The dataset was published by the Medellín Chamber of Commerce and includes master data management and quality treatment of all fields. It was last updated on 2026-05-18.
MLS/Aura Level 3 monthly binned hydroxyl (OH) mixing ratio data derived from THz radiometer measurements. The dataset provides near-global coverage from August 2004 to December 2009, with intermittent data through 2014, on assorted vertical grids from 31.6 to 0.00316 hPa. It is produced by NASA and archived in netCDF4 format.
LittoMOS is a harmonized land cover layer for the French coast, created by the Bureau de Recherches Géologiques et Minières (BRGM). It reconstructs and updates the national Permanent Littoral Inventory (IPLI) using data from 2000 to 2006. The dataset provides land cover information at two levels of detail based on an adaptation of the Corine Land Cover (CLC) nomenclature.
Near-global hydroxyl (OH) mixing ratio data from the NASA Aura Microwave Limb Sounder (MLS) instrument, binned monthly on various vertical grids. The dataset provides continuous coverage from August 2004 to December 2009, with intermittent data collected for about 30 days each in August/September of 2011-2014. Data files are archived in netCDF4 format and include profile and column grid objects with geolocation fields and metadata.
SORCE SOLSTICE MUV Level 3 Solar Spectral Irradiance Daily Means V018 (SOR3SOLMUVD) is the final version of a NASA data product providing daily solar spectra. It contains merged daily measurements from the SOLSTICE MUV instrument across a spectral range of 180 to 310 nm at 1 nm resolution, with reported irradiances normalized to 1 AU. The data is structured as a tabular ASCII file with columns for date, wavelength bounds, irradiance value, uncertainty, and quality flags.
Tian Siqi's research data, published on figshare in May 2026, demonstrates how inosine monophosphate (IMP) enhances the killing efficacy of aminoglycoside antibiotics against common aquaculture pathogens like Edwardsiella tarda and Vibrio spp. The dataset, 265.4 KB in size, includes files in PDF and XLSX formats. It likely contains experimental results supporting the proposed mechanism where IMP reshapes central carbon metabolism to increase ATP levels, amplifying energy-dependent proteotoxic stress.
CentificAIResearch's benchmark evaluates the safety and robustness of email classification systems under adversarial conditions. It consists of two complementary datasets designed to assess model classification accuracy and the reliability of LLM-based graders. The dataset was last updated on June 22, 2026.
Lawrence O. Barros published mean follicle interval lengths, diameters, and growth rates for plains zebra mares. The 9.5 KB Excel file contains data from prostaglandin administration to Day –1. It was last updated on figshare in May 2026.
Lawrence O. Barros published a dataset containing mean measurements for follicle intervals, diameters, growth rates, daily counts, and wave numbers during the interovulatory interval in plains zebra mares. The dataset is stored as an XLS file sized 9.5 KB and was last updated on 2026-05-22. It is licensed under CC-BY-4.0.
Training events offered by the Escuela Superior de Administración Pública - ESAP (Technical Directorate of Training and the School of High Government). The dataset includes 17 columns such as event name, location, dates, and participant counts. It was last updated on 2026-05-18 and is hosted on the Colombian open data portal www.datos.gov.co.
A 70.7 KB dataset used to develop hybrid forecasting algorithms for Emergency Department patient arrivals. The data likely contains daily arrival counts influenced by meteorological and calendar factors, as described in a study by Hamed Tabesh. The dataset was last updated on April 30, 2026, and includes performance metrics for ARIMA, ANN, LSTM, GLM, and two hybrid models.
Kuairand Recexplain SFT/DPO Data provides derived post-training data and Semantic ID assets for fine-tuning a large language model for recommendation explanation tasks. The dataset is associated with the LoRA adapter plumliu/qwen35-9b-kuairand-recexplain-lora and was last updated on June 22, 2026. Author plumliu created this resource to support work on explainable recommendation systems.
This dataset from Montgomery County, Maryland documents instances of force used by police officers on subjects and force used by subjects on officers. The data includes detailed columns on event classification, officer and subject actions, weapons used, subject demographics, and medical outcomes. It is updated daily and available across multiple government data platforms.
Priority Living Areas (PLA) as identified within seven Queensland regional plans, including Central Queensland and South East Queensland. The dataset is published by the Queensland Department of State Development, Infrastructure and Planning and was last updated in May 2026. It is available in multiple geospatial formats such as SHP, GPKG, and KMZ.
Gazetted administrative boundaries for regional planning areas as determined by the Queensland Department of State Development, Infrastructure and Planning. The dataset was last updated on 2026-05-18 and is provided by the Queensland Government.
A geospatial dataset grouping footpaths and trail rights-of-way within the city of Saint-Hyacinthe. The data was created through technical drawing and manual integration from construction plans or orthophotos, and is provided by the Government and Municipalities of Québec. It was last updated on 2026-04-22.
Zhuoqi Zheng published a dataset on figshare in May 2026. The 9.5 KB XLS file contains ranking results for different methods, likely using the TOPSIS multi-criteria decision-making technique. The data shows mean values for four metrics and calculated relative nearness values, where higher values indicate better performance.
Initial adjustable analysis usage options and associated definitions for the Rosetta-Routine. The dataset provides multiple adjustable usage options covering a wide range of processes to enable adaptable analysis approaches in cluster data processing. It was authored by Bradley Mason and last updated in May 2026.
5.5 KB of wear analysis scores for feet, described by Michael A. Berthaume. The data is stored in an XLS file and was last updated on 2026-05-29. The description notes that once a foot has been used, no cells can receive a score of '0'.