General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
141,962 datasets
NASA's Crustal Dynamics Data Information System archives daily broadcast ephemeris files from a global network of ground receivers tracking multiple satellite constellations. Since 2011, the archive has expanded beyond GPS and GLONASS to include Europe's Galileo, China's Beidou, and other global navigation systems. Each daily file contains broadcast navigation data in the standard RINEX format for a single ground station.
NASA's Crustal Dynamics Data Information System (CDDIS) provides daily files of 30-second sampled Global Navigation Satellite System (GNSS) observation summary data. The archive includes data from GPS, GLONASS, and, since 2011, other global and regional systems such as Galileo and Beidou, collected from a worldwide network of ground-based receivers. The data are stored in RINEX format and processed with the TEQC (translation, editing, and quality check) software.
A 27-year time series from 1984 to 2010 provides annual land cover classifications for 80% of Rondonia, Brazil. The data consists of 27 GeoTIFF images derived from Landsat TM and MSS sensors, mosaicked from seven path/row scenes. Each image is classified into seven land-cover classes, including primary forest, secondary forest, and pasture.
The mean regional seismic resilience index for 12 western Chinese provinces rose 157.4%, from 0.115 to 0.296, between 2000 and 2024. Bowen Tang constructed this dataset using an evaluation index system covering economic, population, infrastructure, and governance dimensions, applying entropy weighting and machine learning methods. The data reveals widening interprovincial disparities and a spatial pattern of higher resilience in the southwest and lower resilience in the northwest.
Jamie Davis created a technical specification and benchmarking suite for embedded systems, last updated on June 3, 2026. The 2.4 KB text file describes the Davis Logic V2 framework, a unified bare-metal fixed-point super-core deployment strategy. It includes latency tracking, memory footprint analysis, and throughput specifications for hardware targets like ARM Cortex-M and RISC-V cores.
5.5 KB of evaluation results for five dimensionality reduction methods, created by Uma Shashi Sharma and last updated in June 2026. The dataset compares methods using combined local and global structure preservation scores, as well as biologically relevant relationships based on expert annotations. It includes a final column reporting average overlap across expert-label class pairs.
Colombian data on active nonprofit entities in the municipality of Armenia, managed by the Chamber of Commerce under Decree 1074 of 2015. The dataset includes columns for legal representative, registration number, contact details, and activity codes. It was last updated on 2026-05-18.
Two distinct sets of beach ridges document coastal evolution on the western side of Cape York Peninsula. The dataset, hosted by the Australian Ocean Data Network, describes Pleistocene and Holocene ridge systems, including their composition, sequence, and a key developmental date around 3000 years B.P. File formats include PDF and HTML documents, with a last recorded update in June 2026.
A 17.0 MB document uploaded by Yingkai Zhang on figshare in 2026 describes a study on Modified Guilu Erxian Glue (MGEG) for treating aplastic anemia. The research uses network pharmacology, mass cytometry (CyTOF), flow cytometry, and Western blot to investigate MGEG's immunomodulatory effects via the miR-146a/STAT1/SOCS1 axis. The dataset is licensed under CC-BY-4.0.
126 rice genotypes were evaluated for aroma using sensory analysis and targeted quantification of 164 volatile compounds. The dataset, authored by Heena Rani and last updated in June 2026, presents a rapid phenotyping framework for breeding programs. It integrates sensory and chemical data to resolve five distinct aroma classes, including popcorn-dominant and fruity-floral phenotypes.
A data-driven warehouse optimization study for a textile supplier achieved an 18.46% reduction in average picking distance. The research analyzes 4,000 fabric SKUs from F布行 in Zhili Town, using EIQ-ABC and K-Means clustering to design a dynamic storage allocation scheme based on sales heat and seasonal patterns. The dataset, last updated in May 2026 and shared under a CC-BY-4.0 license, includes supporting files in PNG, DOCX, and XLSX formats.
New York State maintains a registry of individuals certified to operate cranes for construction, demolition, and excavation; New York City is excluded. The certificate is a state-mandated license that aligns with OSHA standards, and the data includes certificate details, operator classifications, and status. Columns suggest the dataset tracks individual certifications over time, including issuance and expiration dates.
Data from a study assessing the cross-sectional area, oxidative capacity, and capillary supply of individual muscle fibres from mice, recreationally active humans, and highly resistance-trained men. The dataset, 1.8 MB in size, was authored by Hans Degens and last updated on 2026-05-28. It is shared under a CC-BY-4.0 license on figshare.
A small 5.5 KB XLS dataset of STGAD inference throughput and latency measurements, authored by Xiao Liao and last updated on May 21, 2026. The data likely contains performance metrics for a dual-score generative-adversarial framework for anomaly detection in multivariate time series.
STGAD is a dual-score generative-adversarial framework for unsupervised anomaly detection in multivariate time series. The 5.5 KB XLS file, uploaded by Xiao Liao on May 21, 2026, contains results comparing the model's saliency-local overlap metric against a baseline. Experiments were conducted on five benchmark datasets covering server monitoring, aerospace telemetry, industrial control, and ECG signals.
STGAD is a dual-score generative-adversarial framework for anomaly detection in multivariate time series. Xiao Liao published benchmark results on figshare in May 2026, showing training times for 5 epochs across five datasets. The 5.5 KB Excel file likely contains performance metrics for models tested on server monitoring, aerospace telemetry, industrial control, and ECG signals.
5.5 KB of Excel data compares the STGAD anomaly detection framework's performance against a baseline method. The dataset likely contains saliency overlap metrics for multivariate time series across five benchmark domains. It was authored by Xiao Liao and last updated on figshare in May 2026.
Xunhao Wang's dataset contains 339 data points compiled from 101 peer-reviewed studies to predict the removal efficiency of pharmaceuticals and personal care products (PPCPs) by photocatalytic membranes. The data covers 74 PPCPs, 6 photocatalyst categories, and 9 membrane materials. The gradient boosting regression tree model trained on this data achieved an R² of 0.645.
A geospatial dataset from the United Nations Satellite Centre (UNOSAT) mapping fire hotspot density detected by VIIRS sensors between August 9 and 16, 2021. It reports 1,502 hotspots in Bejaia and 1,733 in Tizi Ouzou, with an estimated 210,000 people living near fire-affected areas. This preliminary analysis from August 2021 has not been field-validated.
A systematic review and meta-analysis of 12 studies involving 33,366 participants, published on figshare in 2026. The document synthesizes the performance of machine learning models for predicting non-suicidal self-injury, reporting pooled metrics such as area under the curve, sensitivity, and specificity. It was authored by Qianhui Wen and is licensed under CC-BY-4.0.