General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
165,176 datasets
Data from a 2023 Forced Displacement Survey in South Sudan covering 3,055 adults: 2,066 refugees and 989 host community members. Hyojin Im analyzed the dataset with machine learning models to examine structural predictors of depressive symptom severity, as measured by the PHQ-9. The study was published on figshare in 2026.
MASIE-NH provides daily measurements of sea ice extent and edge boundaries for the Northern Hemisphere and 16 specific Arctic regions. The dataset is derived from U.S. National Ice Center (USNIC) source data and is presented in polar stereographic projections at 1 km and 4 km grid resolutions. It includes numeric extent values, time series plots, and image files for visual analysis.
A dataset from the Australian Ocean Data Network, last updated in June 2026, describes reef talus deposits in the Gulf of Carpentaria and Arafura Sea. The deposits are attributed to tropical cyclones and are used to infer net sediment transport pathways on the Holocene continental shelf. Their orientation suggests a consistent, along-coast transport pathway.
MTA Metro-North Railroad records detail causes of train delays, cancellations, and service changes from 2012 onward. The dataset includes specific delay categories, train numbers, branch lines, and station-level arrival and departure information. It tracks performance metrics for commuter rail service in the New York region.
Monthly consolidated records of Petitions, Complaints, Claims, and Suggestions managed by a public entity. The data includes metrics on response timeliness and request types, sourced from the INFIVALLE open data portal. It was last updated on 2026-05-18.
Bi-weekly averaged temperature measurements from 21 shallow borehole thermistors in Ilulissat, Greenland, recorded from November 1968 to June 1982. The dataset also includes concurrent measurements of snow depth, snow extent, and surface air temperature. It is provided by NASA in a tab-delimited ASCII text format.
Attendance data for national parks and historic sites managed by Parks Canada for the 2025-26 period. The dataset is provided by the Government of Canada under the OGL-CA-2.0 license and was last updated in June 2026. The specific columns and row count are not detailed in the available metadata.
331 paleomagnetic specimens from 24 sites in the Early Jurassic Granite Mountain batholith (~600 km²) are analyzed. Aluminum-in-hornblende geobarometry at 10 sites defines emplacement depths of ~16-19 km. The data, published by the Government of Yukon, suggests minimal tectonic motion of the Yukon-Tanana Terrane since the Early Jurassic.
The dataset contains interdisciplinary measurements from the TRATLEQ1 cruise, focusing on upwelling in the tropical Atlantic. It is the first program to cover a complete equatorial section from east to west and surface to bottom, collecting physical, chemical, biogeochemical, and biological data. The cruise contributed to multiple international research projects including GEOMAR OCEANS, EU TRIATLAS, and BMBF SPACES.
Glaive Function Calling v2 converted into the ms-swift Agent format for supervised fine-tuning. The dataset contains approximately 109,000 JSONL entries for training AI agents to use tools. It was created by author hhzhou and last updated on June 21, 2026.
Northern Yukon's Wernecke Mountains expose the Proterozoic Pinguicula Group, a succession of clastic and carbonate rocks. The strata were deposited after the Racklan orogeny and Hart River sill emplacement, with contact relationships clarified during a 2009 field season. This geological study, published by the Government of Yukon, raises questions about the group's age and correlation with the Fifteenmile Group.
The glacial history and placer gold potential dataset from the Government of Yukon provides reconstructions and geomorphic mapping for the North McQuesten River, Dublin Gulch, and Keno Hill map areas. It details a succession of glaciations, including pre-Reid, Reid, and McConnell episodes, and analyzes placer potential based on geomorphology, glacial history, geochemistry, bedrock geology, and historic records. The dataset was last updated on April 17, 2026.
MTA Bridges & Tunnels safety indicators track preventative measures and incident occurrences. The data is provided by data.ny.gov and includes monthly measurements for various metrics against targets. The dataset was last updated in May 2026.
Scherm On-Premise LLM Inference Benchmark v0.5.1 provides performance data for large language models across 9 real GPUs, from the NVIDIA B200 to older consumer cards like the GTX 1080 Ti. The benchmark includes metrics like throughput (tok/s), VRAM usage, and tensor-parallel scaling, measured with a methodology using a seed of 1234, 10 repetitions per point, and input/output lengths of 512 and 256 tokens. It was created by Scherm-AI and last updated on 2026-06-16.
LeRobot was used to create this dataset, which likely contains teleoperation records for a robotic arm. The dataset features include action data with six joint positions for a manipulator. It was authored by ohdoking and uploaded to Hugging Face on June 20, 2026.
Multispectral imagery at 1.24-meter spatial resolution, collected by the WorldView-4 satellite across the global land surface from December 2016 to January 2019. The data contains four spectral bands (blue, green, red, and near-infrared) and is provided in NITF and GeoTIFF formats as sensor-corrected Level 1B products. Its high temporal resolution of approximately 1.1 days supports detailed monitoring of land surface changes.
A comparison of shared and co-aperture antenna designs created by Abdul Rehman Chishti and published in 2026. The dataset is 5.5 KB in size and focuses on application, size, and gain parameters. It is available under a CC-BY-4.0 license.
9.5 KB of computed hemodynamic parameters under four stenosis severities (30%, 50%, 70%, and 90%) and three blood viscosity conditions (below-normal, normal, and high). The dataset was authored by Lei Zhengyao and last updated on 2026-05-28.
17.4 KB of data from a mixed-effects analysis investigating the link between pre-competition strength metrics and sprint canoe/kayak performance. The dataset was authored by Zongwei Chen and last updated on May 28, 2026. It likely contains measurements from professional Chinese athletes.