General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
166,541 datasets
650 relational databases spanning academic, e-commerce, finance, sports, biomedical, and government domains are ported to a self-describing manifest format. The collection was built by star-project for large-scale pretraining of relational and tabular foundation models, and its tasks ship labels as-is. The dataset was last updated on June 12, 2026.
2020–2022 data on the sales-weighted distribution of cigarette prices. The dataset is a 5.5 KB Excel file authored by Mirjana Čizmović and shared under a CC-BY-4.0 license. It was last updated on June 2, 2026.
162 administrative regions are distinguished in this vector map of the Former Soviet Union's land area. The dataset was derived from 1:3,000,000-scale administrative boundaries published by ESRI in 1998. It provides a foundational geospatial layer for historical and regional analysis of the FSU.
Sports projects operated by the Pereira Mayor's Office through its Sports Secretariat across the city's neighborhoods and rural districts. The dataset includes project names, responsible organizations, addresses, services offered, and monthly beneficiary counts. It is published by www.datos.gov.co and was last updated on 2026-05-18.
A 2026 dataset by A. K. Singh contains PCA scores and cluster analysis results for 101 bael (Aegle marmelos) genotypes. The data likely includes scores from the first six principal components, which collectively explain 80.77% of total variability, and identifies superior genotypes CHESB-25 and CHESB-29. The dataset is intended to support breeding programs for developing higher-yield and better-quality cultivars.
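The 80.77% figure is a cumulative explained-variance ratio: the sum of the leading eigenvalues divided by the sum of all eigenvalues. As a minimal sketch (the eigenvalues below are hypothetical, not taken from the bael data, and the helper names are illustrative), that ratio can be computed and used to choose how many components to retain:

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_explained(eigenvalues: Sequence[float]) -> List[float]:
    """Cumulative share of total variance, largest component first."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    ordered = sorted(eigenvalues, reverse=True)
    return [running / total for running in accumulate(ordered)]


def n_components_for(eigenvalues: Sequence[float], threshold: float = 0.8) -> int:
    """Smallest number of leading components whose cumulative share >= threshold."""
    for count, share in enumerate(cumulative_explained(eigenvalues), start=1):
        if share >= threshold:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues from a PCA on standardized traits.
eigs = [4.0, 2.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5]
print(cumulative_explained(eigs))
print(n_components_for(eigs, threshold=0.75))
```

Sorting first matters: principal components are conventionally ordered by decreasing eigenvalue, so "the first six" means the six with the largest variance.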
Unidades de las Áreas Coralinas (Polígonos) provides the location and classification of Colombia's coral reef areas identified up to 2020. The data includes biotic, geomorphological, and ecological units for use in the "Atlas digital de las Áreas Coralinas de Colombia". It was last updated on 2026-05-18 and is hosted on the Socrata platform via www.datos.gov.co.
A study of 101 bael (Aegle marmelos) germplasms assessed genetic variability based on morphological and qualitative traits. The dataset includes measurements for traits like shell weight, fruit weight, and pulp weight, with heritability estimates ranging from 0.07% to 92.23%. Author A. K. Singh published the data on figshare under a CC-BY-4.0 license.
This legacy dataset covers submarine canyons on the continental margin of Southeast Australia. It is published by the Australian Ocean Data Network on data_gov_au and was last updated on 2026-06-16. No abstract is available.
Individual Monthly Hillslope Cover Erosion (t ha⁻¹ month⁻¹) over New South Wales for 2002. The dataset is provided by the NSW Department of Climate Change, Energy, the Environment and Water and was last updated on 2026-05-18. Data files are available in PDF and GEOTIFF formats under a CC-BY-4.0 license.
Model evaluation results across five-fold cross-validation. The dataset is a 5.5 KB XLS file authored by Ruping Zhang and last updated on June 1, 2026. It is shared under a CC-BY-4.0 license on the figshare platform.
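Five-fold cross-validation partitions the samples into five disjoint folds, then evaluates on each fold once while training on the remaining four. A minimal, dependency-free sketch of the split follows; the function name `kfold_indices` and the shuffling seed are hypothetical and not part of this dataset, which only reports the resulting scores:

```python
import random
from typing import List, Tuple


def kfold_indices(n_samples: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs covering 0..n_samples-1."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle before splitting

    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    # Each fold serves as the test set exactly once.
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


for train, test in kfold_indices(23, k=5):
    print(len(train), len(test))
```

The per-fold metrics are then typically averaged (and their spread reported) to summarize model performance.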
A Permian cold water marine fauna in the Grant Formation of the Canning Basin, Western Australia is a legacy dataset published via data_gov_au. The dataset is hosted by the Australian Ocean Data Network and was last updated on 2026-06-16. No abstract or detailed metadata is available.
A policy document outlining the rationale and design for a high-quality regional public transport network called Kolibri within the Groningen-Assen National Urban Network. The document, published by the Dutch Ministry of the Interior and Kingdom Relations, addresses urban accessibility challenges, citing 125,000 jobs in Groningen and 160,000 daily regional commuters, 74% of whom travel by car. It proposes the Kolibri network as a solution to prevent economic blockage and support the Regional Vision Groningen-Assen 2030.
WanderDream is a benchmark dataset for evaluating situated spatial reasoning without active exploration. It was created by author lrp123 and is hosted on Hugging Face, with a last recorded update on 2026-06-25. The dataset challenges models to reason about future visual scenes and obstacles based on a starting panoramic view and a natural-language target description.
Preliminary data from the current period, last updated on 2026-05-18, contains information on vehicles immobilized for violating traffic and transport rules in the Barranquilla district. The dataset is provided by www.datos.gov.co and includes details on infractions, vehicle types, and service types. The information is subject to change.
Open, citable World Cup 2026 fixture data converted for South African Standard Time. The dataset is provided by punts za and was last updated on 2026-05-31. It is intended for research purposes only, with no odds, picks, betting predictions, or affiliate links.
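South African Standard Time is a fixed UTC+02:00 offset with no daylight saving, so converting fixture times amounts to a single offset shift. A minimal sketch, assuming kickoff times are available as timezone-aware UTC datetimes (the helper `to_sast` and the sample kickoff below are hypothetical, not fixtures from the dataset):

```python
from datetime import datetime, timedelta, timezone

# SAST is UTC+02:00 all year; a fixed offset needs no tz database.
SAST = timezone(timedelta(hours=2), "SAST")


def to_sast(kickoff_utc: datetime) -> datetime:
    """Convert a timezone-aware kickoff time to South African Standard Time."""
    if kickoff_utc.tzinfo is None:
        raise ValueError("kickoff time must be timezone-aware")
    return kickoff_utc.astimezone(SAST)


# Hypothetical late-evening UTC kickoff: it lands on the next calendar day in SAST.
kickoff = datetime(2026, 6, 11, 23, 0, tzinfo=timezone.utc)
print(to_sast(kickoff).isoformat())
```

A fixed offset avoids depending on an installed tz database; `zoneinfo.ZoneInfo("Africa/Johannesburg")` is an equivalent choice where tzdata is available. Note that the date can change during conversion, so fixture lists should be re-sorted by local date after the shift.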
Numerical dynamo simulation data accompanies a manuscript submitted to Communications Physics. The dataset is 40.5 MB in size and was authored by Chi Yan. It was last updated on May 28, 2026.
Hongtao Zhang published a dataset comparing prediction results from an independent validation set with finite element results. The dataset is a 5.5 KB Excel file available under a CC-BY-4.0 license. It was last updated on June 1, 2026.
CONSEJOS COMUNITARIOS DE MUJERES lists members of women's community councils across municipalities in the Risaralda department. The dataset includes columns for member name, residential zone (rural or urban), municipality, email, organization, and year. It was published by www.datos.gov.co and last updated on 2026-05-18.
Beginning in 2015, this dataset tracks the reliability of New York City's MTA bus fleet. It reports the Mean Distance Between Service Interruptions (MDBSI), monthly mileage, and road call counts, aggregated by borough. The data is published by data.ny.gov and was last updated on 2026-05-15.
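MDBSI is generally understood as miles operated divided by the number of service interruptions. A minimal sketch, assuming road calls serve as the interruption count and that borough-level figures pool mileage and road calls before dividing (both are assumptions, not the publisher's documented method; the rows below are hypothetical):

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def mdbsi_by_borough(rows: Iterable[Tuple[str, float, int]]) -> Dict[str, Optional[float]]:
    """Pool (borough, miles, road_calls) rows and return miles per road call.

    Returns None for a borough with zero road calls, where the ratio is undefined.
    """
    miles: Dict[str, float] = defaultdict(float)
    calls: Dict[str, int] = defaultdict(int)
    for borough, month_miles, road_calls in rows:
        miles[borough] += month_miles
        calls[borough] += road_calls
    return {b: (miles[b] / calls[b] if calls[b] else None) for b in miles}


# Hypothetical monthly rows: (borough, miles operated, road calls).
rows = [
    ("Bronx", 100_000.0, 20),
    ("Bronx", 50_000.0, 10),
    ("Queens", 80_000.0, 0),
]
print(mdbsi_by_borough(rows))
```

Dividing pooled totals (a ratio of sums) keeps low-mileage months from being over-weighted, which an average of monthly MDBSI values would do.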
The Port Curtis Integrated Monitoring Program (PCIMP) collected this sediment data set via deployed sensors. It covers Zone 06a in the lower Calliope Estuary in Australia. Data collection occurred between December 2006 and June 2014.