General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
153,444 datasets
A prognostic model for lung adenocarcinoma (LUAD) was constructed using hypoxia- and lactylation-related genes via LASSO, XGBoost, and Random Forest algorithms. The model's core gene, PABPC1, was validated experimentally in two LUAD cell lines using qRT-PCR, CCK-8, colony formation, wound healing, and Transwell assays. The dataset, authored by Guannan Wang and last updated in May 2026, is shared under a CC-BY-4.0 license.
José Alonso Solís-Lemus published a dataset containing Spearman correlation coefficients between geometric variables and simulation outputs, with associated p-values and significance levels. The dataset is 2.3 KB in size and was last updated on June 2, 2026. It is available under a CC-BY-4.0 license.
Data extracted from 24 research articles for in-distribution evaluation and 4 articles for out-of-distribution evaluation, made publicly available to aid the modelling and design of next-generation triboelectric nanogenerators (TENGs). The dataset was created by Shashank Mishra using manual extraction tools such as WebPlotDigitizer and was last updated on 2026-05-02.
A curated collection of text data for training large language models, created by the organization OpenLLM-France. The dataset was last updated on June 3, 2026. Its specific composition, size, and license are not detailed in the provided metadata.
Expenses for the Colorado Department of Transportation for the current and previous state fiscal years. The dataset includes columns such as Name, Clearing Date, Expense Description, Funding Source, CDOT Segment, and Amount. It is published on data.colorado.gov and was last updated on 2026-05-29.
Data from a stepped-wedge cluster randomised trial in South-East Nigeria that aimed to improve leprosy ulcer management through a community self-care intervention. The data was authored by Anthony Meka and last updated on the platform in June 2026.
Two field campaigns in Europe and North America collected data on snow depth, density, and water equivalent using nine commonly used snow core samplers. The study, led by Ignacio Lopez Moreno of the National Institute of Ecology, quantifies instrumental bias and observer-induced error in manual snow measurements. Results show that the uncertainty in bulk snow density estimates is about 5% for an individual instrument and close to 10% across different instruments.
L&I Intent Project Details records the statements of intent filed by employers or contractors for work on public works projects in Washington State. The dataset likely contains detailed information on project contracts, involved companies, and key dates. It is hosted on data.wa.gov and is also listed on Data.gov.
Washington State's Labor & Industries department provides daily-updated records of Affidavits of Wages Paid filed by contractors for public works projects. The dataset includes project details, contractor and agency information, contract amounts, and apprentice utilization rates. It supports compliance monitoring and analysis of labor standards on state-funded construction.
Global Affairs Canada publishes quarterly reports on import permits issued for turkey (the poultry product) under the Comprehensive and Progressive Agreement for Trans-Pacific Partnership (CPTPP). The last recorded update was on 2026-06-10, and the dataset is released under the OGL-CA-2.0 license.
Data.ny.gov provides data on scholarship awards administered by the New York State Higher Education Services Corporation (HESC), beginning with the 2009 academic year. The dataset includes the number of recipients and total dollar amounts by college, organized by TAP college code and sector.
A report of ambulance counts by type across municipalities and territorial entities in Colombia. The data includes columns for basic and medicalized ambulance counts, municipality and department names, a grand total, a report date, and a source field. The dataset was last updated on 2026-05-18 and is hosted by the Colombian open data portal, www.datos.gov.co.
A dataset from www.datos.gov.co, last updated on 2026-05-18, containing records of user feedback for public services. It likely contains information on Petitions, Complaints, Claims, Suggestions, and Compliments (PQRSF) processed through the User Information and Attention System (SIAU). The data includes columns for AREA, PQRFS type, MOTIVO (reason), PROCEDENCIA (origin), NRO PQRFS (case number), and FECHA (date).
Daily updated records from Connecticut's eLicensing system, tracking the status of professional and business credentials. The dataset includes details on credential holders, issue and expiration dates, business affiliations, and license statuses. Columns such as CredentialType, CredentialSubCategory, and StatusReason provide granularity on the type and administrative state of each license.
A fortnightly updated list of the waste categories that licensed businesses in the Australian Capital Territory are permitted to handle. The dataset details the waste classifications set by the Waste Management and Resource Recovery (Waste Categories) Determination 2024. It is published by the ACT Government Open Data team and was last updated on May 12, 2026.
MYD03 provides daily geolocation data for every 1 km sample collected by the MODIS instrument aboard NASA's Aqua satellite. Each 5-minute swath file is approximately 30 MB; with 288 such files per day, the daily volume is about 8 GB. This foundational dataset is produced by the MODIS Science Team and is a critical input for numerous higher-level land and ocean products.
Weekly updated records of alcohol licenses across Missouri counties track the status and details of over 23,000 active and expired permits. The dataset provides a detailed view of licensees, managers, and associated fees, compiled by county clerks and stored with a three-week rolling window. Its columns suggest it can support compliance monitoring, market analysis, and business verification for the state's regulated alcohol industry.
Mahamudul Hasan's dataset supports a 2026 study on a unified AI-driven predictive maintenance framework. It contains two processed datasets: 19,535 real-world OBD-II automotive engine sensor observations and the NASA C-MAPSS FD001 turbofan engine dataset labeled for remaining useful life. The repository includes model outputs, feature importance values, and threshold optimization results to enable replication of the research.
The Topographic Map 1:25 000 (DTK25) is an official topographical map series for Germany with scale-related completeness and accuracy. It is provided by the Bundesamt für Kartographie und Geodäsie via an INSPIRE download service, allowing tile-based downloads in 1,806 individual files. The dataset is available under a CC-BY-4.0 license.
This dataset lists active licensed retailers of New York State Lottery products, with business names and addresses. It includes geospatial coordinates and indicates which locations offer the Quick Draw game. Its columns suggest it can be joined with New York State geographic boundaries and census data for spatial analysis.