General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
147,202 datasets
Yalin Zhu's retrospective study developed an interpretable gradient boosting decision tree model to predict central lymph node metastasis in papillary thyroid carcinoma. The model was trained on data from 710 patients (971 lesions) and achieved an AUC of 0.830 in the test set. It was last updated on 2026-04-27.
Yalin Zhu's study developed an interpretable machine learning model for preoperative prediction of central lymph node metastasis in patients with cN0 T1–T2 papillary thyroid carcinoma. The model was trained on data from 710 patients (971 lesions), integrating pathological, ultrasound, thyroid function, and systemic inflammatory indicators. It achieved an AUC of 0.830 in the test set and was validated on an independent temporal cohort of 50 patients.
Phylogenetic matrices derived from plastomes of 113 members of the Orobanchaceae plant family. The dataset was created by Tang Yu-Hao and last updated on June 2, 2026. It was used to assess the effects of rate heterogeneity, alignment, and missing data on phylogenetic inference.
Demolition Permit applications from Montgomery County, Maryland, updated daily. The data includes detailed permit status, work descriptions, and precise location information across multiple geographic hierarchies. Columns suggest tracking of the permit lifecycle from application to finalization.
Monthly management information on staff numbers and paybill costs across UK Civil Service departments, agencies, and executive NDPBs. The data includes payroll and non-payroll (contingent labour) figures, split between full-time equivalents and headcount, with payroll costs broken into salaries, allowances, and pension contributions. Staffing numbers are recorded as of the last day of each month, while cost information is for the reference month, with a baseline from the 2010/11 financial year.
A replication package for a controlled experiment evaluating Augmented Reality's impact on student understanding of Model Needs in AI Requirements Engineering. It includes anonymized participant responses, accuracy scores, qualitative feedback, analysis scripts, experimental materials, and the mobile application. The dataset was created by Fabiann Barbosa and is available under a CC-BY-4.0 license.
Marie-Annick Moreau uploaded an audio recording titled "Explanation of the 'Ng'ongole' song" to figshare on June 3, 2026. The 26.9 MB WAV file contains women explaining the meaning of a song that asks God to bring peace to politicians from the President to local leaders, so they have faith in MAM's intentions and allow her to come to Tanzania. The dataset is licensed under CC-BY-NC-SA-4.0.
A 43.8 KB PDF document provides a cultural explanation of a specific song. Marie-Annick Moreau authored this record, which was last updated on June 3, 2026. The description details a song performed by women, interpreted as a prayer for peace among politicians to enable a figure named MAM to visit Tanzania.
A qualitative interview transcript from a group sitting near a pond. Marie-Annick Moreau authored this 56.6 KB PDF document, which was last updated on June 3, 2026. The content describes the initial steps of setting up a fence using stakes and explains the meaning of a prayer and opening ritual.
39.6 KB of interview data in EAF format, documenting the initial steps of setting up a fence. Marie-Annick Moreau authored this dataset, which was last updated on 2026-06-03 12:23:07. The description indicates the content includes an interview with Abdalah Saidi Mwingo describing stake use and Lumolumo explaining a prayer and opening ritual.
Digital echo sounding, SeaBeam swath bathymetry data and sediment cores were collected on the continental slope off southeastern Tasmania. The data was gathered to study sedimentary processes in the vicinity of an ocean disposal site. It is hosted by the Australian Ocean Data Network and was last updated on 2026-06-04.
Colombian public entities obligated to make parafiscal contributions to the ESAP, as stipulated by Law 21 of 1982. The dataset includes entity names, tax IDs, and geographic codes. It is hosted on the Colombian open data portal and was last updated on 2026-05-18.
Yuxi Lin's dataset contains 31 files of raw data and scripts supporting a 2026 research article on gut microbiota metabolites in allergic rhinitis. The 92.7 MB repository includes GWAS summary data, transcriptomic profiles from GEO datasets GSE261239 and GSE43523, and 3D structural data for molecular docking. It provides the complete analytical pipeline executed in R version 4.2.2.
Continental U.S. and global datasets provide multi-decade, calibrated geodetic Earth Science Data Records. The collection includes continuous high-rate GNSS, seismogeodetic, and meteorological time series, a catalog of transient tectonic deformation, and grids of total water storage change derived from GNSS data. These products are generated by the Scripps Institution of Oceanography and NASA's Jet Propulsion Laboratory under the MEaSUREs program.
Registros 2 contains cadastral property records from the municipality of Piedecuesta, Colombia, published on datos.gov.co. The dataset includes property characteristics such as number of rooms, bathrooms, floors, economic stratum, use, cadastral score, and land and built area. The data was last updated on 2026-05-18 19:19:53.
Victoria Sivill published a dataset listing the top 20 menstrual products sold by an unnamed retailer. The products cover 35% of the retailer's total menstrual product sales between 30 April 2006 and 16 April 2015. The data is available as a 9.5 KB Excel file on figshare.
A 5.5 KB Excel file containing data on empirical coverage probability. The dataset was authored by Shiming Hao and last updated on June 3, 2026. It is hosted on figshare under a CC-BY-4.0 license.
Rhys Peploe published a dataset on figshare in June 2026. The dataset is a 13.5 KB Excel file. The description notes that column percentages may sum to 99.9% or 100.1% because values are rounded to one decimal place.
5.5 KB of workload configurations for the Filebench and FFSB benchmarking tools, used to evaluate the ScaleDefrag defragmentation tool. The dataset, authored by Sangjin Lee and last updated in May 2026, is shared under a CC-BY-4.0 license on figshare. It likely contains parameters for simulating file system workloads to test defragmentation performance on flash-based SSDs.
Death registration records compiled by the NSW Registry of Births Deaths and Marriages. The data reflects the total number of registrations completed, not the number of actual life events that occurred in NSW. The dataset is licensed under CC-BY-4.0 and was last updated on 2026-05-27.