Student performance, MOOC logs, knowledge tracing, standardized tests, learning analytics
13,362 datasets
CAD-S is the first openly available dataset for resume credibility assessment using NLP. It supports supervised learning for detecting inconsistencies between claimed skills and supporting evidence within resumes. The dataset was created by aselasperera and was last updated in March 2026.
Titanic passenger data is a canonical benchmark for binary classification tasks in machine learning education. The dataset is published on Kaggle, a platform for data science competitions and projects. Its exact size, features, and provenance are unspecified in the provided metadata.
Education Autolabel is a dataset published on HuggingFace by author shyuni. The dataset likely contains labeled data for educational applications, inferred from its title. Its last update was recorded on 2026-05-01 12:58:12.
This dataset contains 1,739,249 tokens of text generated by the Qwen3.6-plus model for knowledge distillation. It covers topics including coding, mathematics, finance, medicine, and economics, with a maximum sequence length of 6,500 tokens per row. It was created by author 'ansulev' and last updated on April 8.
2020 data from the Sierra Leone Demographic and Health Survey (SLDHS) provides performance metrics for machine learning models predicting diarrhea in children under five. The dataset, authored by Yahye Hassan Muse, contains model evaluation results in a 9.5 KB Excel file.
Historical audit documents for Oregon, published by the State of Oregon. The data populates the Audits Search tool on the Oregon Secretary of State website and was last updated on March 8, 2026. The records are available in multiple machine-readable formats including XML, RDF, JSON, and CSV.
Revenue Example is a dataset hosted on Kaggle. Its specific content and scope must be verified after download, as detailed metadata is not provided. The author, organization, and data collection method are unknown.
Lauren Highfill's research dataset examines the relationship between personality traits and environmental enrichment effectiveness in Garnett's bushbabies. It contains assessments of five personality factors and behavioral outcomes for ten subjects across five different enrichment interventions. The study aims to inform individualized animal management strategies based on personality differences.
PISA 2003 measured the capabilities of 15-year-old students in reading, mathematics, and science literacy across participating countries. The study, conducted by the Organisation for Economic Co-operation and Development (OECD), achieved an 83 percent response rate from students sampled in April-May 2003. Mathematics literacy was the primary subject area assessed in depth for this cycle.
Enrolment data is aggregated by institution, credential type, level of study, and student origin. The dataset is provided by data.novascotia.ca and was last updated in February 2026. It includes fields for major field of study, province of residence, and registration status.
HPTN 068 post-intervention case report form data was collected by The Statistical and Data Management Center. The dataset contains follow-up assessments from young women participants who returned for a scheduled post-intervention visit, designated as Visit Code 701. The data was last updated on April 10, 2026.
The 'Faculty of Nursing Sciences, Niger Delta University, Amassoma 2026/2027 Registra' dataset likely contains student registration records for the 2026/2027 academic year. The data appears to come from the nursing faculty of a Nigerian university. The dataset's exact structure and size are unknown.
Kaggle hosts a dataset titled 'machine learning'. Its content, size, and origin are not detailed in the provided metadata, so the actual content requires verification after download.
exam-LoRa is a dataset published on Kaggle. The title suggests it contains data related to Low-Rank Adaptation (LoRA), a technique for fine-tuning large machine learning models. Its specific content, size, and origin are not detailed in the available metadata.
YSI sonde in-situ data supports research quantifying temporal mismatches with satellite observations in aquatic environments. The dataset is associated with a 2026 publication in Remote Sensing Letters. Specific row counts, column details, and file formats are not provided in the source description.
2019 data from a nationwide multi-sector assessment conducted by the REACH Initiative. The dataset covers topics including Education, Health, Needs Assessment, and Socioeconomics. Specific details on row count, column count, and temporal coverage are unavailable.
REACH Initiative's Whole of Afghanistan Assessment 2020 dataset documents multi-sectorial needs across the country. The assessment covers sectors including Facilities Infrastructure, Nutrition, and Health. It provides a yearly snapshot of conditions for humanitarian planning.
The Student Classroom Engagement Dataset contains behavioral, attention, participation, and learning efficiency indicators. It was sourced from Kaggle, but the author, organization, and specific collection details are unknown. The dataset's size, row count, and last update date are unspecified.
Machine Learning Ready Dataset for Indian Used Car Price Prediction. The dataset is sourced from Kaggle and focuses on Tata Motors vehicles. Specific details on size, authorship, and update frequency are not provided.
The 2014 Kaggle Higgs Boson Machine Learning Challenge dataset contains 800,000 simulated particle collision events from CERN. It was originally sourced from CERN's open data portal and prepared for a public machine learning competition. The data is provided under a CC0 1.0 public domain license.