Student performance, MOOC logs, knowledge tracing, standardized tests, learning analytics
13,371 datasets
ATLAS-Higgs-Boson-Machine-Learning-Challenge-2014 data originates from the 2014 Kaggle competition and was downloaded from the CERN open data portal. The dataset is used for a binary classification task to distinguish Higgs boson signals from background noise. This version encodes the placeholder value -999 as NaN.
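The placeholder-to-NaN encoding mentioned above can be illustrated with a minimal sketch. It assumes rows are held as plain dictionaries of floats; the column names `feature_a` and `feature_b` and the sample values are hypothetical and are not taken from the dataset itself.

```python
import math

# Placeholder value that the entry says stands in for missing measurements.
PLACEHOLDER = -999.0


def placeholder_to_nan(rows, placeholder=PLACEHOLDER):
    """Return copies of the rows with the placeholder replaced by NaN."""
    return [
        {key: (math.nan if value == placeholder else value)
         for key, value in row.items()}
        for row in rows
    ]


# Hypothetical sample rows (illustrative values only).
rows = [
    {"feature_a": 138.47, "feature_b": -999.0},
    {"feature_a": -999.0, "feature_b": 2.15},
]

cleaned = placeholder_to_nan(rows)

# Count missing entries per column after the conversion.
missing_per_column = {
    col: sum(math.isnan(row[col]) for row in cleaned) for col in cleaned[0]
}
```

Encoding the placeholder as NaN lets downstream tools treat these entries as missing rather than as a genuine measurement of -999.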
During the DREAMS project, a collection of polysomnographic recordings was gathered from a sleep laboratory in a Belgian hospital. The data includes 20 recordings from healthy subjects, 27 from patients, and multiple expert-annotated databases for microevents like sleep spindles and apnea. The recordings were stored in the European Data Format (EDF) and published to facilitate research and algorithm comparison.
A collection of 300,466 high-perplexity text samples filtered from the OpenHermes 2.5 dataset by Malum0x in March 2026. It consists of the top 30% of records, ranked by the cross-entropy loss that Qwen2.5-3B-Instruct assigned during scoring.
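The top-30% selection described above can be sketched as follows. This is an illustration only: it assumes each record already carries a precomputed loss in a hypothetical `loss` field, and it does not reproduce the model scoring step or the author's actual pipeline.

```python
def top_percent_by_loss(records, percent=30):
    """Keep the given percentage of records with the highest loss."""
    # Integer arithmetic avoids floating-point rounding in the cutoff.
    keep = len(records) * percent // 100
    ranked = sorted(records, key=lambda r: r["loss"], reverse=True)
    return ranked[:keep]


# Hypothetical scored records (ids and losses are made up).
losses = [0.5, 2.1, 1.3, 3.7, 0.9, 2.8, 1.1, 0.2, 1.9, 0.7]
records = [{"id": i, "loss": loss} for i, loss in enumerate(losses)]

selected = top_percent_by_loss(records)
selected_ids = [r["id"] for r in selected]
```

With ten records, the sketch keeps the three with the highest loss.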
39,260 English sentences from broadcast conversations, newswire, weblogs, and web forums are paired with Abstract Meaning Representation (AMR) graphs. This semantic treebank was developed by the Linguistic Data Consortium, SDL/Language Weaver, the University of Colorado, and the University of Southern California's Information Sciences Institute. AMR graphs represent whole-sentence meaning using PropBank frames, semantic roles, coreference, named entities, modality, and negation.
The 'naniar' package provides data structures and functions for exploring and visualizing missing values in data. It facilitates the plotting of missing values and the examination of imputations, integrating with the 'ggplot2' and tidy data workflows. The work is discussed in Tierney & Cook (2023).
Risk assessment training data used in G-Health, organized into four task types to support end-to-end health examination modeling. The largest component is tabular classification, combining public and in-house datasets across a range of diseases and risk themes, including diabetes in three complementary settings. The dataset was authored by YDXX and last updated on March 12, 2026.
A digital geologic-GIS dataset for Tutuila, American Samoa, completed as part of the National Park Service's Geologic Resources Inventory program. The data is derived from a University of Hawaii Cartographic Laboratory atlas map and a geology map after Stearns (1981). It is available in multiple GIS formats including a file geodatabase, geopackage, and shapefile, accompanied by PDF documentation.
A digital geologic-GIS dataset for Tau, American Samoa, completed as part of the National Park Service's Geologic Resources Inventory program. The data includes GIS layers, tables, and ancillary documents, adapted from a University of Hawaii atlas map and a geology map after Stearns (1981). It is available in multiple GIS formats including a file geodatabase and an OGC geopackage.
A digital geologic-GIS map for Ofu and Olosega, American Samoa, produced by the National Park Service Geologic Resources Inventory program. The dataset includes GIS data layers, tables, and supporting documentation, adapted from source maps by the University of Hawaii and Stearns (1981). Data is available in multiple GIS formats, including a file geodatabase and OGC geopackage.
An R script for Structural Equation Modeling (SEM) analysis of the UK's Quarterly Labour Force Survey (Household) data, authored by Charles, Tendai and hosted on Harvard Dataverse. The dataset focuses on household worklessness, educational capital, and NEET (Not in Education, Employment, or Training) risk among young adults in the UK. It was last updated on 2026-04-14.
This collection comprises unaltered data files from the U.S. Department of Education's ED Data Express website, downloaded in February 2025. It includes state- and district-level education data from school years 2010-2011 to 2021-2022, covering topics such as state assessments, graduation rates, and chronic absenteeism.
A professional judgment panel study estimates the cost of providing an adequate education for all California public school students. The report compares these estimated costs to current state expenditures and analyzes variations based on district size, location, and student need. It was authored by Jay G. Chambers.
Andrew Mertha's 2014 book analyzes China's assistance to the Khmer Rouge regime in Cambodia from 1975 to 1979. The work details the bureaucratic structures of both the Khmer Rouge and Chinese aid programs, focusing on the nature of their political relationship. It is published by Cornell University Press and comprises 175 pages.
A dissertation by Geoffrey Graham examines American Protestant schools in China from 1880 to 1930. It draws on personal missionary papers, school records, journals, and published works to analyze missionary critiques of Chinese society and their attempts at reform, particularly regarding gender. The work argues that missionary influence was a two-way process of adaptation rather than a one-sided imposition of American values.
A study examining the implementation of the No Child Left Behind Act in California, Georgia, and Pennsylvania from the 2003-2004 through 2005-2006 school years. The monograph presents final results from the Implementing Standards-Based Accountability project, analyzing strategies used by states, districts, and schools and their association with classroom practices and student achievement in mathematics and science. It was authored by Brian M. Stecher and serves as a companion to a 2007 report, updating findings with an additional year of data.
Nancy Bernkopf Tucker's historical analysis examines the critical period of 1949-1950 following the collapse of the Chinese Nationalist government. The work draws on many previously unavailable sources to assess the factors influencing Washington policymakers as the thirty-year estrangement between the U.S. and China began. It highlights the flexibility retained by Secretary of State Dean Acheson in American policy toward China.
A 1993 evaluation study involved 57 boys aged 15-18 in a juvenile corrections facility, randomly assigned to an EQUIP treatment or control group. The program, developed by John C. Gibbs of The Ohio State University, is a multi-component group treatment for antisocial adolescents focusing on moral judgment, anger management, and prosocial skills. Since the early 1990s, the program has been adapted and implemented in facilities across North America, Europe, and Australia.
Nigerian SSS Curriculum AI Benchmark is a dataset published on Kaggle. The title suggests it contains benchmark data related to the Senior Secondary School (SSS) curriculum in Nigeria, likely for evaluating AI models. Metadata is minimal; actual content requires verification after download.
Kaggle hosts a dataset titled 'GET TALENT-MACHINE LEARNING'. The dataset likely contains information related to talent acquisition or management in the context of machine learning. Its specific content, size, and origin are not detailed in the provided metadata.
Bering Sea coastal communities have documented over 3,000 Yup'ik place names since 2000 through the Calista Elders Council. The project includes named locations such as camps, rivers, rocks, and underwater channels, along with associated cultural views. It supports educational collaboration with the Lower Kuskokwim School District, where students collect and share Yup'ik history.