Student performance, MOOC logs, knowledge tracing, standardized tests, learning analytics
13,269 datasets
This dataset contains class size averages submitted by School Jurisdictions to Alberta Education for the 2004/05 through 2017/18 school years. The data is published by the Government of Alberta. Some information related to ESL, special needs, and gifted and talented programs is suppressed in external reports.
Policy documents from the Yukon Department of Education, published by the Government of Yukon. The collection includes HTML and PDF files and was last updated in April 2026. The specific number of documents and their publication dates are not provided.
Replication data for a study on the impact of school closures on parental labor supply in Germany. The data was authored by Blanka Imre and is hosted by Harvard Dataverse. The associated paper was accepted for publication in the Journal of Labor Economics in 2026.
2,160 examples formatted for training large language models on reasoning tasks. The dataset, created by ansulev and last updated on April 3, 2026, is a reformatted version of crownelius/Opus-4.6-Reasoning-2100x. Each example follows a structured chat format with system, user, and assistant messages.
Geoscience Australia assessed the Vlaming Sub-basin for storing up to 1 gigaton of CO2 in the Gage Sandstone reservoir. The study integrates seismic, well, and marine data to model storage capacity and seal integrity risks. This work supports the Australian Government's National CO2 Infrastructure Plan.
Norfolk's Office of the Real Estate Assessor provides daily-updated property assessment and sales data for the city. It includes details such as acreage, square footage, GPIN, street address, year built, and current land, improvement, and total values for each property. The data is sourced from the ProVal records database.
The First ISCCP Regional Experiment (FIRE) Cirrus Phase II dataset contains airborne measurements from the University of North Dakota's Cessna Citation II aircraft during the second cirrus intensive field observation in southeastern Kansas from November 13 to December 7, 1991. The data was collected by NASA and includes measurements from Particle Measuring Systems probes for cloud particle concentration and size, along with temperature, dew point, pressure, wind, and aircraft position. The goal was to improve understanding of cirrus cloud life cycles and their radiative properties for use in general circulation models.
The First ISCCP Regional Experiment (FIRE) Cirrus Phase II dataset contains high-frequency atmospheric measurements collected by the University of North Dakota's Citation aircraft. The data was gathered during the second cirrus intensive field observation in southeastern Kansas from November 13 to December 7, 1991. It was produced by the National Aeronautics and Space Administration (NASA) to improve cloud and radiation models.
The District of Columbia provides a dataset assessing sidewalk conditions across Washington, D.C. The data includes five condition categories, a condition index score from 0 to 11, and information on primary and secondary sidewalk material types. It also lists the eight most common sidewalk conditions with numerical values.
Daily updated sale history for active properties on the District of Columbia's real property tax assessment roll. Data originates from the Office of Tax and Revenue's Computer-Assisted Mass Appraisal system, used for ad valorem property tax valuation. The dataset represents a snapshot extracted from the CAMA system and is subject to constant maintenance and change.
A real estate dataset for house price prediction using machine learning. The data originates from King County, USA. The author, organization, and specific size are unknown.
Legacy product from the Australian Ocean Data Network, last updated on 2026-04-16. The dataset is a preliminary assessment of hydrocarbon potential for the Sahul Platform in the Bonaparte Gulf Basin, covering Northern Territory and Western Australia. It is published on data_gov_au and available in HTML and PDF formats.
580 Lean-formalized STEM problem-solving examples across physics, chemistry, calculus, and probability domains. The dataset was created by anon-ed-2026 and is hosted on Hugging Face. It was last updated on May 4, 2026.
A merged version of the separate train and test sets for the Hayes-Roth database, originally created by Barbara and Frederick Hayes-Roth in March 1989. The dataset contains synthetic instances with five nominal attributes, including hobby, age, education level, and marital status, and a class label determined by a specific rule. It is designed as a classification benchmark where only three of the five attributes are diagnostic.
A collection of 6,999 records detailing copepod abundances sampled from the Canadian Basin of the Arctic Ocean. Data originates from Clarence Greer Pautzke's PhD dissertation and was collected at Fletcher's Ice Island (T-3) from January 1970 to April 1972 and at the AIDJEX station from June to September 1975.
Jackrong created this cleaned derivative of the ianncity/KIMI-K2.5-1000000x dataset; the derivative was last updated on April 17, 2026. It preserves the original four-config layout and rewrites each record into a unified reasoning-SFT schema with fields like conversations, input, output, domain, and meta. The dataset is intended for supervised fine-tuning, with the teacher model KIMI-K2.5 recorded in the metadata.
The Yampi Shelf in Northwest Australia contains the first documented active hydrocarbon seepage in a tropical carbonate shelf environment. Geophysical data collected by Geoscience Australia Data show gas plumes, seabed features such as pockmarks and mounds, and sub-surface seismic anomalies. The dataset was last updated on 2026-03-25.
Geoscience Australia Data published a dataset on March 25, 2026, detailing a multibeam sonar survey at the northern end of Australia's Great Barrier Reef. The data maps a shelf valley system up to 220 meters deep and extending over 90 kilometers across the continental shelf. It supports a proposed conceptual model for the formation of tidally incised shelf valleys.
KnowRL-KP-Annotations augments existing mathematical reasoning benchmarks like AIME and AMC with fine-grained Knowledge Point annotations. The dataset, created by HasuerYu, serves as a companion evaluation resource for the paper 'KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance'. It was last updated on April 15, 2026.
A dataset of human-collected web trajectories pairing instructions with sequences of webpage screenshots and corresponding agent actions like clicks, typing, and scrolling. It was created by AllenAI and last updated in March 2026. The dataset includes image and text modalities and is tagged as containing between 10,000 and 100,000 examples.