Student performance, MOOC logs, knowledge tracing, standardized tests, learning analytics
13,313 datasets
Global K-12 STEM, Robotics, AI & Engineering Education Dataset (Grades 1–12) is aggregated from Kaggle. Its specific size, source, and update frequency are not detailed in the available metadata. The dataset likely contains information on educational programs, resources, or outcomes related to STEM fields.
Synthetic supervised fine-tuning examples were generated by teacher models evaluated in the Polyglot Teachers paper. The dataset contains examples in Arabic, Czech, German, Indonesian, Japanese, Spanish, and Tagalog. It was created by ljvmiranda921 and last updated on April 5, 2026.
2,997 samples totaling approximately 5.8 million tokens were created by ToastyPigeon for fine-tuning the Qwen3.5-9B model. The dataset comprises 1,925 personality-based conversation examples and 1,072 tool calling examples. It was last updated on March 23, 2026.
A dataset concerning housing prices in Bengaluru, India, compiled by AmitabhaChakraborty. The description references a study indicating property prices in the city fell by almost 5 percent in the second half of 2017. It is released under a CC0-1.0 license.
FitBit_Steps provides minute-level step counts recorded by wearable devices for multiple users. The dataset, authored by Mobius and sourced from OpenML, contains user IDs, timestamps, and step counts for precise activity tracking. It is released under a CC0-1.0 license.
Two collection epochs of system activity data gathered every 5 seconds from a Sun SPARCstation 20/712 in a multi-user university department. The dataset contains 22 attributes measuring memory, process, and system call activity, with the goal of predicting the portion of time CPUs run in user mode. The final dataset includes an equal number of observations from each collection period.
649 student records from two Portuguese secondary schools, collected via school reports and questionnaires. The dataset includes final year grades (G3) alongside demographic, social, and school-related features for the Portuguese language subject. It was uploaded to OpenML under a CC-BY-4.0 license.
STAND_EXAM_PUB_PT is a spatial dataset of Forest Stand Exam point locations recorded via the EcoSurvey application. The data, published by the Department of the Interior, is used to generate stand-level statistics and export files for growth and yield models. It was last updated on 2026-03-26.
Chlorophyll-A and phaeophytin data were collected by Florida State University along the North American Atlantic coastline during three periods in 1986. The dataset likely contains position and concentration measurements in milligrams per cubic meter, with samples taken every two hours. It provides a snapshot of phytoplankton biomass and water quality for a specific region and year.
From December 1986 to August 1987, conductivity, temperature, depth, and oxygen data were collected aboard the RRS Charles Darwin in the Indian Ocean and Arabian Sea. The dataset was part of the Monsoon And Sea-Air Interaction (MASAI) project and is available in processed NODC C100 and F-022 formats. It provides high-resolution vertical profiles for studying oceanographic conditions during the monsoon period.
A structured archive of product catalogs from ryans.com, a major retail chain for computer hardware and electronics in Bangladesh. The dataset contains technical specifications, pricing, and descriptions. It was scraped and published by sayurio.
Surveys conducted October 9-18, 2017, document the impact of Hurricane Irma on the Florida Reef Tract. National Oceanic and Atmospheric Administration researchers collected coral demographic data and roving diver observations across 57 sites from Biscayne Bay to the Marquesas. The dataset includes detailed belt transect records and broad-scale photographic documentation of damage and disease.
A 2026 study by Davin Nabizadehchianeh, published via Harvard Dataverse, compares human and AI psychological responses to Kurdish independence. It analyzes autoethnographic data from interactions with individuals from Turkey, Iran, Iraq, and Syria, alongside responses from 37 variants across nine large language model platforms. The analysis reveals a stark contrast, with 70.27% of LLM variants supporting independence versus near-universal human resistance.
University of Alaska Institute of Marine Science collected this dataset aboard the R/V Alpha Helix during cruise HX94 from December 15 to 19, 1986. It contains high-resolution conductivity-temperature-depth (CTD) and salinity-temperature-depth (STD) profiles from 17 stations in the Gulf of Alaska and Prince William Sound. Data is processed to the NODC standard High-Resolution STD/CTD Data (F022) format.
WFD RBMP2 Economic analysis 2015_Scenario 3 and 4_v1.10 is an impact assessment dataset for updated river basin management plans in England, created by the Environment Agency in 2016. It contains data for all water bodies in England, comparing scenarios of technically feasible improvements regardless of cost versus those where benefits exceed costs. The dataset was built from multiple data sources and is based on many assumptions.
49 survey stations provide underwater video footage and still images from the Leveque Shelf in the Browse Basin, collected in May 2013. The data includes real-time onboard characterizations and USBL navigational files for each video transect. This marine survey was conducted by Geoscience Australia to assess seabed geology and CO2 storage potential.
Greater London Authority data shows the percentage of pupils at state-funded schools who live more than 2 miles from school (for those under 8) or 3 miles from school (for those over 8). The data is derived from the DfE National Pupil Database and was used to create the GLA London Schools Atlas. The dataset was last updated on the platform on 2026-03-25.
Supplemental data for a scoping review titled 'Examining the meaning and methodological characteristics of the systematized review label'. The data includes an extraction dictionary, extracted data for each review, and citations for excluded non-English studies. It was authored by Zahra Premji and last updated on April 25, 2026.
South Korean district-year panel data from the Survey on Private Tutoring Expenditures in Primary and Secondary Education, spanning 2010 to 2024. The dataset is merged with regional indicators from official statistical sources, including lagged repeater rates and university admission competition ratios. It was authored by Jieun Hong and published via Harvard Dataverse in April 2026.
A literature review authored by David Topps, harvested from Borealis Dataverse and last updated on April 25, 2026. The work critically examines claims about the efficacy of AI-assisted avatars in medical education, specifically regarding virtual patients and learning outcomes beyond engagement.