Student performance, MOOC logs, knowledge tracing, standardized tests, learning analytics
13,409 datasets
UNEP-IMEO provides a dataset for remote spectroscopic detection of methane point sources, as detailed in a 2025 arXiv preprint. It supports machine learning models for identifying methane leaks in hyperspectral imagery from satellite sensors. Row count, column count, and data volume are not specified.
University-1652 is a multi-view multi-source benchmark for drone-based geo-localization released by author layumi for ACM Multimedia 2020. It contains annotations for 1,652 buildings across 72 universities globally, providing a framework for matching drone-view queries against satellite-view galleries. The data supports cross-view image retrieval tasks using imagery from drones, satellites, and ground-level sources.
Kaggle hosts the dataset titled 'Teacher Logits' with the raw description 'codellmaLogitsbyHabiba'. The dataset's author, organization, and specific content details are unknown. Its creation date and last update are not provided.
GM-100 contains 100 robotic tasks across three hardware platforms, featuring video demonstrations and metadata. Created by robbyant and updated in February 2026, the data is organized in the LeRobot 2.1 format for multi-embodiment benchmarking.
Environmental Information Data Centre data from a progeny-provenance trial examining Dothistroma needle blight infection in native Scots pine populations. The dataset includes multiple infection assessments, tree height measurements, chlorophyll fluorescence, branching records, and defoliation assessments for each tree, plus measurements of infected needles. The experiment ran from April 2013 to September 2015 at Torrs Warren forest in Galloway.
Over 170,000 bilingual dictionary entries pairing English words with Khmer translations. Each entry includes part-of-speech tags, definitions in both languages, and example sentences, compiled by author mrrtmob. The dataset was last updated on Hugging Face in February 2026.
The dataset's title suggests it contains information related to teachers, likely from a public source. It appears to be a sparse dataset focusing on a top 8 selection of features or entities. The dataset was published on Kaggle, but its specific content, origin, and creation date require verification.
A dataset related to scheduling for Physical Education (PE) within an adaptive curriculum framework. It was published on Kaggle, but the author, organization, and specific data characteristics are not provided. The dataset's size, structure, and creation date are unknown.
260,000 preference pairs for Direct Preference Optimization (DPO) developed by the Allen Institute for AI in 2025-2026. This mixture was utilized to preference tune the Olmo 3 Instruct 7B model using delta-aware heuristics and GPT-judge pipelines.
DARE-Bench contains between 1,000 and 10,000 records designed to evaluate Large Language Model (LLM) agents on data science modeling and instruction fidelity. Developed by Snowflake AI Research and the University of Houston for ICLR 2026, the dataset focuses on tool-use and text generation within data science workflows. It provides a selected subset of a larger benchmark for testing how models handle complex data manipulation instructions.
Gold price data covering the period from 2000 to 2026. The dataset includes technical indicators and is described as clean and ready for machine learning applications. The original author and organization are unknown.
39% of the 8,103 square km area examined in Puerto Rico showed landslide activity following Hurricane Maria. The dataset classifies 2 km x 2 km grid cells based on visual analysis of high-resolution satellite and aerial imagery collected between September 26 and October 8, 2017. It was created by Erin K. Bessette‐Kirton to inform disaster response and recovery efforts.
Adriano Moreira from the University of Minho collected this data for Wi-Fi fingerprinting research. The data was gathered on a single Nvidia Shield tablet over two days in May 2016. It covers the first floor of a specific campus building in Portugal.
fastshap computes approximate Shapley values for any supervised learning model, offering a faster alternative to other implementations. The method explains predictions from black box models using game-theoretic principles established by Strumbelj and Kononenko (2014). The tool was authored by Brandon Greenwell and is hosted on the paperswithcode platform.
Functions, data sets, examples, demos, and vignettes for the book 'Applied Econometrics with R' by Christian Kleiber and Achim Zeileis, published in 2008. The package is hosted on the paperswithcode platform and is designed to accompany the textbook. It includes materials for replicating and extending the econometric analyses presented in the book.
1,400 individuals in Rio Grande do Sul, Brazil, were surveyed to model financial literacy levels. The study by Ani Caroline Grigion Potrich used descriptive statistics and multivariate analysis to link literacy to socioeconomic and demographic variables. Most respondents (67.1%) were classified as having a low financial literacy level.
A national cross-section of the electorate in England, Scotland, and Wales was first interviewed in 1969 (1,114 respondents). Of these, 792 were reinterviewed in 1970, and 1,093 new respondents were added to create a representative sample for the second wave. The data, associated with author David Butler, includes questions on political interest, activities, media usage, and opinions on political issues and party evaluations.
Estimates of educational attainment by sex for persons aged 25 and over, constructed by Robert J. Barro and Jong-Wha Lee. The data covers a panel of 138 countries, including China, with values at five-year intervals from 1960 to 1990. The study updates previous work from 1993 by incorporating census information for 1985 and 1990.
Multi-Modal Deep Learning Rice is a dataset hosted on Kaggle. Its title suggests it contains data related to rice, likely for training multi-modal deep learning models. The specific content, size, and origin are not specified.
Three large urban school districts in the United States, with enrollments ranging from 50,000 to over 200,000 students, were studied by the Consortium for Policy Research in Education. The study examines how district and school staff made strategic decisions about instructional improvement and the weight they gave to research evidence. The analysis focuses on decisions about adopting reform designs, implementing changes, and scaling up reforms.