Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
42,093 datasets
Four related spatial products covering roads, rivers, hypsography, and derived 25-meter resolution Digital Elevation Models for four study areas in the Brazilian Amazon. The data comprise 15 files, including 12 shapefile archives and 3 GeoTIFFs, created by hand-digitizing paper maps produced by the Brazilian government from aerial coverages. Data providers explicitly note the data have not been validated or quality-assured for general use.
Community meetings led by the Colombian National Police with participation from citizens, institutions, and public/private entities. The dataset includes columns for impacted citizens, zone, station, district, metropolitan area, region, and neighborhood. It is hosted on the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-18.
A research paper and associated data concerning the estimation of finite-dimensional parameters in conditional moment restriction models with missing endogenous variables. The work, authored by Antonio Cosma and last updated on 2026-05-04, includes an empirical application to a female labor supply model with a sample size exceeding 200,000 observations. The dataset is a 699.7 KB collection of PDF and TXT files hosted on figshare.
15 standardized pediatric depression FAQs were submitted to three large language models (ChatGPT-5, Microsoft Copilot GPT-5, DeepSeek 3.1V). Responses were scored for readability using seven indices, accuracy and completeness on a 0-6 scale, and sentiment. The dataset was created by RongQi Jiao and last updated on April 29, 2026.
Floridablanca municipality property registry data includes parcel identification, owner details, and general characteristics. The dataset is hosted by the Colombian open data portal, datos.gov.co, and was last updated on 2026-05-18. It contains columns for municipality, economic purpose, department, land area, address, parcel number, and built area.
A paper by Francesco Bravo proposing and evaluating new inference methods for nonparametric estimating equations models. The work includes simulation studies and a real data example to illustrate the finite sample properties of the proposed test statistics and bootstrap method. The paper is available as a 14.1 MB PDF file under a CC-BY-4.0 license.
301,256 entries of code review data from 1,807 high-quality GitHub projects in C, C++, Java, and Python. This dataset, created by Yuxin Zhang and released in April 2026, supports research on retrieval-augmented generation for automated code review. It includes a manually annotated evaluation subset of 384 entries and a time-split retrieval database.
figshare hosts a dataset supporting the AUG tool for automated UML diagram generation. The dataset likely contains evaluation data and benchmarks for UML class, use case, and sequence diagrams generated by the GLM4-9B model. The repository includes code, datasets, and evaluation data, with the dataset last updated on May 5, 2026.
A 405.2 KB PDF document authored by Luana Benz and last updated on May 13, 2026, presents research data on mental health. The study investigates how the experience of agency (the feeling of being in control) is linked to mental health and coping strategies in the face of personal and global challenges. It explores complex mechanisms, including the role of perceived responsibility in specific challenging contexts.
From February 2016 onward, this dataset provides science-quality ocean surface wind vectors from the ISS-RapidScat scatterometer, processed at a 12.5 km grid resolution. It is a calibrated replacement for earlier versions, specifically addressing low signal-to-noise ratio states. The data is restricted to latitudes between approximately 61 degrees North and South due to the instrument's orbit on the International Space Station.
124.6 KB of raw data from experiments exposing mice to high-fat diets (45% or 60% fat) during adolescence. The dataset, compiled in a single Excel spreadsheet by Fabien Naneix and last updated in April 2026, includes body weight, energy intake, glucose tolerance tests, and lever press response rates during various behavioral conditioning and devaluation tests. These data were used in the article 'Adolescent obesity induces sex-specific alterations of action control'.
NASA's RSS SSM/I Ocean Product Grids provide 3-day average geophysical retrievals from the DMSP F13 satellite. Remote Sensing Systems produced this data using a unified algorithm refined over 20 years to simultaneously measure ocean wind speed, atmospheric water vapor, cloud liquid water, and rain rate. The Global Hydrology Resource Center reformatted the original binary data into netCDF files.
Global satellite data provides 3-day averaged ocean surface wind speed, atmospheric water vapor, cloud liquid water, and rain rate. The dataset is produced by Remote Sensing Systems under NASA's MEaSUREs Program using a unified algorithm refined over 20 years. Data from the DMSP F14 satellite is reformatted into netCDF files by the Global Hydrology Resource Center.
Satellite-derived oceanographic grids contain 3-day averaged measurements of ocean wind speed, atmospheric water vapor, cloud liquid water, and rain rate. The dataset is produced by Remote Sensing Systems for NASA's MEaSUREs Program using a unified algorithm refined over 20 years. Data from the DMSP F11 satellite is reformatted into netCDF files by the Global Hydrology Resource Center.
Ocean wind speed, water vapor, cloud water, and rain rate data retrieved from the DMSP F10 satellite's Special Sensor Microwave/Imager (SSM/I). The dataset is produced by Remote Sensing Systems for NASA's MEaSUREs Program using a unified algorithm refined over 20 years. Data is provided in daily netCDF grids, extending a long-term time series of oceanographic and atmospheric variables.
Remote Sensing Systems provides monthly averaged satellite measurements of four key ocean-atmosphere variables: wind speed, water vapor, cloud water, and rain rate. This dataset is part of NASA's MEaSUREs Program, produced using a unified algorithm refined over 20 years and intercalibrated across SSM/I and SSMIS sensors. Data from the DMSP F13 satellite is reformatted into netCDF files by the Global Hydrology Resource Center.
Remote Sensing Systems provides monthly average ocean product grids from the DMSP F14 satellite's SSM/I sensor. The dataset contains retrievals for ocean wind speed, atmospheric water vapor, cloud liquid water, and rain rate using a unified algorithm refined over 20 years. Data is produced under NASA's MEaSUREs Program and reformatted into netCDF by the Global Hydrology Resource Center.
NASA's MEaSUREs Program provides monthly average ocean data from the DMSP F16 satellite's Special Sensor Microwave Imager Sounder (SSMIS). Remote Sensing Systems uses a unified algorithm refined over 20 years to simultaneously retrieve ocean wind speed, water vapor, cloud water, and rain rate. The Global Hydrology Resource Center reformats the binary data into netCDF products, with SSMIS data intercalibrated to extend the SSM/I time series.
Global satellite data from the DMSP F8 spacecraft provides monthly averaged ocean surface wind speed, atmospheric water vapor, cloud liquid water, and rain rate. Remote Sensing Systems produced this dataset using a unified physical algorithm refined over 20 years, with SSMIS data intercalibrated to extend the SSM/I time series. The Global Hydrology Resource Center reformatted the data into netCDF files as part of NASA's MEaSUREs Program.
A 3-day average dataset from the DMSP F16 satellite's SSMIS sensor, providing a long-term time series of key ocean and atmospheric variables. Remote Sensing Systems produced the data using a unified physical algorithm refined over 20 years, and NASA's MEaSUREs Program distributes it via the Global Hydrology Resource Center. The data is intercalibrated with earlier SSM/I sensors to ensure consistency.