Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
43,995 datasets
Sudan's humanitarian needs data covers the overall number of people in need and intersectoral severity, disaggregated by administrative division and population group. The dataset is produced by the United Nations Office for the Coordination of Humanitarian Affairs (OCHA) in collaboration with humanitarian partners using the Joint Intersectoral Analysis Framework (JIAF). It was last updated on May 18, 2026.
A 2026 study by Aihua Zhang presents a framework for climate vulnerability assessment using Large Language Models and conformal prediction. The 18.6 KB document contains empirical validation results from Guangdong, Sichuan, and Yunnan provinces. It reports performance metrics like a calibration correlation of 0.816 and a root mean square error of 7.8.
Geological Survey of Victoria data contains primary geological boundaries and faults for Pre-Permian rock units. The dataset was compiled from surface geology maps and interpretation of magnetic, radiometric, gravity, and seismic data to produce a geologically and geophysically reasonable map. It should be used in combination with the state magnetic image for additional context on magnetic properties, dyke swarms, and basalt cover.
A dataset collected for a counseling psychology research project investigates the moderating role of psychological flexibility. It includes demographic variables and questionnaire responses from Iranian women and was analyzed using IBM SPSS Statistics. The data are provided for academic and research purposes.
Statistical results from linear mixed models that analyze the interaction effect between urban centers on morphological polycentricity. The dataset, authored by Juan Zhu and last updated on June 1, 2026, is a 5.5 KB Excel file containing fixed-effect estimates and model-fit statistics.
ORNL_CLOUD provides coefficients for correcting atmospheric effects in satellite radiometric data from the FIFE project. These coefficients, generated using the Fraser and LOWTRAN 7 models, are inputs for algorithms that derive surface reflectance from raw satellite and aircraft measurements. The dataset is hosted on multiple platforms, with metadata indicating updates as recent as 2026.
A synthetic dataset of 10,000 Spanish-language IT support interactions designed for model fine-tuning. The dataset was created by author bronc2 and was last updated on the platform in June 2026. It includes a free 50-record sample, with the full dataset available for purchase.
20 women with PTSD symptoms from the war in Ukraine underwent a four-week intervention of transcutaneous auricular vagus nerve stimulation combined with slow breathing. Data includes self-reported PTSD, depression, anxiety, sleep, and somatic symptom scores, plus physiological measures like heart rate variability and respiratory rate, collected at five time points from one month before to two months after the intervention. The dataset, shared under a CC-BY-4.0 license by Mikołaj Szulczewski, totals 1.3 GB.
Santa Barbara County, California, is covered by this Level 2 dataset containing surface emissivity and land surface temperature derived from airborne hyperspectral thermal imagery. It consists of 91 flight scenes collected on March 23, 2022, by the HyTES instrument during the SHIFT campaign, covering approximately 1,656 square kilometers. The data, provided by ORNL_CLOUD in HDF5 format, supports the modeling of surface energy and water fluxes.
Spatial datasets of vegetation vulnerability and hydro-climatic drivers under compound dry–hot conditions in northern China (1982–2022) were created by Bo Yuan. The dataset includes raster products for vegetation loss probability, hydro-climatic driver variables, event characteristics, and static environmental variables. It is 36.0 MB in size and was last updated on 2026-05-12.
Chunyan Wang authored a dataset describing a class of clear split-plot designs constructed via a parallel flats structure. The dataset, last updated on 2026-05-12, includes files in PDF, DS_STORE, and M formats totaling 448.5 KB. These designs are proposed for experiments where some factors are hard to change, dividing factorial effects into orthogonal subspaces to simplify model selection.
SWOT (Surface Water and Ocean Topography) launched on December 16, 2022, to measure global ocean topography using Ka-band radar interferometry. This Level 2 product provides sea surface height, sea surface height anomaly, wind speed, and wave height on a 250 × 250 meter 'native' grid with minimal smoothing. Data are distributed as one netCDF-4 file per satellite pass, covering a swath 10-60 km wide on each side of the nadir track.
Geological Survey of Victoria data contains Pre-Permian geological rock units and boundary types, including faults. The dataset was compiled from surface geology maps and interpretation of magnetic, radiometric, gravity, and seismic data to produce a geologically and geophysically reasonable map. It is intended for use with the state magnetic image for additional context on magnetic properties, dyke swarms, and basalt cover.
The Régie du Bâtiment du Québec (RBQ) requires contractors, promoters, and owner-builders to hold a license for construction work. This dataset lists all active RBQ license holders, published by the Government and Municipalities of Québec. The data was last updated on April 17, 2026.
Historical gasoline and aviation fuel tax rates for Ontario, with changes documented from 2017 to 2025. The dataset includes specific rates for unleaded gasoline, leaded gasoline, aviation fuel, and Northern Ontario, provided by the Government of Ontario. It is available in CSV and HTML formats and was last updated on April 17, 2026.
Fattah Golden Superset is a large-scale supervised fine-tuning dataset built by Nomeda Labs for training the Fattah family of coding and agentic coding models. The dataset is described as a labeled superset with no baked-in training ratios, allowing researchers to filter on capability columns to create custom mixtures. The dataset was last updated on June 1, 2026.
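The idea of filtering a labeled superset into a custom mixture can be sketched as follows. This is a minimal illustration only: the `capability` column name, the label values, and the `build_mixture` helper are hypothetical and are not taken from the dataset's actual schema.

```python
import random
from collections import defaultdict


def build_mixture(records, counts, label_key="capability", seed=0):
    """Draw a fixed number of examples per capability label.

    records   -- list of dicts, each carrying a label under `label_key`
                 (hypothetical column name, assumed for illustration)
    counts    -- dict mapping label -> number of examples wanted
    Returns a shuffled list; a label with too few rows contributes all it has.
    """
    rng = random.Random(seed)

    # Group rows by their capability label.
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    # Sample without replacement from each requested label.
    mixture = []
    for label, wanted in counts.items():
        pool = by_label.get(label, [])
        mixture.extend(rng.sample(pool, min(wanted, len(pool))))

    rng.shuffle(mixture)
    return mixture


if __name__ == "__main__":
    # Toy rows standing in for the real dataset.
    demo = [{"id": i, "capability": "coding" if i % 2 else "agentic"}
            for i in range(10)]
    mix = build_mixture(demo, {"coding": 3, "agentic": 1})
    print([r["id"] for r in mix])
```

Because the ratios live in the `counts` argument rather than in the data, the same superset can be resampled into different mixtures by changing only that argument.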
3,551 baptisms, marriages, and burials recorded in the earliest surviving church registers in Nova Scotia. Nova Scotia Archives transcribed and translated these Acadian parish records from 1702-1755 for the Acadie 2003-2005 Celebrations. The data provides a tangible link to the last generations of Acadian French living at Annapolis Royal before the Deportation.
The wmt26-mist-sample is a multilingual mix provided by the WMT26 MIST shared task organizers. It contains three types of tasks: context-based QA, open-ended QA, and mono- and cross-lingual summarization. The dataset is intended as a starting point for fine-tuning multilingual large language models.
Alberta Environment and Protected Areas and the Alberta Biodiversity Monitoring Institute (ABMI) developed a Native Cover indicator for Alberta. The dataset tracks aquatic and wetland native cover (AWNC) and terrestrial native cover (TNC) across Hydrological Unit Code 8 watersheds for the years 2010, 2018, 2019, 2020, and 2021. Calculations use ABMI's Wetland and Human Footprint Inventories and the Alberta government's DEM-derived riparian data and watershed boundaries.
266 boreholes drilled across Alberta since 1920 are compiled in this interim release. The Alberta Geological Survey began systematically compiling borehole log information into a database in 2010. The dataset comprises three relational tables detailing project sources, borehole summaries, and geological intervals.
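A three-table relational layout like the one described above is typically queried by joining intervals to boreholes and boreholes to project sources. The sketch below uses an in-memory SQLite database; every table name, column name, and row value is a hypothetical stand-in, not the Alberta Geological Survey's actual schema or data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE project (
            project_id INTEGER PRIMARY KEY,
            source     TEXT
        );
        CREATE TABLE borehole (
            borehole_id   INTEGER PRIMARY KEY,
            project_id    INTEGER REFERENCES project(project_id),
            total_depth_m REAL
        );
        CREATE TABLE geo_interval (
            interval_id INTEGER PRIMARY KEY,
            borehole_id INTEGER REFERENCES borehole(borehole_id),
            top_m       REAL,
            base_m      REAL,
            lithology   TEXT
        );
        """
    )
    # Made-up rows, for illustration only.
    conn.executemany("INSERT INTO project VALUES (?, ?)",
                     [(1, "demo report A")])
    conn.executemany("INSERT INTO borehole VALUES (?, ?, ?)",
                     [(10, 1, 30.0), (11, 1, 12.5)])
    conn.executemany("INSERT INTO geo_interval VALUES (?, ?, ?, ?, ?)",
                     [(100, 10, 0.0, 5.0, "till"),
                      (101, 10, 5.0, 30.0, "sandstone"),
                      (102, 11, 0.0, 12.5, "clay")])
    conn.commit()
    return conn


def intervals_with_source(conn):
    """Return each geological interval with its borehole and project source."""
    query = """
        SELECT p.source, b.borehole_id, i.top_m, i.base_m, i.lithology
        FROM geo_interval AS i
        JOIN borehole AS b ON b.borehole_id = i.borehole_id
        JOIN project  AS p ON p.project_id  = b.project_id
        ORDER BY b.borehole_id, i.top_m
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in intervals_with_source(build_demo_db()):
        print(row)
```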