Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
43,998 datasets
Growth parameters, foliar nutrient concentrations, and ectomycorrhizal colonization data for two dipterocarp seedling species. The dataset results from a 20-month factorial experiment applying nutrients and fungicide in the Kabili-Sepilok Forest Reserve. It was authored by Francis Brearley and last updated in June 2026.
P3D-Bench provides 1,003 cases for evaluating parametric 3D generation models across three distinct tasks. SpatiaOS released this lightweight benchmark data in 2026, containing UID lists and derived annotations. The data splits include 400 text-to-3D cases, 400 image-to-3D cases, and 203 assembly-level 3D cases.
The Canberra 1:100,000 Geological Sheet covers approximately 2500 km² of hilly, upland terrain in the Australian Capital Territory and southeastern New South Wales. The bedrock comprises Ordovician to Silurian sediments and acid volcanics which have been invaded by several generations of Silurian intrusions. The dataset is hosted by the Australian Ocean Data Network and was last updated on 2026-06-05.
Field measurements from four plots in Gonarezhou National Park, southeastern Zimbabwe, used to analyze selective foraging by aardvarks. The dataset includes structural attributes, spatial coordinates, and derived cost metrics for termite mounds on basalt and granite substrates. Data were collected by Justice Muvengwi and last updated on April 10, 2026.
A geospatial dataset of provincial Crown Land, including land managed by the Department of Energy. The data is hosted by the Government of New Brunswick on the Socrata platform and was last updated on 2026-05-29. The dataset's row count and temporal coverage are not specified in the available metadata.
Around 89.9k conversation examples for instruction-tuning models in Algerian Darija, a dialect characterized by code-switching between Arabic, French, and local expressions. The dataset was created by the awras-ai project and is hosted on Hugging Face. It was last updated on 2026-06-18.
Global Affairs Canada periodically conducts evaluations of its priorities, programs, and projects. The reports serve as a practical management tool for reviewing performance and improving the design and implementation of upcoming initiatives. Each evaluation produces a report; this collection likely focuses on reconstruction assistance in the Philippines from 2013-14 to 2018-19.
Four CSV files contain outputs and evaluation scores from experiments described in the paper 'MLLM-as-a-Judge for Financial Document Image Machine Translation'. The dataset likely includes translations generated by Gemma models, the scores assigned by a judge model, and the judge model's reasoning. Yanco Amor and Torterolo Orta published this data via e-cienciaDatos Harvested Dataverse on June 14, 2026.
Bathymetry data for the Port Fairy Wave Energy Site was acquired by Deakin University Marine Mapping lab on March 30, 2021. The survey was conducted from the Motor Vessel Yolla using a Kongsberg EM2040c multibeam echosounder. These data were collected to assess the impact of a wave energy structure placed on the seafloor.
Evaluation reports are generated by Global Affairs Canada to review the performance of its priorities, programs, and projects. The reports serve as a practical management tool to improve the design and implementation of upcoming initiatives. The dataset is published under the OGL-CA-2.0 license and was last updated in May 2026.
February 2005 saw a Fisheries Science Partnership survey of sole and plaice in ICES Divisions VIIf&g in the eastern Celtic Sea. Sixty-four hauls were conducted using twin 4-metre beams and 80 mm mesh cod-ends aboard the commercial beam trawler FV Nellie, off the north coasts of Cornwall and Devon and the Bristol Channel. The dataset likely contains haul-level catch data for these two flatfish species.
Spring 2023 flood mapping integrates open water extent polygons from radar satellites, event location points, and affected municipality boundaries. The dataset combines RADARSAT Constellation Mission imagery processed by Natural Resources Canada with Sentinel-1 and Sentinel-2 data from the European Space Agency. Records of flood events are maintained by the Deputy Directorate General of Operations of Québec's Ministry of Public Security.
The ASTER Global Water Bodies Database (ASTWBD) Version 1 maps water bodies larger than 0.2 square kilometers globally at a 1 arc-second (approximately 30-meter) spatial resolution. It classifies water into three categories—ocean, river, or lake—and provides corrected, flattened elevation values for each, generated from ASTER imagery acquired between March 2000 and November 2013. The dataset is distributed as global tiles in GeoTIFF format, covering latitudes from 83°N to 83°S and referenced to the WGS84/EGM96 geoid.
Aircraft missions flown in 1987 collected this atmospheric boundary-layer flux dataset during Intensive Field Campaigns (IFCs) 3 and 4 of the FIFE experiment. The University of Wyoming King Air used the eddy-correlation method with a gust probe to measure momentum and scalar fluxes. The data include high-pass filtered fluctuations for variables such as temperature and water vapor mixing ratio.
This dataset identifies water bodies larger than 0.2 square kilometers worldwide, from 83°N to 83°S, at a 1-arc-second (approximately 30-meter) resolution. It classifies features into ocean, river, or lake categories and provides corrected elevation values for water surfaces. It was generated from ASTER satellite imagery acquired between March 2000 and November 2013 to accompany the ASTER Global Digital Elevation Model.
Companion data release for the 2026 paper 'Δ-Harness: An Agentic Data Harness for Generative Visual and World Models'. The dataset contains experimental data produced and consumed by the DeltaSynth pipeline for LoRA training and held-out evaluation. The code and pipeline are available on GitHub under haolpku/DeltaSynth.
Species-level genome bins generated from microbes in four stomach compartments of three bovine species, including gayal. The dataset is 246.4 MB in size, authored by Yuming Chen, and was last updated on May 29, 2026. It is shared under a CC-BY-4.0 license on figshare.
A 217.5 KB Excel database supporting academic research on gender expectations in media coverage. It was created by Edrei Álvarez-Monsiváis for a 2026 journal article and related conference presentations. The dataset likely contains structured analysis of news articles about the first female Chief Justice of the Mexican Supreme Court.
Research data supporting the paper "Stable infinite-temperature eigenstates in SU(2)-symmetric nonintegrable models". The dataset includes text files and code for generating Hamiltonian spectra, calculating zero-energy degeneracies, and analyzing entropy and Loschmidt echoes. It was authored by Christopher Turner and last updated on 2026-05-05.
A 25.1 MB collection by Tugba Y. Ozmen, last updated in April 2026, investigates assays for homologous recombination deficiency and replication stress in cancer. The work includes a comparative pan-cancer analysis of therapy efficacy and toxicity based on results from clinicaltrials.gov. It explores the integration of these pathways with immune contexture to inform next-generation treatment strategies.