Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
43,998 datasets
February 2005 saw a Fisheries Science Partnership survey of sole and plaice in ICES Divisions VIIf&g in the eastern Celtic Sea. Sixty-four hauls were conducted using twin 4-metre beams and 80 mm mesh cod-ends aboard the commercial beam trawler FV Nellie, off the north coasts of Cornwall and Devon and the Bristol Channel. The dataset likely contains haul-level catch data for these two flatfish species.
Spring 2023 flood mapping integrates open water extent polygons from radar satellites, event location points, and affected municipality boundaries. The dataset combines RADARSAT Constellation Mission imagery processed by Natural Resources Canada with Sentinel-1 and Sentinel-2 data from the European Space Agency. Records of flood events are maintained by the Deputy Directorate General of Operations of Québec's Ministry of Public Security.
The ASTER Global Water Bodies Database (ASTWBD) Version 1 maps water bodies larger than 0.2 square kilometers globally at a 1 arc-second (approximately 30-meter) spatial resolution. It classifies water into three categories—ocean, river, or lake—and provides corrected, flattened elevation values for each, generated from ASTER imagery acquired between March 2000 and November 2013. The dataset is distributed as global tiles in GeoTIFF format, covering latitudes from 83°N to 83°S and referenced to the WGS84/EGM96 geoid.
Aircraft missions in 1987 collected this dataset of atmospheric boundary layer fluxes during IFCs 3 and 4 of the FIFE experiment. The University of Wyoming King Air used an eddy-correlation method with a gust probe to measure momentum and scalar fluxes. The data include high-pass filtered fluctuations for variables such as temperature and water vapor mixing ratio.
This dataset identifies water bodies larger than 0.2 square kilometers worldwide, from 83°N to 83°S, at a 1 arc-second (approximately 30-meter) resolution. It classifies features into ocean, river, or lake categories and provides corrected elevation values for water surfaces. It was generated from ASTER satellite imagery acquired between March 2000 and November 2013 to accompany the ASTER Global Digital Elevation Model.
Companion data release for the 2026 paper 'Δ-Harness: An Agentic Data Harness for Generative Visual and World Models'. The dataset contains experimental data produced and consumed by the DeltaSynth pipeline for LoRA training and held-out evaluation. The code and pipeline are available on GitHub under haolpku/DeltaSynth.
Species-level genome bins generated from microbes in four stomach compartments of three bovine species, including gayal. The dataset is 246.4 MB in size, authored by Yuming Chen, and was last updated on May 29, 2026. It is shared under a CC-BY-4.0 license on figshare.
A 217.5 KB Excel database supporting academic research on gender expectations in media coverage. It was created by Edrei Álvarez-Monsiváis for a 2026 journal article and related conference presentations. The dataset likely contains structured analysis of news articles about the first female Chief Justice of the Mexican Supreme Court.
Research data supporting the paper "Stable infinite-temperature eigenstates in SU(2)-symmetric nonintegrable models". The dataset includes text files and code for generating Hamiltonian spectra, calculating zero-energy degeneracies, and analyzing entropy and Loschmidt echoes. It was authored by Christopher Turner and last updated on 2026-05-05.
A 25.1 MB collection by Tugba Y. Ozmen, last updated in April 2026, investigates assays for homologous recombination deficiency and replication stress in cancer. The work includes a comparative pan-cancer analysis of therapy efficacy and toxicity based on results from clinicaltrials.gov. It explores the integration of these pathways with immune contexture to inform next-generation treatment strategies.
Archived records of civil security events in Quebec, systematically grouped by the Ministry of Public Security. The database documents event consequences, evolution, and categorizes them by impact level and emergency response required, based on the Canadian Common Alert Protocol profile. Data compilation includes reports from the Government Operations Center and regional directorates since 1996.
121,422 expert-level instruction-response pairs for offensive cybersecurity tasks. The dataset was created by author 'oyildirim' and is described as the largest open-source offensive cybersecurity SFT dataset. It was last updated on June 17, 2026.
Colombian data on graduates from the Colegio Mayor del Cauca university institution, starting from the 2011-I semester. The dataset tracks the number of graduates by program, period, academic level, and gender. It is published via the Colombian open data portal.
An unpublished database analyzing front pages and opinion columns from major-circulation newspapers for gender stereotypes targeting presidential candidates Sheinbaum and Gálvez. The 189.1 KB XLSX file served as the basis for academic publications and conference presentations. It was last updated on 2026-05-15.
A meta-epidemiological study protocol analyzes publication delays in systematic reviews. The dataset likely contains records of interventional, RCT-based meta-analyses published in top-tier general medical journals and the Cochrane Database of Systematic Reviews between 2023 and 2025. Jia Song authored this protocol, which was uploaded to figshare in April 2026.
A cleaned Wikipedia corpus combines Serbian and Croatian Wikipedia articles. Croatian text has been transliterated to Cyrillic script, and wiki markup, infoboxes, and stub articles have been removed. The corpus was compiled by RafaelUI and is available on Hugging Face.
A geospatial analysis compares the levelized cost of heat and carbon removal for three decarbonized thermal energy sources across the United States. The study uses detailed process models for sedimentary basin geothermal, concentrated solar, and heat pump technologies, with sorbent-based direct air capture as a case study. The dataset was authored by Caleb H. Geissler and last updated on April 28, 2026.
Experimental data for a series of novel podophyllotoxin derivatives designed to target LAT1 transporters for esophageal cancer treatment. The dataset includes results for the lead compound B11, showing a 64.6% tumor growth inhibition in mice and a more than 4-fold improvement in tolerability compared to etoposide. The data were authored by Manwei Jia and last updated on 2026-04-14.
Cardiac mitochondria isolated from Wistar rats (Rattus norvegicus) were used to study the direct effects of dapagliflozin. The dataset, authored by Itanna Isis Araújo de Souza and last updated in May 2026, contains findings on oxygen consumption, ATP production, ROS generation, and membrane potential. It is shared under a CC-BY-4.0 license as a 1.9 MB DOCX file.
The Lord Howe Island shelf in NSW was surveyed by Geoscience Australia in 2008. The survey mapped seabed bathymetry and characterized benthic environments using sediment sampling, rock coring, underwater video, and current measurements. The lh_back_8m grid is a processed backscatter product covering 1034 sq km, derived from EM300 data.