Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,799 datasets
A 1.2 MB PDF file authored by Wenjing Li, last updated on 2026-04-20. The file contains supplementary data for a study investigating how the fungicide Azoxystrobin binds to specific sites on the Peroxiredoxin 1 protein to induce mitochondrial dysfunction and apoptosis in oral leukoplakia cell lines.
27,140 COVID-19 vaccination news articles from 39 South African outlets form the corpus for this racial bias study. Researcher Nnaemeka Ohamadike trained an ensemble of Word2Vec models to embed each outlet's language and measure its association with racial stereotype vocabularies. The dataset was published on April 5, 2026.
ShahzebKhoso hosts raw evaluation metrics, execution telemetry logs, and structural syntax outputs from running the Mostly Basic Python Problems (MBPP) benchmark against the StarCoder2 15B base model. The dataset captures telemetry from conversational evaluation loops to establish a baseline for unaligned foundational weights. It was last updated on May 28, 2026.
A dataset for detecting 14 types of logical fallacies in English text, created by kuwrom. It contains 138,574 rows for multi-class classification and 25,068 instruction examples for fine-tuning. The dataset was last updated on June 3, 2026.
October 2018 field observations and laboratory analyses for rock samples from the Kamativi area of Zimbabwe. The dataset includes whole-rock geochemical data from ICP-MS and mineralogical data from XRD and SEM-EDS, collected by the British Geological Survey. Data were gathered to research the internal evolution and crystallisation of lithium pegmatites.
70,000 synthetic human face images generated by the stratum-hq tool. The dataset includes multiple annotation layers such as captions, depth maps, normals, pose, segmentation, and embeddings from models like DINOv3 and T5. It was created by author 'timlawrenz' and last updated on the platform in May 2026.
This dataset provides land use and development planning information for the Montreal agglomeration, focusing on heritage and landscape features. It includes mapped data for built or archaeological heritage, emblematic landscapes, and views of interest to guide sustainable urban development decisions. The data originates from section 2.3 of the official Land Use and Development Plan.
A 493.7 KB Excel dataset containing a simplified model of a hydro-generator shaft system with two rotors. The model, created by Tengjiao Guo, incorporates electromagnetic, mechanical, and flow excitation. It uses a 'shape guidance' strategy based on test signal libraries to analyze axis trajectories and correlate them with frequency components and excitation sources.
New South Wales, Australia, seabed data collected by the NSW Department of Planning and Environment from March 2019 to August 2022. The dataset contains 32-bit floating point geotiff files of bathymetry and backscatter in 5-meter resolution, derived from multibeam sonar surveys. It was created to provide a baseline and map seabed type distribution as part of the SeabedNSW program.
Urban planning data from the Montreal agglomeration's Land Use and Development Plan outlines parameters for sustainable development decisions. The dataset includes thematic information on transport, compact neighborhoods, and economic development, accessible via an interactive map. Row and column counts are not specified.
Government and Municipalities of Québec provides three annual satellite mosaics covering the entire territory of Quebec. The mosaics contain multispectral imagery from the Copernicus Sentinel-2 mission for 2018, 2019, and 2020, featuring blue, near infrared, and short wave infrared spectral bands.
Jie-Long Shen published geochemical and isotopic data for Paleoproterozoic S-type granites from the Helanshan Complex in the North China Craton. The dataset includes whole-rock geochemistry, zircon U-Pb dating results, and Nd isotopic data for samples with crystallization ages around 1.95 Ga. It was last updated on 2026-04-11 and is available under a CC-BY-4.0 license.
NASA's Atmospheric Science Data Center processes particulate matter measurements from a global in-situ surface monitoring network. The MAIA Surface Monitor Stage 0 files contain these processed PM data as an ancillary dataset. Columns likely include time-series measurements of PM concentrations from monitoring stations worldwide.
Lord Howe Island and Balls Pyramid shelves are classified by geomorphic features and shelf region. The dataset provides information on the size, extent, and type of features, including submerged fossil reefs, ridges, sandy basins, and paleochannels. It was created by visually interpreting and digitizing broad seafloor features in ArcGIS, building on prior work by Linklater et al. (2015).
335.0 MB of source data and original figures supporting a neuroscience study on Phf6 gene function in the medial preoptic area. The dataset, authored by Jingjie Wang and shared under CC-BY-4.0, includes files for figures 1-7 and supplemental figures 1-9. It was last updated on May 20, 2026.
An inventory of public information generated, obtained, acquired, transformed, and controlled by the Institución Universitaria de Barranquilla (IUB). The dataset includes columns for document series, format, language, description, and period. It was last updated on 2026-05-18 16:37:35 and is hosted on the Colombian open data portal www.datos.gov.co.
Raw evaluation metrics, execution telemetry logs, and structural syntax outputs from running the Mostly Basic Python Problems (MBPP) benchmark against the StarCoder2 7B base model. The dataset documents behavioral dynamics of mid-tier foundational weights in automated conversational evaluation workflows. It was authored by ShahzebKhoso and last updated on May 28, 2026.
22 records of refugee feedback collected by UNHCR in Kyrgyzstan in 2023. The dataset captures feedback on the quality, sufficiency, utilization, and effectiveness of cash-based assistance. UNHCR uses this Post Distribution Monitoring to improve the relevance and quality of support provided to Persons of Concern.
44 practitioners across Cyprus, Greece, and Portugal provide frontline insights into unowned cat population and welfare management. The qualitative analysis, authored by Jamie L. DeLeeuw and published in April 2026, examines systemic challenges like unreliable funding, fragmented support, and weak legal frameworks. Findings reveal shared issues of overpopulation and welfare harms, alongside country-specific variations in governance and implementation.
Geoscience Australia Data compiled this 30-meter resolution Digital Elevation Model (DEM) of bathymetry for Northern Australia in 2018. The dataset covers a continental shelf over 400 km wide and approximately 1500 km long, including coral reefs, sand cays, and slope canyons. Source data includes multibeam surveys, airborne LiDAR, satellite-derived bathymetry, and an intertidal elevation model, all edited and standardized to WGS84/MSL datums.