Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
40,759 datasets
ARWPIC Right Whale sighting data summaries were compiled for the NESP MBH project A13 from the Australian Right Whale Photo Identification Catalogue. The dataset contains sight and resight summaries used to analyze population trends and spatial connectivity of individuals across southern Australia. Original sightings data were collected by ARWPIC partners between 1990 and 2018, and the catalogue is managed by the Australian Marine Mammal Centre at the Australian Antarctic Division.
Legacy deposit of long-form English travel articles published on ThatBackpacker.com by Audrey Bergner. The 13.8 MB ZIP archive contains article records, CSV and JSONL exports, and documentation files from an earlier version of the corpus. This record was last updated on 2026-05-31 and is retained for historical version tracking.
A 2026 report from the Government of Yukon provides a review of global diamond production and exploration potential. The document details the geological, geochemical, and geophysical context for diamond exploration in Yukon and Canada's Northwest Territories, including a case study of the Point Lake pipe discovery. It discusses global diamond market statistics from 1950 to 1990 and the economic rationale for exploration in North America.
NASA's Suomi NPP satellite collects this Level 1B swath product every six minutes. The Day/Night Band sensor captures panchromatic data from 500-900 nm, enabling observation from daylight down to low-light nighttime radiation. Its on-board calibration and stray-light corrections provide radiometrically calibrated data at approximately 750-meter spatial resolution at nadir.
Parcel point data from Maryland integrates property ownership, address, valuation, and land structure information for tax assessment accounts. Data originates from the State Department of Assessments and Taxation and the Maryland Department of Planning. The dataset includes fields for Census 2020 Block Group, Owner Occupied Indicator, Year Built, Structure Square Footage, and Land Use Description.
A figshare dataset by Jun Chen, last updated in May 2026, presents quantitative results for an unsupervised image stitching method. The 5.5 KB Excel file likely contains SSIM (Structural Similarity Index Measure) scores comparing the proposed method against state-of-the-art techniques. The data supports the paper's claim of up to a 56.00% improvement in SSIM for UAV image stitching.
Jun Chen published a dataset on figshare in May 2026 containing performance metrics for an unsupervised UAV image stitching method. The data likely contains tabular results comparing the proposed approach against state-of-the-art methods on four UAV image datasets. The proposed method reportedly improved PSNR by up to 28.58% and SSIM by up to 56.00%, while reducing processing time by up to 88.9%.
A 5.5 KB Excel file contains the model results for a cost-effectiveness analysis comparing two first-line treatments for advanced hepatocellular carcinoma. The analysis, authored by XueYin Xu and uploaded to figshare, uses a three-state partitioned survival model over a 10-year horizon with a 4.5% discount rate. It was last updated on May 14, 2026, and concludes that a dual-agent therapy is not cost-effective from the perspective of the Chinese healthcare system.
A 10-year partitioned survival model compares the cost-effectiveness of two first-line treatments for advanced hepatocellular carcinoma from the perspective of the Chinese healthcare system. The dataset, authored by XueYin Xu and last updated in May 2026, contains parameters and results from a health economic analysis based on a Phase III clinical trial. It includes total costs, quality-adjusted life years (QALYs), and incremental cost-effectiveness ratios (ICERs) for a dual-agent therapy versus sorafenib.
XueYin Xu published a dataset on figshare in May 2026 containing results from a partitioned survival model for advanced hepatocellular carcinoma. The 5.5 KB Excel file compares the cost-effectiveness of a dual-agent therapy versus sorafenib from the perspective of the Chinese healthcare system. The analysis uses a 10-year time horizon and a willingness-to-pay threshold of 299,400 CNY.
A 68.5 MB dataset of 3D point clouds for manufacturing defect inspection, released by Xiaoyang Song on figshare in May 2026. It supports research into detecting new, unseen defect types that are out-of-distribution from training data. The dataset is used to validate a novel generative adversarial approach for supervised OoD sample generation.
This dataset was collected from November 12 to December 19, 2015, during the Olympic Mountains Experiment (OLYMPEX) field campaign in the Pacific Northwest. It contains multi-frequency Doppler radar measurements from the APR-3 instrument aboard a DC-8 aircraft at 10 km altitude. The data include radar reflectivity, Doppler velocity, linear depolarization ratio, and normalized radar cross-section measurements at Ku, Ka, and W bands.
An automated, high-precision evaluation benchmark designed to establish programmatic safety guardrails for high-dimensional language model systems. The framework provides an inline, machine-readable validation layer that maps metrics against a fixed 1,300-region human neurological reference framework. It was authored by Jamie Davis and last updated on 2026-05-20.
A 54.1 KB dataset by Jamie Smith, last updated in May 2026, accompanies a study on the lasting impacts of sublethal prepupal heatwaves on the solitary bee Osmia bicornis. It contains raw data and R code for reproducing analyses on male and female reproductive traits and offspring survival. The dataset includes experimental temperature regimes, sperm and oocyte measurements, and survival records.
Processed experimental and simulation data supports a study on passive dimples for Flettner rotors. The 994.6 KB dataset, authored by Songhan Mo and last updated in May 2026, includes validation, parametric, and response-surface analysis files. Raw three-dimensional field files are excluded due to their large size.
Conteo de Procesos V2 is a dataset of criminal case counts from Colombia's Oral Accusatory System (SPOA) under Laws 906 of 2004 and 1098 of 2006. It is published by the Fiscalía General de la Nación (National Prosecutor's Office) via datos.gov.co and was last updated on 2026-05-18. The data covers crimes occurring since 2010 and is updated monthly.
Monitoring results from the Regional Autonomous Corporation for the Defense of the Bucaramanga Plateau (CDMB) assess the physical, chemical, and microbiological state of surface water sources. The dataset includes parameters like temperature, pH, dissolved oxygen, turbidity, total solids, BOD5, COD, nutrients, heavy metals, and coliforms. Data was collected during the September–October 2024 monitoring campaign as part of the institutional environmental surveillance and control strategy for 2024.
Two randomized online experiments with 1,260 and 2,521 adult participants conducted between March and June 2024 examined how packaging imagery and descriptors influence perceptions of cannabis edibles. The study, preregistered on ClinicalTrials.gov, measured outcomes including product appeal, perceived safety, healthiness, and appeal to children. Results indicate flavor imagery increased appeal and perceived safety, while descriptors like 'all natural' or 'organic' reduced interest.
MCD18A2 Version 6.1 is a decommissioned NASA MODIS data product providing global, daily estimates of Photosynthetically Active Radiation (PAR) at 1-kilometer pixel resolution. The dataset combines observations from the Terra and Aqua satellites to produce instantaneous and 3-hourly PAR arrays using a look-up table algorithm accounting for aerosols and clouds. It was produced by the LPCLOUD organization and superseded by Version 6.2 as of June 1, III.
A pre-registered experimental study compares 50 active professional visual artists with a matched sample of laypeople on tasks using text-to-image AI. The study assessed copying accuracy and creative thinking, finding a small but detectable advantage for artists. The dataset contains the results of this experiment, which was conducted by Thomas F. Eisenmann; the record was last updated in May 2026.