Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
41,939 datasets
GEMSS-Driven Subsampling for Information Extraction and Redundancy Elimination is a methodological dataset and R package by Ming-Chung Chang, last updated in May 2026. The 6.6 MB resource includes PDF and ZIP files describing a subsampling approach to improve Gaussian process model accuracy in unexplored input regions. The method, Generalization Error Minimization in SubSampling (GEMSS), aims to identify informative data subsets while discarding redundant points.
12.6 MB of synthetic benchmark datasets and code accompany the manuscript 'From Data Chaos to Physically Interpretable Deterministic Mapping'. The repository includes data for Burgers', FitzHugh-Nagumo, Navier-Stokes, and CSTR systems, authored by Dongni Jia and last updated in May 2026. Real-world industrial data for a grinding-classification plant are noted as confidential and not included.
LLM-Assisted Construction of Rule-Aligned Cross-Language Datasets for Software Quality Analysis is a replication package built from the Python Software Quality Dataset. It provides a pipeline to extract, clean, and translate Python functions into Ruby and JavaScript, then validates translations and measures issue preservation. The package, authored by Patcas Rares and last updated on 2026-05-19, includes scripts and cached outputs for reproducibility.
Eight bathymetry mosaics and derivatives provide full coverage of the Australian Exclusive Economic Zone, stratified by Parks Australia Management Effectiveness Ecosystem Component depth zones. The mosaics were created using a systematic prioritization of all publicly available bathymetry data as of July 2024, including multibeam, singlebeam, satellite, and seismic sources. Data is available as bathymetry, hillshade, slope, and aspect composites, with resolutions ranging from 10 meters in shallow zones to 210 meters in the deepest zones.
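As an illustration of how slope and aspect layers can be derived from a gridded bathymetry surface, here is a minimal sketch using NumPy finite differences. It is not the workflow used to build these mosaics; the function name `slope_aspect`, the north-up grid orientation, and the downslope-bearing aspect convention are assumptions made for the example.

```python
import numpy as np

def slope_aspect(dem, cell_size):
    """Return (slope_deg, aspect_deg) for a north-up grid (hypothetical helper).

    Assumes rows run north to south, columns run west to east, and
    cell_size is in the same units as the elevation values.
    """
    # np.gradient gives the derivative along rows first, then columns.
    dz_drow, dz_dcol = np.gradient(dem.astype(float), cell_size)
    dz_dx = dz_dcol    # change in elevation towards the east
    dz_dy = -dz_drow   # change in elevation towards the north
    # Slope is the angle of steepest descent below the horizontal.
    slope_deg = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect is the compass bearing (clockwise from north) of the downslope direction.
    aspect_deg = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope_deg, aspect_deg

# A plane that rises 10 m per 10 m cell towards the east.
dem = np.tile(np.arange(5) * 10.0, (5, 1))
slope, aspect = slope_aspect(dem, cell_size=10.0)
```

For this test surface the slope is 45 degrees everywhere and the aspect is 270 degrees, because the surface falls towards the west.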
Yaxin Teng published a dataset on figshare on 2026-04-29 describing the design and synthesis of USP1 inhibitors. The data likely contains results from in vitro assays and in vivo xenograft studies demonstrating synergistic antitumor activity with PARP inhibitors in triple-negative breast cancer models. The dataset is 4.5 KB in size and is available in CSV format under a CC-BY-NC-4.0 license.
TG-GAN is a deep learning model for multi-rate Digital Elevation Model generalization, constrained by terrain morphological factors. The model was developed by 岩晨 万 and last updated on 2026-05-27. Experiments were conducted in mountainous regions of Chongqing (China), Alaska, and Colorado, with downscaling factors from 2× to 5×.
This dataset comes from a 2026 figshare study by Jiaying Wei on enzyme-activatable peptide-drug conjugates. It details a γ-glutamyltransferase-responsive peptide platform for targeted cancer therapy. The data likely contains results on cytotoxicity, internalization efficiency, and in vivo tumor growth suppression.
The Forster, Cape Hawke to Black Head, bathymetry survey was acquired by the NSW Department of Planning and Environment between 27 February 2019 and 14 October 2020. It provides 32-bit floating point geotiff files of bathymetry and backscatter data at 5-meter resolution, processed using Hypack, R2Sonic GUI, POSPac, Qimera, and FMGT software. The dataset was created to establish a baseline and map seabed type distribution as part of the SeabedNSW program.
The Notaría Quinta de Cartagena was created in 1994 and serves the southern zone of Cartagena, Colombia. This dataset is an index of information classified as confidential or reserved by this notary office, detailing the legal basis and responsible parties. It includes columns such as FECHA DE GENERACIÓN DE LA INFORMACIÓN, FUNDAMENTO JURIDICO DE LA EXCEPCIÓN, and PLAZO DE LA CLASIFICACIÓN O RESERVA.
Stratigraphic framework maps for the Saskatchewan Phanerozoic Fluids and Petroleum Systems project were produced using 2 km equi-spaced modified grids generated from a kriging algorithm. The dataset integrates and validates data from multiple regional projects completed by the Saskatchewan Ministry between 2003 and 2009. To minimize edge effects, stratigraphic data from wells in adjacent jurisdictions was also incorporated.
32 infants from a larger cohort of 90 FGR/SGA pregnancies had their final prenatal ultrasound parameters correlated with Bayley-III neurodevelopmental scores. This pilot study found preliminary associations, such as abdominal circumference with motor scores (r=0.516) and umbilical vein flow with language scores (r=0.545). The dataset, shared under CC-BY-4.0, is a 184.9 KB DOCX file from figshare, last updated in May 2026.
Bryozoan faunas were not readily located in many areas of New South Wales despite abundant marine fossils. The work investigates Middle and Upper Ordovician exposures in central-western New South Wales and Middle and Upper Devonian sequences in the Fitzroy Basin. Many described samples are isolated occurrences yielding two or three species.
Flinders Reefs and Cairns Seamount in the Coral Sea Marine Park, north-eastern Australia, are mapped for seabed morphology and geomorphology. The maps were produced using a two-step classification system applied to bathymetry DEMs compiled from multibeam, LADS, LiDAR, and hydrographic surveys. The data product is hosted by the Australian Ocean Data Network and was last updated in May 2026.
A 2026 study by Whad Fayed on figshare investigated the effects of lysophospholipid and lipase supplementation in broiler chickens. The dataset likely contains results from 300 one-day-old male Ross 308 broilers across five treatment groups, measuring growth performance, nutrient digestibility, and metabolic responses. Findings include body weight gain of 2,222 grams and feed conversion ratios of 1.344 in supplemented groups.
5,985 adults from the Dalian Health Management Cohort (2015–2023) were tracked for carotid plaque development over a mean follow-up of 2.24 years. The study by Jingshan Jiang found a nonlinear positive association between the non-HDL to HDL cholesterol ratio and carotid plaque development, with a 64% increased risk for those in the highest quartile of the ratio. This 800 KB PDF contains the underlying data for this retrospective cohort analysis.
This data product contains seabed morphology and geomorphology maps for a subset area of Zeehan Marine Park in south-eastern Australia. The maps were derived from a 2-meter resolution bathymetry DEM compiled from a multibeam survey and classified using a nationally consistent scheme. The Australian Ocean Data Network hosts the data, which was last updated on 2026-05-05.
This dataset provides geospatial seabed morphology and geomorphology data for Beagle Marine Park in south-eastern Australia. It was created using a two-step classification system applied to bathymetry digital elevation models, supplemented by backscatter intensity, seabed imagery, sediment samples, and sub-bottom profiles. It is published by the Australian Ocean Data Network.
NASA's DC-8 aircraft collected detailed in-flight meteorological data during the 2006 African Monsoon Multidisciplinary Analyses campaign. The dataset captures measurements from a specialized Meteorological Measurement System, including air velocity, temperature, pressure, and aircraft attitude over West Africa and the Cape Verde Islands. This field investigation aimed to characterize African Easterly Waves and Mesoscale Convective Systems.
The Western Port Local Coastal Hazard Assessment provides modeled data on the extent of shoreline inundation for the Western Port coastal environment in Australia. This specific dataset represents the inundation extent for a 10% Annual Exceedance Probability catchment-generated flood under a +20 cm sea level rise scenario. The data was produced by the Department of Energy, Environment and Climate Action and was last updated on 2026-04-09.
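To make the 10% exceedance probability concrete, the sketch below converts a per-year exceedance probability into the chance of at least one exceedance over a multi-year horizon. It assumes the probability applies to each year independently, which is a simplification, and the function name `prob_at_least_one_exceedance` is hypothetical rather than part of the assessment.

```python
def prob_at_least_one_exceedance(aep, years):
    """Chance of one or more exceedances in `years`, assuming independent years."""
    if not 0.0 <= aep <= 1.0:
        raise ValueError("aep must be a probability between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    # Complement of "no exceedance in any of the years".
    return 1.0 - (1.0 - aep) ** years

one_year = prob_at_least_one_exceedance(0.10, 1)
ten_years = prob_at_least_one_exceedance(0.10, 10)
```

Under these assumptions, a 10% event has roughly a 65% chance of occurring at least once in any 10-year window.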
The Bureau of Mineral Resources Bulletin presents results from a geological reconnaissance of the northwest Australian continental shelf. Data was gathered during two 3-month cruises in late 1967 and 1968, supplemented by seismic profiles, echograms, and hydrographic soundings. The survey covers a 1200 km region from Barrow Island to beyond Scott Reef, aiming to map sediment distribution and elucidate late Cainozoic geological history.