Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
41,396 datasets
The Self-Prompt project data contains automatically discovered optimal system prompts for code-generation large language models. The dataset includes base prompts, optimized soft-prompt tokens, and final assembled prompts for models evaluated on HumanEval, MBPP, and Eval_PLUS benchmarks. It was authored by liangdongzheng zhu and last updated on 2026-06-02.
NASA's Crustal Dynamics Data Information System provides a daily combination product, at 30-second granularity, of Global Navigation Satellite System satellite and receiver clock corrections. This dataset supports the International GNSS Service Real-Time Service, integrating data from GPS, GLONASS, Galileo, BeiDou, QZSS, IRNSS, and SBAS constellations. The data is generated from real-time streams of a global receiver network and formatted according to the RTCM SSR standard for broadcast via the NTRIP protocol.
Yi Yang's research dataset on figshare contains the evaluation results of four large language models' responses to 25 public queries about the top five causes of under-5 mortality. The dataset includes scores for reliability, accuracy, completeness, comprehensibility, readability, and actionability, generated using tools like DISCERN, Likert scales, and PEMAT-P. It was last updated on May 5, 2026.
The Murray Basin in southeastern Australia contains subsurface geological data relevant to groundwater salinity management. The dataset likely contains stratigraphic and sedimentological analyses of mid-Tertiary permeability barriers, including formations like the Ettrick Formation, Winnambool Formation, and Geera Clay. It is hosted by the Australian Ocean Data Network and was last updated in May 2026.
Environment and Climate Change Canada's National Pollutant Release Inventory (NPRI) tracks annual pollutant releases, disposals, and transfers for recycling. The dataset provides ten years of totals broken down by province, industry, and substance, with releases categorized by media type (air, water, or land). Data is available in CSV format and was last updated in April 2026.
This dataset integrates phenotypic, physiological, and transcriptomic data from a study comparing heat-tolerant and heat-sensitive alpine Rhododendron cultivars under chronic heat stress. Created by Mei Zhou and shared under CC-BY-4.0, it includes results from a 30-day stress experiment, biochemical assays of the glutathione system, and time-course transcriptomics. It was last updated on May 19, 2026.
Australia's Kimberley Marine Park features 30-meter resolution bathymetric data and derived seafloor morphological surfaces. The dataset was created by Geoscience Australia using a two-part classification scheme to categorize seafloor slope into Plains, Slopes, and Escarpments. This work supports the management of Australia's network of 58 marine parks, which cover 3.3 million square kilometres.
Compound 14c is a mitochondria-targeted small molecule designed to disrupt pancreatic ductal adenocarcinoma energy metabolism. The dataset likely contains results from in vitro and in vivo experiments demonstrating inhibition of glycolysis and oxidative phosphorylation, induction of ferroptosis, and elicitation of immunogenic cell death hallmarks. The data was uploaded by Haibo Yan on figshare in May 2026.
Mudjimba Island and its 1.5km by 1.5km surroundings were surveyed for the Department of Environment, Tourism, Science and Innovation (DETSI) on 03 December 2024. Bathymetry data was acquired using a Reson Seabat T50P and processed into a 0.5m resolution GeoTIFF. The Australian Ocean Data Network published the dataset, which is not intended for navigational purposes.
A 2022 survey from March 24 to April 5 collected bathymetry data in the South-west Corner and Perth Canyon Marine Parks. The Minderoo-UWA Deep-Sea Research Centre acquired the data aboard the MV Pangaea Ocean Explorer using a Kongsberg EM304 multibeam sonar. The processed dataset includes 64m and 128m resolution GeoTIFF files derived from the sonar data.
NASA's SODAR dataset maps backscattered acoustic energy to detect temperature fluctuations and thermal inversions in the lower atmosphere. The data provide estimates of the mixed layer height and inversion dimensions, collected during the 1987 FIFE field campaigns. Measurements were taken from a single, vertically pointing sounder operating at approximately 1500 Hz in the northwest quadrant of the study area.
Jian Chen's dataset provides the temporal and spatial distribution of lunar mare basalts. The 22.5 MB collection, last updated in April 2026, includes geological contacts and absolute model ages compiled from Lunar Reconnaissance Orbiter, Kaguya, and Clementine mission data. It is formatted as ESRI Shapefiles for geospatial analysis.
Microwave-assisted deep eutectic solvents achieved 89.16% leaching efficiency for rare earth elements. The dataset likely contains experimental results from a study by Changquan Men, published on figshare in May 2026. It describes a process using three synthesized solvents to recover strategic materials from industrial byproducts.
VIIRS/NPP sensor data provides 16 spectral bands spanning visible, near-infrared, and thermal wavelengths from 0.412 to 12.1 micrometers. Each product captures a 6-minute swath of Earth observation data with a spatial resolution of approximately 750 meters at nadir. This near real-time (NRT) Level 1B product supports immediate environmental monitoring and hazard detection.
Polygon features represent proposed parcels from subdivision applications approved through the NSW Planning Portal. Cadastre NSW provides a near real-time digitisation service integrated with the portal since September 2022. DCS Spatial Services aggregates this data to generate a layer visualising subdivision development progression across New South Wales.
San José de Cúcuta, Colombia, collects revenue from a property value-added contribution for public works. The dataset likely contains payment records from property owners/tenants in the urban zone, based on a feasibility study from 2017 and enacted by municipal resolution in 2018. Data collection covers the fiscal periods from 2018 to 2027.
This customized vector tile basemap layer, last updated 2026-05-27, is provided by the City of Moreton Bay's Data Hub. It is optimized to display special areas of interest (AOIs) created by community contributors, including landscaping and sports amenities. The layer is built using the same data sources as Esri's World Topographic Map.
This PDF document contains a two-case series published on figshare. The material details the use of quadrant-asymmetric scleral contact lenses in two elderly patients with concurrent facial nerve palsy and keratoconus. It was last updated on May 20, 2026.
A 2.2 GB dataset from 2026 contains equilibrated input structures and simulation scripts for hydroxyapatite (HAp) surfaces. It was created by Mahdi Tavakol at the University of Oxford. The data supports steered molecular dynamics, thermodynamic integration, and uniaxial deformation tests for surfaces at pH 5 and 7.
Over 3,000 sediment samples from Geoscience Australia's MARS database underpin this regional synthesis of the Great Barrier Reef seabed. This analysis provides the first regional update of surface sedimentology and geomorphology since the pioneering work of the late 20th century. It offers a systematic characterisation of inter-reefal environments, which comprise 95% of the Marine Park area.