Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,714 datasets
Voyager 1's Low Energy Charged Particle experiment measured electron and ion counting rates near Saturn. The dataset includes a subset of almost 100 LECP channels, providing 0.4-second high-resolution measurements during the far encounter. Data were collected by NASA and are globally calibrated to the extent possible.
FLORES-200 is a benchmark dataset for machine translation between English and low-resource languages. It was created by Facebook and doubles the language coverage of the earlier FLORES-101 dataset. The dataset is managed by the Open Language Data Initiative, with a newer version available as 'flores_plus'.
Western Australia coastal compartments provide a geological and landform-based framework for coastal planning. The dataset structures the coastline into primary, secondary, and tertiary compartments based on rock type, shoreline orientation, and landform associations. It was commissioned by the Department of Planning and compiled by Damara Pty Ltd and the Geological Society of WA, with a supporting report published in August 2011.
Consolidated budget execution data for expenditures generated during the 2019 fiscal year by the Mayor's Office of Leticia. The dataset includes 13 columns tracking budget allocations, obligations, and payments. It was published via the Colombian open data portal, datos.gov.co, and was last updated on May 18, 2026.
Southern Taiwan's Szekou Formation contains a fossil record of the Veneridae family of clams from the Late Pleistocene. The dataset documents 23 taxa across 17 genera, resulting from a revision of newly collected and previously held specimens. It was authored by Diana Osipova and published on figshare in April 2026.
This dataset contains resampled electron and ion counting rate data from the Low Energy Charged Particle (LECP) experiment on Voyager 2 during its encounter with Neptune, covering August 24 to August 27, 1989. It includes scan plane angle distributions for periods when the instrument was mechanically scanning, with data averaged into 12.8-minute records. The data were collected by NASA's Voyager 2 spacecraft.
Almost 100 channels of electron and ion counting rate data from the Voyager 2 Low Energy Charged Particle experiment during its Jupiter far encounter. The National Aeronautics and Space Administration collected these 0.4-second high-resolution measurements of electrons above 13 keV and ions above 24 keV. The data include particles such as electrons, protons, alpha particles, and light, medium, and heavy nuclei.
Electron and ion counting rate and flux data from the Low Energy Charged Particle experiment on Voyager 2 during its encounter with Neptune. The dataset includes measurements from nearly 100 instrument channels, with data collected at a 3.2-minute cadence during the far encounter phase. The data were collected by NASA's Voyager 2 spacecraft.
Voyager 2's Low Energy Charged Particle instrument captured electron and ion counting rates near Uranus. The dataset includes nearly 100 calibrated channels measuring particles above 13 keV for electrons and 24 keV for ions, with data collected at 6.4-minute intervals during the encounter. NASA produced this dataset, which was last updated on March 13, 2026.
Electron and ion counting rate and flux data from the Low Energy Charged Particle experiment on Voyager 2 during its Uranus encounter. The dataset includes a subset of nearly 100 LECP channels measuring particles above 13 keV for electrons and 24 keV for ions. Data were collected by NASA at 6.4-minute intervals during the far encounter phase.
National Flood Hazard and Risk Maps for Wales, created to comply with EU Directive 2007/60/EC and the Flood Risk Regulations (2009). The maps cover three flooding sources (rivers, the sea, and surface water) and categorize risk to people and to economic and environmental receptors at a community scale. The dataset was published by the Government Digital Service under the OGL-UK-3.0 license.
VIIRS/NPP satellite data provides Leaf Area Index and Fraction of Photosynthetically Active Radiation measurements at a 500-meter resolution globally. This Version 2 product is designed for continuity with the MODIS LAI/FPAR algorithm and includes six science data layers for analysis, plus quality and standard deviation information. It features improvements in calibration, geolocation, and aerosol flag corrections over previous versions.
VIIRS sensor data from the NOAA-20 satellite provides 500-meter resolution maps of Leaf Area Index and absorbed solar energy for photosynthesis. This Version 2 product is algorithmically aligned with legacy MODIS data to ensure continuity for long-term Earth system studies. It includes six science layers for each measurement, plus quality and uncertainty flags.
Zhigang Zhang published a dataset on figshare in 2026 containing performance parameters for a hybrid-cloud resource scheduling algorithm. The data, stored in an XLS file, is 5.5 KB in size and compares the proposed EMPA-ASA algorithm against baselines like GA and PSO. Results show performance across QoS metrics including end-to-end delay, response time, throughput, and packet-loss rate.
9.5 KB of tabular data containing the parameters, symbols, and values used in the EMPA-ASA hybrid-cloud resource scheduling algorithm. Zhigang Zhang published this dataset on figshare in April 2026. The algorithm, described in the accompanying research, uses a queueing model and reinforcement learning to optimize cost and QoS metrics like delay and throughput.
Daniel Leightley's dataset contains features for modeling RationAI, a personalized AI-supported messaging framework tested to reduce alcohol consumption. The data originates from a 12-week feasibility study involving 2,871 UK Armed Forces veterans using the DrinksRation mobile app, with 343 participants in a personalized messaging group and 385 in a control group. The dataset was last updated in April 2026.
Documents generated by the Dirección de Medicamentos y Productos Biológicos related to procedures submitted to Invima from 2015 onward. The dataset includes columns for document type, procedure start and end dates, filing number, and status. It is hosted on the Colombian open data portal, datos.gov.co.
Marius Pohl's dataset from 2026 documents lethal and sublethal effects of oral glyphosate-based herbicide exposure on two ant species, Camponotus maculatus and Cardiocondyla obscurior. It contains experimental results from 21-day exposure tests at field-relevant concentrations ranging from 0.5% to 10% GBH. The data supports research on non-target arthropod ecotoxicity.
A database of schools in the Colombian department of Sucre, provided by the Departmental Education Secretariat. The data is current as of November 2021 and includes administrative details for educational establishments. The dataset is hosted on the Colombian open data portal, www.datos.gov.co, and was last updated in May 2026.
A benchmark for evaluating Scientific General Intelligence across the full inquiry cycle, spanning 10 disciplines and more than 1,000 expert-curated samples inspired by Science's 125 Big Questions. The dataset, SGI-Reasoning-Lite, was created by InternScience and last updated on 2026-06-02. It employs an agentic evaluation framework for probing LLMs.