Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,732 datasets
A 30 m resolution bathymetry grid and derived morphological surfaces for Australia's Kimberley Marine Park. The dataset uses a semi-hierarchical classification scheme dividing seafloor slope into three categories: Plain (<2°), Slope (2-10°), and Escarpment (>10°). Geoscience Australia developed this data to support the management of the Commonwealth marine park network.
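The three slope categories above can be illustrated with a minimal sketch. The function name `classify_slope` and the handling of the exact boundary values (2° and 10° are treated as Slope, following the stated ranges) are assumptions made for illustration, not Geoscience Australia's published implementation.

```python
def classify_slope(slope_degrees: float) -> str:
    """Map a seafloor slope angle in degrees to one of three categories.

    Thresholds follow the listing: Plain (<2), Slope (2-10), Escarpment (>10).
    Treating exactly 2 and exactly 10 as "Slope" is an assumption.
    """
    if not 0.0 <= slope_degrees <= 90.0:
        raise ValueError("slope must be between 0 and 90 degrees")
    if slope_degrees < 2.0:
        return "Plain"
    if slope_degrees <= 10.0:
        return "Slope"
    return "Escarpment"


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the dataset.
    for angle in (0.5, 2.0, 7.3, 10.0, 25.0):
        print(angle, classify_slope(angle))
```

Applied to a slope raster derived from the bathymetry grid, the same thresholds would label each cell with one of the three classes.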
Anonymized information on various requests received by the Citizen Service of the Colombian Agency for Territorial Renewal. The dataset includes 31 columns such as request type, dates, location, and demographics. It was published via the Socrata platform on datos.gov.co and was last updated on 2026-05-18.
Geoscience Australia produced this web service describing geomorphic features on the seabed within Australia's marine jurisdiction, which includes its Exclusive Economic Zone and offshore territories. Features were identified using the best available bathymetric data and are generally mapped at a scale of 1:5,000,000. The service was last updated on 2026-05-04.
Mudjimba Island and its 1.5 km by 1.5 km surroundings were surveyed for the Department of Environment, Tourism, Science and Innovation (DETSI) on 03 December 2024. Bathymetry data was acquired using a Reson Seabat T50P and processed into a 0.5 m resolution GeoTIFF. The survey aimed to characterise the benthic environment and identify areas suitable for mooring block placement.
379 datapoints from 36 in vitro studies quantify how saponin extracts influence rumen fermentation and methane production. Agung Irawan published this meta-analysis in April 2026, identifying 24 distinct saponin sources. The analysis reveals source-dependent effects, with methane reductions ranging from 11% to 40.6%.
Multi-Doc-2025 is a financial question-answering benchmark built from SEC Form 10-K annual reports of S&P 500 companies. It was created by CageRico and last updated on Hugging Face in May 2026. The dataset is designed to evaluate retrieval-augmented generation (RAG) and financial QA systems under three reasoning settings: cross-company, cross-year, and hybrid-modal reasoning over text and tables.
SISCOSSR records community and collective activities in sexual and reproductive health for the Bucaramanga metropolitan area from January 2020 to December 2022. The dataset includes columns for activity type, participants, condoms distributed, and demographic details. It is hosted on the Colombian open data platform www.datos.gov.co.
Eddy covariance and meteorological measurements from the ALTAR South Eastern Queensland 1 flux tower site. Data are processed using EddyPro and PyFluxPro to produce a final, gap-filled product with Net Ecosystem Exchange partitioned into Gross Primary Productivity and Ecosystem Respiration. The site is an open pasture dominated by exotic grass and forb species.
Juliet Furaha Karisa's dataset on figshare, last updated May 8, 2026, contains thematic categories of challenges reported by coral reef restoration initiatives in the Western Indian Ocean. It includes numbers and percentages representing the frequency of reports across 49 surveyed initiatives. Illustrative examples summarize the types of challenges described by practitioners.
Chongchao Zhang published a 61.0 MB ZIP file containing raw data and parameter settings used to generate a synthetic user behavior dataset for electric vehicle charging. The data supports the reproduction of statistical analyses and figures from the associated study. The dataset was last updated on May 15, 2026.
Almost 100 channels of resampled, calibrated data from the Low Energy Charged Particle (LECP) experiment on Voyager 2 during its Jupiter encounter. The data, provided by NASA, consists of time-averaged 'rate' measurements for electrons >26 keV and ions >30 keV, with full angular anisotropy information preserved over 15-minute intervals. The dataset requires conversion to physical units based on contextual analysis of particle mass species and background contamination.
NYSERDA deployed Emergency Generators and Transfer Switches at Retail Gas Stations as part of the Fuel-NY initiative. The dataset is a complete listing of all installations under the Gas Station Back-up Power Program and Permanent Generator Program, which ran from 6/1/2013 through 1/26/2019. It contains business names, locations, and installation types for participants in the downstate New York area.
Neural network models for predicting halogen-pi interaction energies across multiple biologically relevant aromatic systems. The dataset contains over 18 million interaction geometries for halobenzene-aromatic complexes, evaluated at the MP2/TZVPP level of theory. Models developed by Marc U. Engelhardt achieve high accuracy (R² > 0.98, RMSE < 0.5 kJ/mol) for targeted interaction domains.
SISCOSSR is a system for community and collective activities in sexual and reproductive health from Bucaramanga and its metropolitan area, covering January 2020 to December 2022. The dataset includes columns for municipality, institution, type of test, age, and counseling dates. It is hosted on the Colombian open data platform www.datos.gov.co.
Verbatim Spans is a multi-domain training dataset for query-conditioned extractive evidence selection. The dataset combines three sources covering distinct domains and annotation conventions, including a silver-standard subset of 20,916 training and 2,319 validation rows from NLP research papers. It was created by KRLabsOrg and last updated on Hugging Face in June 2026.
MineExplorer is a dataset for evaluating the open-world exploration capabilities of multimodal large language model agents in Minecraft. The dataset was created by researchers including Tianjie Ju, Yueqing Sun, and Zhuosheng Zhang. It was last updated on June 12, 2026.
An inventory of public information assets generated, obtained, acquired, transformed, or controlled by the Hospital Timbío E.S.E. The dataset includes columns for asset name, category, format, publication status, content description, language, and preservation medium. It is hosted on the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-18.
Calibrated observations from the MESSENGER spacecraft's Fast Imaging Plasma Spectrometer (FIPS), covering the energy/charge range of < 46 eV/q to 13 keV/q. The data set is produced by the National Aeronautics and Space Administration and contains eight FIPS data products. The dataset was last updated on 2026-04-10.
133,883 small molecules from the QM9 dataset, with properties calculated at the ωB97X-D/aug-cc-pVTZ level of theory. The dataset includes SMILES strings, molecular geometries, functional groups, vibrational spectra, and various electronic and thermochemical energies. It was authored by Anirudh Krishnadas and last updated on 2026-04-17.