Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,799 datasets
355 autocomplete tasks across 13 categories measure language model performance on real-world Next.js, React, and TypeScript code. NextBench uses deterministic checks like must-contain patterns and regex matches for scoring, avoiding subjective LLM judges. The benchmark was created by baablabs and was last updated on 2026-06-06.
Geoscience Australia produced a backscatter grid covering 1034 square kilometers of the Lord Howe Island shelf. The data originates from a 2008 marine survey (SS062008) that mapped seabed bathymetry and characterized benthic environments using sediment sampling, rock coring, and underwater video. The grid was processed from EM300 backscatter data using the CMST-GA MB Process.
Voyager 1's Low Energy Charged Particle experiment recorded electron and ion intensities during its close encounter with Jupiter. The instrument measured in-situ charged particles with energies above 15 keV for electrons and above 30 keV for ions, including protons, alpha particles, and heavier nuclei. This dataset from the National Aeronautics and Space Administration was last updated in April 2026.
Research data supporting the article 'Accurate Energies for ππ* Excited States via Exchange Scaling: the XS-CASSCF method'. The 135.0 MB dataset contains computational results for a set of investigated molecules, including hexatriene and pqdm, comparing methods like CASSCF, XS-CASSCF, CAM-B3LYP, and SCS-CC2. It was authored by Felix Plasser and last updated on 2026-04-14.
A 2000/2001 regional seafloor mapping study by Geoscience Australia's South and Southwest Regional Project. It delineates four major geomorphological features and defines five acoustic echo facies for the Great Australian Bight area, captured in a GIS. The work aimed to support future Regional Marine Planning by providing foundational information for biological, environmental, and economic assessments.
HSM is a dataset of manually annotated support regions for 3D furniture meshes from the Habitat Synthetic Scenes Dataset (HSSD). The dataset, introduced in the paper 'HSM: Hierarchical Scene Motifs for Multi-Scale Indoor Scene Generation', also includes HSM-generated scenes for the SceneEval-500 benchmark. It was created by the author 3dlg-hcvc and was last updated on June 2, 2026.
Harris Greenstone Domain (HGD) is a GIS dataset depicting a late Archean-Proterozoic tectonostratigraphic terrane in the Gawler Craton of South Australia. The data characterizes the Harris Greenstone Belt, including komatiite, basalt, and banded iron formation, with structure interpreted from aeromagnetic and gravity surveys. It highlights the Lake Harris Komatiite for its potential nickel-copper-PGE and gold mineralizing systems.
Nemotron-SFT-CUDA-v1 is a training dataset for CUDA code, created by NVIDIA. It contains synthetic CUDA programming problems and solutions generated by AI agents based on permissively licensed source code from the Nemotron Pretraining Code v2 dataset. The dataset was last updated on June 4, 2026.
Seven determined trilobite taxa are described from the youngest Late Cambrian assemblage discovered in the Mariner Group of northern Victoria Land, Antarctica. The data originates from a paper published by Geoscience Australia, last updated on the platform in April 2026. This fauna is related to material previously described from Kazakhstan, Siberia, north China, Australia, and North America.
Maximum snow cover extent and snow depth estimates for each 8-day composite period from 2001 to 2017 across Alaska at 1 km resolution. The dataset was produced by NASA using a downscaling scheme that incorporates MODIS snow cover data and MERRA-2 reanalysis data. It covers the majority of Alaska's land area, excluding perennial ice/snow or open water.
37 marine physical environmental variables were collated by the Marine Biodiversity Hub for surrogacy and predictive modelling research. Bathymetry, geomorphology, seabed sediment, and seabed exposure data were produced by Geoscience Australia, while bottom-water and surface-water parameters were produced by CSIRO. All data were transformed to a common WGS84 datum and a 0.01-degree grid.
Peng Wang published a dataset on figshare in April 2026 detailing a nanoplatform for multimodal cancer therapy. The dataset, 338 bytes in size, likely contains experimental parameters or results related to the self-amplifying therapeutic cycle described. It is licensed under CC-BY-NC-4.0.
2015–2019 data from Fuzhou, China, linking other infectious diarrhea (OID) cases with meteorological factors and sulfur dioxide levels. The dataset was created by Jiangwang Fang and last updated in April 2026. It was analyzed using quasi-Poisson and distributed lag non-linear models to study lagged and interactive effects.
Geoscience Australia classifies the geomorphological features of the Great Artesian Basin, including its offshore extents beneath the Gulf of Carpentaria. The dataset groups features into five categories based on depositional environment: Marine, Fluvial, Aeolian, Playa-lacustrine, and Erosional terrain. It is associated with the Hydrogeological Atlas of the Great Artesian Basin and other scientific records.
A 2026 dataset of 27 unique Curriculum Vitae documents designed for research on indirect prompt injection in Large Language Models. Created by Alfredo Milani and colleagues, it was used to evaluate LLM robustness and threat generalizability in a controlled experimental framework. The dataset systematically varies document length and content density to isolate factors affecting model vulnerability.
New York City Department of Buildings data on the life cycle of permits for construction and demolition activities. Each row represents one permit for a specific work type, with records updated daily to reflect the latest application status. The dataset is sourced from the city's Buildings Information System (BIS).
Reporte de Tramites de Licencias de Conducción Sucre tracks driver license application procedures in the Sucre department of Colombia. The data likely contains applicant document details and request statuses for transactions recorded between January and October 2025. It is published by the Colombian open data portal www.datos.gov.co and was last updated in May 2026.
A collection of paper summaries and abstracts from a Geoscience Australia publication focused on models of sulphide ore formation in sedimentary rocks. The content likely includes discussions on lead-zinc deposits, copper mineralisation, and diagenetic processes. The dataset was last updated on the platform in April 2026.
45 radiocarbon dates from 11 sites track sea-level changes over the last 6000 years in the Great Barrier Reef. The data, produced by Geoscience Australia, includes storm ridge sequences from 5 locations to analyze climatic event frequency. Results indicate sea level fell smoothly from +1 meter to present levels with no evidence for secondary oscillations or climatic changes in the last 6000 years.
Inés Ochoa-Arizu published a dataset on figshare in April 2026 detailing batch fermentation yields from glycerol using an extremophile bacterium. The data likely contains optimal molar yields for hydrogen (0.94 mol/mol-glycerol), 1,3-propanediol (0.66 mol/mol-glycerol), and ethanol (1 mol/mol-glycerol) under varying glycerol concentrations and temperatures. The study focuses on the metabolic conversion capabilities of Citrobacter telavivensis T1.2D-1, isolated from the Iberian Pyrite Belt.