Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,712 datasets
An inventory of public information generated, obtained, acquired, or controlled by the Municipal Mayor's Office of Capitanejo, Colombia. The dataset includes metadata on document classification, legal justifications, and responsible officials. It was last updated on the platform in May 2026.
TestGenius AI is a dataset for AI-powered test case generation. It describes a multi-agent pipeline that generates, validates, and iteratively improves tests using mutation testing feedback, inspired by the MuTAP research from ISSTA 2023. The dataset was created by Muthukumarank and was last updated on 2026-05-24.
1986 data from the First ISCCP Regional Experiment (FIRE) Cirrus field campaign, collected by the National Center for Atmospheric Research (NCAR) Sabreliner aircraft. The dataset contains meteorological and radiometric measurements, including aircraft position, static pressure, temperature, humidity, and radiation irradiances across shortwave, near-infrared, and infrared spectra. It was published by the National Aeronautics and Space Administration.
A bilingual English and Russian instruction-tuning dataset for training language models to analyze on-chain attacks, DeFi exploits, and crypto security incidents. It was created by author z0n3x and is grounded in the OAK taxonomy v0.1, a structured knowledge base of adversary tactics, techniques, and real-world incidents.
North Atlantic data from the Woods Hole Oceanographic Institution's Ocean Twilight Zone program, collected during a 2021 field campaign. The dataset likely contains measurements related to ocean chemistry, optics, and temperature, focusing on the twilight zone ecosystem's role in the carbon cycle and climate. The program partnered with NASA EXPORTS for sampling.
Estudiantes Universidad del Quindío contains data on enrolled, admitted, and matriculated students from the University of Quindío in Colombia. The dataset includes variables such as gender, socioeconomic stratum, department of origin, school of origin, health insurance provider, and others. It was published on the Colombian open data portal www.datos.gov.co and last updated on 2026-05-18.
OVCF - SGR - Ejecución de Gastos provides information on the programming and execution of income and expenses associated with Colombia's General Royalties System (Sistema General de Regalías). The dataset is hosted by www.datos.gov.co and was last updated on 2026-05-18. It contains columns detailing budget items, entities, third parties, and financial execution stages like commitments, obligations, and payments.
The seabed from Forster, Cape Hawke to Black Head in New South Wales was surveyed by the NSW Department of Planning and Environment between 27 February 2019 and 14 October 2020. The dataset contains 32-bit floating-point GeoTIFF files of bathymetry and backscatter data at 5-meter resolution, derived from multibeam sonar and processed with Hypack, Qimera, and FMGT software. It was created to provide a baseline dataset and map the spatial distribution of seabed types under the SeabedNSW program.
Australia's northwest marine region is the focus of a four-dimensional (3D × time) biophysical dispersal model simulating larval movement. The model, developed by Kool and Nichol in 2015, handles massive numbers of simulated larvae with diverse life histories and outputs point-level data to a relational database. Results include animations of larval movement near the Gascoyne canyon, dispersal surfaces over depth and time, and matrices of connectivity values among Commonwealth Marine Reserves.
FITS tables for cone reflection used with the XSPEC model stokes_cone v2.0. The 2.1 GB dataset was created by Jakub Podgorný using torus_integrator and an XSPEC table model generator, with details published in Podgorny et al., 2024 and Podgorny 2025. It was last updated on May 14, 2026.
SMEX02 Landsat Thematic Mapper Imagery, Iowa, Version 1 provides false-color composite images from Landsat 5 and 7 satellites. The dataset was developed by the National Aeronautics and Space Administration for the Soil Moisture Experiment 2002 (SMEX02). It is available in BIN, ISO, and HTML file formats.
Voyager 2's Low Energy Charged Particle instrument recorded electron and ion intensities during its 1986 Uranus encounter. The dataset covers a three-day period from January 24 to 27, 1986, and includes angle-resolved measurements from eight directional sectors during instrument scans. NASA produced this resampled data record, which captures two distinct mechanical scanning modes used during the flyby.
UAEM1LMT_002 is a subset of the Multi-angle Imaging SpectroRadiometer (MISR) Level 1B2 Local Mode Terrain Radiance Data for the United Arab Emirates region. The dataset contains terrain-projected top-of-atmosphere radiance from a single local mode scene, resampled at the surface and topographically corrected. It was produced by the National Aeronautics and Space Administration and last updated in March 2026.
Christmas Island lies about 1600 km north-northwest of Australia's Northwest Cape. The dataset likely contains bathymetric and sediment thickness maps compiled from seismic profiles and bathymetric data collected in February 1992, totaling about 2000 km of seismic lines. The data were used to produce a new 1:1,000,000 scale bathymetric map for the Australian Geological Survey Organisation.
Fifteen anonymized semi-structured interview transcripts collected for a doctoral study examining parasocial attachment and consumption. The dataset explores the lived experiences of Generation Z female fans in China regarding their engagement with social media influencers, emotional attachment, and related purchase behaviors. It was created by LUYANG LI using Interpretative Phenomenological Analysis and shared on figshare in April 2026.
Yukon Regional Mineral Potential by Deposit Models 2003 is a collection of PDF maps assessing mineral potential across Yukon. The dataset includes 18 individual deposit model maps, a methodology report, and an index map and table. It was produced by the Government of Yukon and last updated on 2026-04-17.
2214.350 kilometers of marine seismic data was processed for the Bureau of Mineral Resources, Geology and Geophysics. The report details the processing sequence and techniques used by Digital Exploration Ltd between July and December 1991. It includes appendices listing line numbers, SPN ranges, and a location map.
Geoscience Australia Data provides a geological treatise on the buoyant cratonic platform of Phanerozoic Australia, inherited from the Gondwanaland supercontinent. The document discusses the role of Pan-African orogenic heat and mafic underplating in creating a permanently buoyant lower crust, contrasting it with the marine facies of Laurasia. It was last updated on 2026-04-30.
125 gravity provinces have been defined and named across Australia and its continental margins based on land and marine reconnaissance gravity surveys. Geoscience Australia Data compiled this dataset to rationalize province boundaries and names after achieving virtually complete gravity coverage. The dataset provides a standardized nomenclature for discussing regional gravity features.
Geoscience Australia Data provides a geological analysis of the central Great Barrier Reef Province derived from shallow, intermediate, and deep focus seismic reflection profiling. The dataset describes the Cainozoic evolution of the region, detailing depositional episodes from the Late Cretaceous to the Pleistocene, including periods of reef growth and erosion linked to sea-level changes. The data is available in PDF and HTML formats and was last updated on April 30, 2026.