Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
43,370 datasets
Australia's National Environmental Science Program (NESP) compiled a spatial index of environmental literature for 100 threatened and migratory marine species in relation to Offshore Renewable Energy (ORE) areas. The inventory records study locations, methodologies, and potential impacts of ORE infrastructure on species like birds, cetaceans, and turtles. Data was sourced from a systematic literature review and observation repositories including BirdLife Australia and the Atlas of Living Australia.
Nemotron Personas Belgium uses a compound AI approach to generate multilingual Belgian personas. The personas are grounded in real-world distributions, suggesting they model demographic or behavioral patterns. Created by NVIDIA and last updated on June 17, 2026, this dataset is intended for AI training tasks.
ASR-KCSC is an open-source Korean conversational speech corpus containing 5.22 hours of transcribed audio. The data consists of 22 conversations between seven pairs of speakers recorded on mobile devices in indoor environments. Author MagicHub released the dataset on Hugging Face, with a last recorded update in June 2026.
Chimera-XTRM is a synthetic dataset engineered for fine-tuning Large Language Models in advanced Red-Team operations. The dataset was created by Umranz and was last updated on June 21, 2026. It is intended strictly for authorized security research and defensive training.
A hierarchical framework of geomorphological spatial entities at three tiers, with Tier 1 containing 8 Divisions, Tier 2 containing 34 categories, and Tier 3 containing 95 categories. The dataset, created by the Department of Energy, Environment and Climate Action, provides a spatial system to assist planning, monitoring and reporting for natural resource management in Victoria and Australia. It was last updated on 2026-04-08.
85 variables provide fire detection and retrievals of Fire Radiative Power (FRP), fire Visible Energy Fraction (VEF), and Modified Combustion Efficiency (MCE). The NASA/NOAA Suomi NPP VIIRS FILDA-2 product is generated in 6-minute orbit segments at a 750-meter spatial resolution, designed to detect smaller and cooler fires using visible band observations at night. This dataset supports analysis of fire characteristics and combustion efficiency globally.
Global satellite-derived data provides cumulative 8-day composites of Gross Primary Productivity (GPP) and Net Photosynthesis (PSN) at a 500-meter spatial resolution. The dataset, based on the radiation use efficiency concept, is designed as an input for models calculating terrestrial energy, carbon, and water cycle processes. It contains three primary variables for GPP and PSN alongside a quality control layer.
Wetland inventory data for four pilot study areas in Alberta, Canada, totaling approximately 39,045 km². The Government of Alberta, Ducks Unlimited Canada, and Alberta Biodiversity Monitoring Institute collaborated to develop this inventory using Earth Observation imagery and machine learning techniques. The dataset identifies wetland class and form according to the Alberta Wetland Classification System.
Seasat-A Scatterometer (SASS) data provides monthly averaged ocean surface wind stress from July to October 1978. The data is gridded on a 2.5-degree global grid, with vector wind stress stored in dynes per square centimeter. It is derived from 96 days of SASS vector winds processed to remove directional ambiguities using a GSFC atmospheric model.
133 variables across 2,050 cases capture key indicators of library services in Canada. The data were collected for the survey years 1994, 1995, 1996, and 1999 by the National Library of Canada in collaboration with library associations. The program was dissolved after the publication of the 1999 statistical report in 2002.
415,090 line-kilometres of Total Magnetic Intensity data were acquired over the Narryer region in 2024. The dataset is processed with corrections for diurnal variation, geomagnetic reference fields, and levelling to highlight subsurface geology. It was published by Geoscience Australia Data and last updated in May 2026.
Total Magnetic Intensity (TMI) point-located data measures variations in the Earth's magnetic field. This line dataset from the Narryer survey in Western Australia was acquired in 2024 by the WA Government and consists of 415,090 line-kilometres of data. The raw edited data includes measurements such as raw TMI, compensated TMI, diurnal, fluxgate magnetometer, raw altimeter heights, and ellipsoidal GNSS heights.
NASA/NOAA's VIIRS/NPP Land Surface Temperature/Emissivity Daily L3 Global 1km SIN Grid Day V002 dataset provides daily, gridded estimates of land surface temperature and emissivity. The product is compiled from daytime VIIRS swath data, resampled to a 1-kilometer sinusoidal grid, and uses an algorithm compatible with MODIS for continuity. It contains seven science datasets including LST, quality control, emissivity for three spectral bands, view zenith angle, and observation time.
A study of 200 dark septate endophytic (DSE) fungal strains isolated from Ulmus pumila L. roots across three sandy lands in eastern Inner Mongolia. The dataset includes fungal species composition, colonization rates, and key rhizosphere soil nutrient measurements. It was authored by Yunxia Ma and last updated in April 2026.
Global daily land surface temperature and emissivity data at a 1-kilometer resolution, derived from NOAA-20 VIIRS satellite observations. The dataset is produced by averaging multiple cloud-free, high-accuracy observations per grid cell, weighted by observation coverage, and is algorithmically compatible with NASA's MODIS products for continuity. It contains seven science datasets including temperature, quality control, emissivity for three spectral bands, view angle, and observation time.
Comparendos aplicados por el Código Nacional de Seguridad y Convivencia Ciudadana records infractions under Colombia's National Security and Citizen Coexistence Code. The dataset is hosted on the datos.gov.co platform via Socrata and was last updated on 2026-05-18. It likely contains records of official orders issued by the National Police.
500 six-panel comic strips generated with OpenAI's gpt-image-1, totaling 3,000 images. Each strip is paired with structured metadata including art style, a recurring protagonist, and a caption for every panel. The dataset was created by baulab to study spatial grounding in vision-language models, specifically tracking attention across multi-panel images.
Errata data and scripts correct a methodological error in the original ENERGETIC project report concerning power and energy consumption measurements of a U280 FPGA card. The dataset, authored by Michael Bane, was last updated on May 28, 2026. It is a small archive of 672.6 KB containing revised data and analysis scripts.
Three terabytes of high-resolution imagery and auxiliary data collected from Antarctic fast ice at Cape Evans during November and December 2018-2019. The dataset was acquired by the IMAS/AGP under-ice HI system and a custom ice core scanner for the 'On Thin Ice' grant, a collaboration between AGP and NZARI. It includes in-situ transects under natural light, ex-situ ice core scans, irradiance measurements, fluorometric samples, and media footage.
A bathymetry survey acquired by Deakin University over two days in October 2015 (14/10/2015-15/10/2015). The survey was conducted onboard the Motor Vessel Yolla using a Kongsberg EM2040c sonar system and is managed by the Australian Ocean Data Network.