MAOAM (Mask Any Object And Material) is a unified selection framework for precise object- and material-level selection across text- and click-based interactions. This repository contains a 10% subset of the material annotations from the associated paper, featuring per-region text descriptions and VQA questions across three sets: SynMat, RealMat, and SAMa. The dataset was authored by jpark677 and last updated on Hugging Face in June 2026.
Geoscience Australia collected marine geophysical data from the Kenn Plateau off northeast Australia. The survey acquired 3,090 km of seismic data and 7,584 km of bathymetric data, along with 12 dredge hauls and one grab sample. The data were collected during a research voyage on the RV Southern Surveyor, with an additional two days of ship time scheduled for November-December 2004.
Ten hours of Japanese conversational speech were recorded on mobile devices to mirror real-world usage. The dataset follows a conversation-based design that captures authentic interactive communication for model training. It was created by MagicDataTech and last updated on June 10, 2026.
Geoscience Australia Data provides a study of the complex seabed morphology and sediment distribution in Keppel Bay, a large shallow coastal embayment in Queensland. The data, last updated on 2026-04-30, reveals the former path of the Fitzroy River across the continental shelf and details Holocene sea-level changes. It describes sediment composition, including muddy sand infill in inner bay palaeochannels and relict fluvial deposits in the outer bay.
Records list unclaimed individuals cremated by the Cook County Medical Examiner's Office. The dataset includes demographic and event date columns such as Name, Age, Sex, Race, Date of Death, and Cremation Date. It is published by datacatalog.cookcountyil.gov and was last updated in early April 2026.
Geoscience Australia Data provides a geological study of the continental shelf off southeast Australia between Sugarloaf Point and Gabo Island. The description details shelf width variations from 72 km to 17 km, three depth-based morphological zones, and the composition of surface sediments. The dataset was last updated on 2026-04-30.
Six thick sedimentary cycles from the Surat Basin document environmental changes during the Jurassic and Cretaceous periods. The cycles, each hundreds of metres thick, are interpreted as responses to global sea-level oscillations. This analysis is provided by Geoscience Australia Data.
Indirect leaf area index (LAI) estimates were obtained from the KSU Light Wand Study using a LI-COR LAI-2000 Plant Canopy Analyzer. The instrument measures canopy transmittance at five zenith angles to estimate LAI and mean leaf inclination angle. This dataset is hosted by ORNL_CLOUD and appears on multiple government data platforms.
Twenty hyperspectral surface reflectance images were acquired by the Hyperion sensor aboard the EO-1 satellite at 30-meter resolution over the Amazon Basin from 2002 to 2005. The data were processed by ORNL_CLOUD using ENVI software and the ACORN atmospheric correction algorithm. Images are distributed in GeoTIFF format with companion ENVI header files.
Hindi Speech Instruct is a multi-turn Hindi conversational dataset for training speech language models, created by author somu9. It contains 10 conversations with a total of 25 user audio turns paired with 25 assistant text responses. The dataset was last updated on 2026-06-17.
MYD09Q1 Version 6.1 provides atmospherically corrected surface spectral reflectance estimates for Aqua MODIS Bands 1 and 2 at a 250-meter resolution, composited over an 8-day period. The pixel selection criteria for the composite include cloud conditions and solar zenith angle, defaulting to the pixel with the minimum blue channel value. This dataset includes two quality layers and incorporates calibration improvements such as polarization correction and updates to the response-versus-scan angle model.
Mingfeng Yan's dataset characterizes Pseudomonas syringae pv. actinidiae (Psa) biovar 3 isolates from kiwifruit in Jiangxi Province. It includes 42 bacterial isolates collected from six production areas, all identified as the hypervirulent biovar 3. The data reveals no copper-sensitive strains, with minimum inhibitory concentrations (MICs) for copper sulfate ranging from 1.80 to 2.60 mM.
Sudan's humanitarian needs dataset reports the overall number of people in need and intersectoral severity by disaggregation level, covering administrative divisions and population groups. The dataset is produced by the United Nations Office for the Coordination of Humanitarian Affairs (OCHA) in collaboration with humanitarian partners using the Joint Intersectoral Analysis Framework (JIAF). It was last updated on May 18, 2026.
A 2026 study by Aihua Zhang presents a framework for climate vulnerability assessment using Large Language Models and conformal prediction. The 18.6 KB document contains empirical validation results from Guangdong, Sichuan, and Yunnan provinces. It reports performance metrics like a calibration correlation of 0.816 and a root mean square error of 7.8.
Geological Survey of Victoria data contains primary geological boundaries and faults for Pre-Permian rock units. The dataset was compiled from surface geology maps and interpretation of magnetic, radiometric, gravity, and seismic data to produce a geologically and geophysically reasonable map. It should be used in combination with the state magnetic image for additional context on magnetic properties, dyke swarms, and basalt cover.
This dataset was collected for a counseling psychology research project investigating the moderating role of psychological flexibility. It includes demographic variables and questionnaire responses from Iranian women and was analyzed using IBM SPSS Statistics. The data are provided for academic and research purposes.
This dataset contains statistical results from linear mixed models analyzing the effect of interactions between urban centers on morphological polycentricity. Authored by Juan Zhu and last updated on June 1, 2026, it is a 5.5 KB Excel file containing fixed-effect estimates and model-fit statistics.
ORNL_CLOUD provides coefficients for correcting atmospheric effects in satellite radiometric data from the FIFE project. These coefficients, generated using the Fraser and LOWTRAN 7 models, are inputs for algorithms that derive surface reflectance from raw satellite and aircraft measurements. The dataset is hosted on multiple platforms, with metadata indicating updates as recent as 2026.
This synthetic dataset of 10,000 Spanish-language IT support interactions is designed for model fine-tuning. It was created by author bronc2 and last updated on the platform in June 2026. A free 50-record sample is included, with the full dataset available for purchase.
Twenty women with PTSD symptoms related to the war in Ukraine underwent a four-week intervention of transcutaneous auricular vagus nerve stimulation combined with slow breathing. Data include self-reported PTSD, depression, anxiety, sleep, and somatic symptom scores, plus physiological measures such as heart rate variability and respiratory rate, collected at five time points from one month before to two months after the intervention. The dataset, shared under a CC-BY-4.0 license by Mikołaj Szulczewski, totals 1.3 GB.