Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
43,995 datasets
Yukon's 1995 workshop report summarizes prioritized geoscience research needs for the next 5-10 years. The document represents a consensus from four sponsoring agencies based on input from 34 geoscientists working in Yukon. It was created to guide geological mapping, geochemical surveys, and mineral deposit studies to support mining and land use planning.
156 hours of high-fidelity Urdu audio address the critical under-resourcing of Urdu in speech technology. The corpus contains 71,792 diarized utterances across three specialized subsets: Standard Pakistani Urdu, Urdu-English Code-Switched, and Pakistani-Accented English. It was created by ASLP-lab and last updated in June 2026.
The Timor Sea continental shelf sediment map, published by the Bureau of Mineral Resources, depicts lithofacies variations across the shelf. The data is part of a systematic reconnaissance survey program, with results published in the BMR Bulletin series accompanied by 1:1,000,000 scale maps. Three map sheets were printed by early 1974, and interpretation requires reference to the accompanying Bulletin 83 (GeoCat #163).
NASA/NOAA's Suomi NPP VIIRS Land Surface Temperature and Emissivity (LST&E) Version 1 swath product (VNP21) provides daily, 6-minute retrievals at a 750-meter spatial resolution. The dataset uses a physics-based algorithm combining the ASTER TES technique with Water Vapor Scaling to simultaneously retrieve LST and emissivity for three thermal infrared bands. It was decommissioned on April 8, 2025, with users directed to the improved Version 2 products for continuity.
This dataset maps the roads and cycle paths in Breda where grit is spread; in the inner city, gritting is only curative. The dataset is provided by the Ministerie van Binnenlandse Zaken en Koninkrijksrelaties under a CC0-1.0 license. It is available in KML, ESRI SHAPE, CSV, and JSON formats, with an irregular update frequency.
Sensors Breda is a geospatial dataset listing sensor installations within the city of Breda, Netherlands. The dataset is published by the Ministerie van Binnenlandse Zaken en Koninkrijksrelaties under a CC-PDM-1.0 license. The update frequency is irregular and the last update date is unknown.
Geoscience Australia Data provides a geological description of the Proterozoic Davenport province, situated between the Tennant Creek and Arunta Inlier regions. The data describes sedimentary, volcanic, and intrusive rock sequences, including the 1870 Ma Warramunga Group and the Hatches Creek Group, which is at least 10 km thick. It includes details on rock types, stratigraphy, geochemistry, deformation history, and recorded mineral production of about 4500 t of tungsten concentrates and 15 kg of gold.
MAOAM (Mask Any Object And Material) is a unified selection framework for precise object and material-level selection across text- and click-based interactions. This repository contains a 10% subset of the material annotations from the associated paper, featuring per-region text descriptions and VQA questions across three sets: SynMat, RealMat, and SAMa. The dataset was authored by jpark677 and last updated on Hugging Face in June 2026.
Geoscience Australia collected marine geophysical data from the Kenn Plateau off northeast Australia. The survey gathered 3090 km of seismic data and 7584 km of bathymetric data, along with 12 dredge hauls and one grab sample. The data was collected during a research voyage on the RV Southern Surveyor, with an additional two days of ship time scheduled for November-December 2004.
This dataset contains 10 hours of Japanese conversational speech recorded on mobile devices to mirror real-world usage. It is designed in a conversation-based style to capture interactive communication for authentic model training. It was created by MagicDataTech and last updated on June 10, 2026.
Geoscience Australia Data provides a study of the complex seabed morphology and sediment distribution in Keppel Bay, a large shallow coastal embayment in Queensland. The data, last updated on 2026-04-30, reveals the former path of the Fitzroy River across the continental shelf and details Holocene sea-level changes. It describes sediment composition, including muddy sand infill in inner bay palaeochannels and relict fluvial deposits in the outer bay.
Records list unclaimed individuals cremated by the Cook County Medical Examiner's Office. The dataset includes demographic and event date columns such as Name, Age, Sex, Race, Date of Death, and Cremation Date. It is published by datacatalog.cookcountyil.gov and was last updated in early April 2026.
Geoscience Australia Data provides a geological study of the continental shelf off southeast Australia between Sugarloaf Point and Gabo Island. The description details shelf width variations from 72 km to 17 km, three depth-based morphological zones, and the composition of surface sediments. The dataset was last updated on 2026-04-30.
Six thick sedimentary cycles from the Surat Basin document environmental changes during the Jurassic and Cretaceous periods. The cycles, each hundreds of metres thick, are interpreted as responses to global sea-level oscillations. This analysis is provided by Geoscience Australia Data.
Indirect leaf area index (LAI) estimates were obtained from the KSU Light Wand Study using a LI-COR LAI-2000 Plant Canopy Analyzer. The instrument measures canopy transmittance at five zenith angles to estimate LAI and mean leaf inclination angle. This dataset is hosted by ORNL_CLOUD and appears on multiple government data platforms.
20 multispectral surface reflectance images were collected by the EO-1 satellite Hyperion sensor at 30-meter resolution, covering the entire Amazon Basin from 2002 to 2005. The data was processed by ORNL_CLOUD using ENVI software and the ACORN atmospheric correction algorithm. Images are distributed in GeoTIFF format with companion ENVI header files.
Hindi Speech Instruct is a multi-turn Hindi conversational dataset for training speech language models, created by author somu9. It contains 10 conversations with a total of 25 user audio turns paired with 25 assistant text responses. The dataset was last updated on 2026-06-17.
MYD09Q1 Version 6.1 provides atmospherically corrected surface spectral reflectance estimates for Aqua MODIS Bands 1 and 2 at a 250-meter resolution, composited over an 8-day period. The pixel selection criteria for the composite include cloud conditions and solar zenith angle, defaulting to the pixel with the minimum blue channel value. This dataset includes two quality layers and incorporates calibration improvements such as polarization correction and updates to the response-versus-scan angle model.
Mingfeng Yan's dataset characterizes Pseudomonas syringae pv. actinidiae (Psa) biovar 3 isolates from kiwifruit in Jiangxi Province. It includes 42 bacterial isolates collected from six production areas, all identified as the hypervirulent biovar 3. The data reveals no copper-sensitive strains, with minimum inhibitory concentrations (MICs) for copper sulfate ranging from 1.80 to 2.60 mM.
Sudan's humanitarian needs data contains overall people in need and intersectoral severity by disaggregation level, which includes administrative divisions and population groups. The dataset is produced by the United Nations Office for the Coordination of Humanitarian Affairs (OCHA) in collaboration with humanitarian partners using the Joint Intersectoral Analysis Framework (JIAF). It was last updated on May 18, 2026.