Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,600 datasets
AHO Reference Surfaces Adelaide 1m 2023 is a bathymetry survey acquired for the Australian Hydrographic Office from September 4 to September 16, 2020. The surface was created from a contracted national reference survey in Gulf St Vincent, South Australia, for calibrating multibeam echosounders. The dataset consists of separate 1-meter resolution grids for two surveyed sites, exported as 32-bit floating point GeoTIFF files.
Mineral resource data concerning dry plant tailings from beach sand on Australia's east coast. The dataset is published by the Australian Ocean Data Network on data_gov_au and was last updated on 2026-06-16. The description metadata is minimal, with file formats listed as HTML and PDF.
Legacy product from the Australian Ocean Data Network with no abstract available. The dataset is published on data_gov_au and was last updated on 2026-06-16. Its specific content and scale are not described in the provided metadata.
somu9's Hindi TTS dataset contains 33,394 audio segments totaling 62.25 hours of speech. The data was collected from 149 YouTube videos using auto-generated closed captions for transcription. The audio is stored as 24kHz mono WAV files in sharded parquet format.
A 575.5 MB multimodal dataset from figshare, last updated in May 2026. It contains results from a study investigating the pharmacological basis of Yi Qi Huo Xue Decoction for lumbar disc degeneration using network pharmacology, molecular docking, simulation, and rat model validation.
Monthly data from January 2025 to December 2025 characterizing affiliates of CREMIL. The dataset includes columns for country, number of affiliates, month, gender, person type, city, military rank and force, year, and department. It is hosted on the Socrata platform via the Colombian open data portal www.datos.gov.co.
Hsc Biology Bangla Dataset is a collection of 10,000 instruction-response pairs generated from the Higher Secondary Certificate Biology curriculum. The dataset focuses on Plant Physiology and was created by author 3amthoughts, with a last recorded update in 2026.
415,090 line-kilometres of raw-edited radiometric data were acquired over Western Australia in 2024. The data includes 256-channel gamma-ray spectra, raw window counts, and GNSS heights, captured at 100m line spacing and 50m terrain clearance. This point-located data represents the rawest form of gamma-ray spectrometric measurements for potassium, uranium, and thorium decay.
A project published by Geoscience Australia investigating fundamental knowledge gaps in geological CO2 storage. The project aims to make CO2 storage more predictable and safer by studying mineral trapping, a permanent storage mechanism. It employs a range of approaches including desktop studies, laboratory and field experiments, and geochemical modeling.
Queensland's Julia Creek Sheet area geological data mapped by the Bureau of Mineral Resources in 1961. The dataset includes explanatory notes and a map covering the western and northern margins of the Eromanga Sub-Basin. It describes Cretaceous rocks overlying crystalline basement, with Precambrian granite outcrops and Cainozoic deposits.
25,000 square kilometers of the Upper Devonian to Lower Carboniferous Drummond Basin sequence are mapped in east-central Queensland. The dataset, from Geoscience Australia, describes a structural remnant of a large intermontane basin that received up to 12,000 meters of predominantly fluviatile sediments. Sedimentation ceased during the Kanimblan orogenic event, which folded and uplifted the sequence.
A final project report from Geoscience Australia investigating the fundamental characteristics of mineralized fault systems. The report aims to understand why some fault systems are mineralized and others barren, and to rank critical parameters for identifying conduits for ore-forming fluids. It was last updated on 2026-05-14.
Geoscience Australia defines a borehole as any narrow shaft drilled in the ground, including Mineral Drillholes, Petroleum Wells, and Water Bores. This dataset is restricted to onshore and offshore Australian boreholes that support geological investigations and resource assessments. It is served via WMS and WFS protocols using the GeoSciML Borehole 3.0 standard.
A collection of synthetic datasets designed for pretraining the NVIDIA Nemotron 3 family of large language models. The dataset is aimed at improving model capabilities on specific tasks, including factual recall, moral scenarios, and diverse generative and multiple choice questions. It was created by NVIDIA and last updated on the platform on June 4, 2026.
Global land surface phenology metrics are derived from VIIRS satellite data at a 500-meter spatial resolution. The VNP22Q2 product provides yearly intervals of vegetation transition dates, including the onset of greenness increase and decrease, growing season length, and confidence layers. Each product contains 19 Science Dataset layers, capturing up to two growing cycles per year.
VNP21A2 is an 8-day composite product providing land surface temperature and emissivity data at a 1-kilometer spatial resolution. It contains 11 science datasets within a single HDF file, including daytime and nighttime LST, quality control, view zenith angle, time of observation, and emissivity for three spectral bands. The product is algorithmically compatible with MODIS data to ensure continuity in Earth observation.
415,090 line-kilometres of Total Magnetic Intensity data were acquired over the Narryer region in 2024 by the Western Australian Government. This raw-edited point-located dataset includes measurements of raw and compensated TMI, diurnal variations, fluxgate magnetometer readings, and altimeter heights. The data is intended for geological mapping, mineral exploration, and environmental studies.
Número de Casos de Niños, Niñas y Adolescentes (NNA) reportados en SIRITI por cada Tipo de Vulnerabilidad a Nivel Nacional contains statistics on the total number of children and adolescents registered in the SIRITI system at the national level, classified by type of vulnerability. The data is hosted by www.datos.gov.co and was last updated on 2026-05-18. Columns suggest a breakdown of cases by categories such as 'Ninguna vulnerabilidad (OK)', 'Oficios del Hogar (OH)', and risks related to child labor.
5.5 KB Excel file containing model performance metrics for medical imaging tasks. The dataset, authored by Katharina V. Hoebel and last updated in April 2026, compares conventional and Monte Carlo dropout models using metrics like Spearman's rank correlation, AUROC, and MSE. Performance is evaluated on conditions including retinopathy of prematurity, knee osteoarthritis, and breast density classification.
Supplementary data from a study on the perception and acceptance of micronutrient-fortified bouillon cubes among household members. The ZIP file contains tables for confirmatory factor analysis, sensitivity analyses, intercoder reliability, and thematic analysis outputs from 24 focus group discussions. Author Felix Kwaku Kyereh published the dataset on figshare in April 2026 under a CC-BY-4.0 license.