Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,434 datasets
Geochemical data from the Stavely Project in western Victoria, including whole-rock geochemistry, four-acid digestion analyses, and partial extraction techniques such as soil gas hydrocarbon and Mobile Metal Ion. The release also contains sulphur, neodymium, and lead isotope analyses, chromite and pyrite analyses, and associated interpretation reports. It was published by Geoscience Australia Data and last updated on 2026-05-14.
The 2020 fiscal year audit plan for the District of Barranquilla, Colombia, detailing scheduled audits for district-level entities. The dataset is published on the datos.gov.co platform and was last updated on 2026-05-18. It likely contains the planned audits, their status, and the entities targeted.
Heavy-mineral deposits along the coasts of Victoria, Tasmania and South Australia. The dataset is published by the Australian Ocean Data Network on data_gov_au. The record was last updated on 2026-06-16.
Loom_01 is a curated dataset of 50,000 interleaved tutorials for training and evaluating diffusion-transformer models. It was created by researchers from Beijing Institute of Technology, Alibaba Group, and the National University of Singapore and released on Hugging Face in 2026.
20,000,000 rows of high-fidelity synthetic threat intelligence telemetry generated by Zia Data Labs. This production-grade dataset is designed to simulate continuous threat intelligence ingestion at enterprise scale for building detection models. The dataset was last updated on June 10, 2026.
The Port Curtis Integrated Monitoring Program (PCIMP) collected this sediment dataset with sensors deployed in Zone 05, the Inner Harbour. The Australian Ocean Data Network hosts the data, which cover December 2006 to May 2025. The dataset is available in HTML format.
21,627 English search queries categorized into 19 distinct classes. The dataset was created by Pankaj8922 by machine-translating Chinese search queries.
Geoscience Australia Data produced the Australia's Future Energy Resources (AFER) Project dataset, a regional geological interpretation of the Pedirka and western Eromanga basins. The dataset, last updated in April 2026, was produced under the Exploring for the Future program and integrates biostratigraphic and reprocessed seismic data. It assesses energy resource potential and suitability for carbon storage, and revises the stratigraphic basin boundaries and depositional history from the Paleozoic to the Late Cretaceous.
MEN_MATRICULACION_POR_GENERO_EDAD contains gender parity indices (IPG = F/M) and enrollment indicators for preschool, basic, and upper secondary (media) education in 2020, disaggregated by gender, age, and Certified Territorial Entity (ETC). The dataset is hosted on the Socrata platform via the Colombian open data portal www.datos.gov.co. Its columns include enrollment counts by gender and age group, gender parity indices for specific age ranges, and territorial codes.
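The parity index above is a simple ratio. A minimal sketch, assuming F and M are the female and male values of the same indicator (for example, enrollment counts for one age group in one ETC); the function name and figures are hypothetical and not taken from the dataset. A value of 1 indicates parity, below 1 favours males, and above 1 favours females.

```python
def gender_parity_index(female: float, male: float) -> float:
    """Return the gender parity index IPG = F / M.

    Hypothetical helper: assumes `female` and `male` are the same
    indicator (e.g. enrollment counts) for girls and boys.
    """
    if male == 0:
        raise ValueError("male value must be non-zero to compute F/M")
    return female / male


# Hypothetical figures for illustration only.
ipg = gender_parity_index(female=980, male=1000)
print(round(ipg, 2))  # 0.98
```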
Rijksmuseum Amsterdam provides structured metadata and digital images for over 100,000 objects from its collection via an API. The data is updated daily from the museum's collection registration system and is available under a Creative Commons Attribution (CC BY) license. The Ministerie van Binnenlandse Zaken en Koninkrijksrelaties is listed as the organization.
Approximately 50 parallel sentence and phrase pairs in Even and Russian. The dataset is intended for research in machine translation, natural language processing, and preservation of the endangered Even language. It was created by DaniilMako and last updated on June 16, 2026.
A seamless topographic color map service covering all of Australia, its outer islands, and external territories. The data is sourced from Geoscience Australia, the Australian Antarctic Division, OpenStreetMap, and the Australian Collaborative Land Use and Management Program. Topographic information was checked in 2008 and supplemented in 2009, with limited field checking.
Colombian data on gender parity indices and enrollment counts for ethnic groups in preschool, basic, and secondary education for the year 2020. The dataset is disaggregated by Certified Territorial Entity (ETC) and ethnicity, with columns for specific ethnic groups like Negritudes, Palenquero, Rom, Raizales, and Indígenas. It originates from the Colombian open data portal www.datos.gov.co and was last updated in May 2026.
Tucker Allen published an XYZ file on figshare in May 2026. The data likely contains results from an ab initio study of the Photosystem II reaction center, comparing optical excitations of an isolated chromophore hexamer and a protein-dye cluster. The file is 63.7 KB.
21.6 KB of XYZ coordinate data for a protein-dye cluster containing 3238 valence electrons, generated by Tucker Allen and published on figshare in May 2026. The dataset enables a quantum-mechanical study of how the protein environment renormalizes excitons in the Photosystem II reaction center. It compares the low-lying optical excitations of an isolated chromophore hexamer and the full protein-dye cluster.
A 2018 study by Sagar et al. demonstrates a method for generating continental-scale pixel-based surface reflectance composites for dynamic coastal environments. The approach uses a multi-resolution tidal model and a Voronoi mesh to account for tidal influences, producing mosaics of the Australian coastline. The composites are designed for further interpretation and analysis of coastal change.
Seamless topographic color mapping for the entirety of Australia, including its external territories and Antarctic claim. The service integrates data from Geoscience Australia, the Australian Antarctic Division, and OpenStreetMap, with vegetation data from the Australian Collaborative Land Use and Management Program. Limited field checking was undertaken, and the topographic information was last checked using satellite imagery in 2008.
Uzbek lexical entries organized in a WordNet-style structure. The dataset contains 11,178 rows and 14 columns, including synonyms, antonyms, definitions, and example sentences. It was created by uznlp-uz and last updated on 2026-06-17.
Preliminary results from RV Polarstern Expedition PS141, which collected more than 7,500 nautical miles of hydroacoustic data from the largely unexplored Davis Sea and Mawson Sea continental shelves. The data reveal sediment transport features and glacial landscapes, including iceberg scours and grounding zone wedges. The expedition ran from February to April 2024 as part of the EASI-3 project, which investigates East Antarctic Ice Sheet interactions.
The Mobility Secretariat of the Municipality of Fusagasugá provides general information on the motor vehicle fleet registered in the National Single Traffic Registry (RUNT). The dataset includes vehicle registration details such as model, type, fuel, and service. The data was last updated on the platform on May 18, 2026.