Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,481 datasets
Thoracic vertebrae measurements, in millimeters, for the extinct mammal Parabos tigneresi from the Camp dels Ninots site. Leonardo Sorbelli created the dataset, which includes estimated values and was last updated on June 3, 2026. The data is stored in a 12.7 KB XLSX file under a CC-BY-4.0 license.
11.2 KB of measurements in millimeters for cervical vertebrae III to VII of the fossil species Parabos tigneresi. The dataset, created by Leonardo Sorbelli, includes estimated measurements and is available under a CC-BY-4.0 license. It was last updated on June 3, 2026.
10.7 KB of fossil axis measurements for the species Parabos tigneresi, excavated from the Camp dels Ninots site. The dataset was authored by Leonardo Sorbelli and last updated on 2026-06-03. Estimated measurements are distinguished within the data.
10.7 KB of measurements in millimeters for the atlas bone of the extinct bovid Parabos tigneresi, excavated from the Camp dels Ninots site. The dataset, authored by Leonardo Sorbelli, includes estimated values marked in italics and was last updated on June 3, 2026. It is shared under a CC-BY-4.0 license as an XLSX file.
A complete list of fossil specimens from the extinct bovid species Parabos tigneresi, collected from the Camp dels Ninots paleontological site. The dataset was created by Leonardo Sorbelli and is available under a CC-BY-4.0 license. It was last updated on June 3, 2026.
A seamless topographic color map service covering Australia, its outer islands, and external territories. The map integrates data from Geoscience Australia, the Australian Antarctic Division, and OpenStreetMap, portraying cultural, hydrography, marine, transport, vegetation, and relief themes. The topographic information was checked in 2008 and supplemented in 2009, with contributions from multiple government departments.
A metadata registry from Colombia's National Penitentiary and Prison Institute (INPEC) detailing its published information assets. The dataset includes columns for information title, responsible parties, generation date, update frequency, format, and access location. It is published on datos.gov.co to comply with Colombia's Law 1712 on transparency and public information access.
NASA's AVIRIS-Classic instrument captured 224 spectral bands of radiance data over Canadian boreal forests on August 14, 1996. The dataset consists of 66 calibrated image scenes across seven flight lines, each with 20-meter spatial resolution and an 11 km swath width. This Level 1B data was collected for the Boreal Ecosystem-Atmosphere Study to understand energy and gas exchanges between the forest and atmosphere.
A 1.9 MB PDF document authored by Jose Martinez describes the Hickory Municipal Classification System (HMCS). This population-based framework is designed to classify municipalities consistently across all U.S. states. The document was last updated on 2026-05-17.
Approximately 45,000 commercial underground storage tanks previously and currently registered in Connecticut, with about 8,000 still in use. The list is based on notification information submitted since November 1985 and is updated weekly by the Connecticut Department of Energy and Environmental Protection (CT DEEP). It contains information on both active and non-active tanks, including federally and state-regulated USTs.
Colombia's Chocó Chamber of Commerce maintains a register of public information it generates or controls that is classified or reserved. The dataset includes metadata such as the information's status, classification date, category, and legal justification. It is published on the Colombian open data portal, datos.gov.co, and was last updated on 2026-05-18.
Malayalam Instruct Dataset-L is a large-scale instruction-tuning dataset for the Malayalam language. It was programmatically compiled from over 20 multilingual text corpora, translation engines, and RSS feeds, heavily featuring the CulturaX database. The dataset was created by author siyah1 and was last updated on June 17, 2026.
Vertically pointing Doppler radar measurements collected in Oklahoma during the Midlatitude Continental Convective Clouds Experiment (MC3E) in Spring 2011. The NASA GPM Ground Validation team collected data on vertical velocity, drop size distribution, rainfall rate, and other atmospheric variables using a second-generation METEK Micro Rain Radar operating at 24.24 GHz. This dataset supports validation of satellite precipitation observations.
Leicester City Council provides general documentation for Conservation Areas as part of the Open Digital Planning initiative. The dataset is available in multiple formats including CSV, JSON, and Parquet. It was last updated on June 17, 2026.
Open Digital Planning provides general documentation for listed building outlines in the UK. The dataset is published by Leicester City Council and was last updated on June 17, 2026. It is available in multiple formats including CSV, JSON, Parquet, and RDF.
An inventory of public information generated, obtained, acquired, or controlled by the obligated entity that has been classified as confidential or reserved. The index includes columns for document series, classification date, responsible parties, and content description. It is published on www.datos.gov.co and was last updated on 2026-05-18.
59.0 MB of curated source datasets, processed inputs, and supporting materials for the AbTune study. The repository contains files in MD, PY, TXT, FASTA, CSV, and PT formats. It was authored by Xiaotong Xu and last updated on 2026-05-21.
MedDocBench Sample is a public 10-document release for benchmarking vision-language models on structured extraction from real-world clinical forms. The sample is provided by Owenhku for repository smoke tests, format inspection, and reproducibility checks. The full secondary-redacted dataset is released as a separate gated Hugging Face dataset.
254.8 MB of geospatial data supporting research on plantation expansion impacts. The dataset includes TIF/TIFF raster files and R code, created by YUE ZHAO to generate figures for a study on forest water and carbon patterns. It was last updated on 2026-05-25.
A continental-scale dataset of pixel-based surface reflectance composites for Australia's dynamic coastal and estuarine environments. The composites are generated using a multi-resolution tidal model based on a Voronoi mesh to account for tidal influences, enabling robust analysis of coastal changes. This dataset is associated with a 2018 research paper published in the journal Remote Sensing.