Image classification, object detection, segmentation, face recognition, OCR, image generation, video understanding
15,926 datasets
80,000 oceanographic stations in the Atlantic from 1900-1991 provide vertical profiles of temperature and salinity. Data includes 65,000 Black Sea stations with hydrochemical and meteorological observations from 1910-1992, plus surface station data from coastal Guinea. The dataset was compiled by the Ukrainian Academy of Science's MHI and other sources, with the latest records from 1992.
A time series of global 10-day normalized difference vegetation index composites derived from daily AVHRR observations. The data is a component of NASA's AVHRR Pathfinder Program, produced through a collaboration involving NOAA, NASA, USGS, ESA, CSIRO, and 30 international ground stations. The time series begins in April 1992 and continues for specific time periods.
AU_AADC collected thalli of lichens Buellia frigida and Xanthoria elegans from five locations in the Vestfold Hills and Mawson Station in eastern Antarctica. DNA was extracted and the ribosomal ITS region was sequenced to assess genetic variation within populations. The data provides a 1999 baseline for monitoring fungal colonization and genetic resources under climate change pressures.
CEOS_EXTRA provides a dataset containing dissolved organic carbon (DOC), specific UV, and trace element concentrations for water samples from the Everglades. The collection includes 27 samples gathered from 10 distinct field sites. Data was last updated in March 1995.
Water chemistry data includes lab pH, alkalinity, and concentrations of ions like Cl, SO4, Ca, Mg, Na, and K. The collection contains 27 samples gathered from 10 sites in the Everglades. Data was collected by a multi-agency project involving the South Florida Water Management District, U.S. EPA, and USGS South Florida Ecosystems Initiative, with the dataset last updated in March 1995.
CONMAPSG contains grain-size distribution data for sediments off the eastern United States continental margin. The dataset was compiled by the U.S. Geological Survey and Woods Hole Oceanographic Institution from thousands of samples collected starting in 1962. The data was last updated in 1999.
A dataset captured at 30 fps using Aria glasses, providing high-resolution 1408 x 1408 raw fisheye RGB images. It was created by author taegyoun88 for robust 6D object pose estimation in egocentric views under extreme environmental conditions. The dataset features 15 participants performing diverse interactions with 13 different objects.
UniICL-760K is a large-scale dataset containing 766,868 episodes designed for unified multimodal in-context learning. It was created by xuyicheng-zju and focuses on visual understanding and generation tasks organized within a six-level capability taxonomy. The dataset was last updated on Hugging Face in April 2026.
An instructional dataset for training the AI of the Albanian Armed Forces General Staff. It contains question-answer pairs in the Albanian language covering topics such as military organizational structure, the General Staff, land/air/sea forces, NATO integration, defense legislation, Albanian military doctrine, and military history. The dataset was authored by franceskoshahinasilogicleaders and last updated on Hugging Face in April 2026.
Expenditure records from candidate committees, political action committees, party committees, and ballot issue committees in Iowa. Data is available beginning in 2003 from reports filed electronically and some paper reports. The State of Iowa provides this data, which was last updated in March 2026.
UNESCO-sourced education, demographic, and socio-economic indicators for Uganda, updated as of March 2026. The data covers Sustainable Development Goal 4 (SDG 4) metrics alongside other policy-relevant and socio-economic statistics.
Chinese Modern Era (1840–1949) Handwritten Historical Archive Dataset. It was created by a joint student research team from Capital Normal University to address recognition difficulties for Optical Character Recognition (OCR) models. The dataset page was last updated on 2026-04-08.
Exdark YoloV11 Format is a dataset published on Kaggle. The title suggests it contains images formatted for training the YOLOv11 object detection model, likely focusing on low-light or dark environments. No further metadata is available to confirm the dataset's size, source, or specific contents.
Doc OCRBench v2 is a dataset published on Kaggle. Its title suggests it is a benchmark for evaluating optical character recognition systems on documents. The dataset's specific contents, size, and creation details are not provided in the available metadata.
HiSync is a multimodal dataset organized by collection batch ID, containing camera views and IMU sensor data. The data is published by author Octopus1 on the Hugging Face platform. Its last recorded update was on April 4, 2026.
UNESCO provided these education, demographic, and socio-economic indicators for Lao People's Democratic Republic, with the latest update recorded in March 2026. The collection includes SDG 4 Global and Thematic metrics alongside policy-relevant data points for national development tracking.
Replication materials for a political science article titled 'Priming Common European and Democratic Values Does Not Reduce Affective Polarization.' The dataset includes all data, code, and supplementary files needed to reproduce the reported analyses, tables, and figures. The materials were authored by Álvaro Canalejo-Molero and are hosted on Harvard Dataverse.
UNESCO-sourced education, demographic, and socio-economic indicators for the Democratic Republic of the Congo, updated as of March 2026. The data includes Sustainable Development Goal 4 (SDG 4) metrics and other policy-relevant statistics from the UIS bulk data service.
An October 2025 snapshot of English Wikipedia yielded approximately 319,000 mathematical formulas. This curated collection provides LaTeX source code paired with high-quality rendered images, created by author piushorn and last updated on Hugging Face in March 2026. Formulas were filtered by visual complexity and renderability.
LVOmniBench is an evaluation benchmark for omnimodal large language models focused on long-form audio-video understanding. It was created by KD-TAO and launched in March 2026.