Image classification, object detection, segmentation, face recognition, OCR, image generation, video understanding
15,629 datasets
A subset of the TextOCR split from Yesiarohn/OCR-Data, created by theminji on 2026-05-15. It is a smaller, more focused collection intended for experiments in training and testing optical character recognition models.
Two major seabed swath-mapping surveys, AUSTREA-1 and AUSTREA-2, were completed in early 2000. The Australian Geological Survey Organisation conducted these surveys to provide scientific information for implementing Australia's Ocean Policy and establishing marine protected areas. Data covers the South-east Marine Region including Lord Howe Island, the South-east Australian Margin, Tasmania, the South Tasman Rise, and the Central Great Australian Bight.
Public Services and Procurement Canada publishes financial information in the Public Accounts at the end of each fiscal year. This dataset details the contingent liabilities for international organizations as reported in those accounts. The data was last updated on April 9, 2026.
Tusgan 9Channel is a dataset uploaded to Hugging Face by the author nallapuvenkat. The title suggests it contains multi-channel image data, potentially for computer vision tasks. The dataset was last updated on June 15, 2026.
A structured dataset of WiFi Channel State Information (CSI) captures recorded as text logs. The data includes repeated trials for 4 class labels and 4 subjects, recorded simultaneously by 3 receiver devices. The dataset was authored by ilyakolosov and last updated in May 2026.
A dataset and code repository for reproducing the analysis from the paper "Prioritizing native species in urban restoration: Win-win or trade-off with use value?". The materials were authored by Luc Schmid and last updated on April 22, 2026. It includes code, data, and documentation to replicate the study's figures and results.
Nivel del Mar provides raw, unvalidated sea level observations from automated sensor stations across Colombia. The dataset is published by the Instituto de Hidrología, Meteorología y Estudios Ambientales (IDEAM) as open data under Law 1712 of 2014. Data was last updated in March 2026.
VisCoR-55K is a high-quality dataset for visual reasoning spanning five categories: General, Reasoning, Math, Graph/Chart, and OCR. It contains original visual question-answer pairs, matched contrastive VQA pairs, and high-quality rationales synthesized by the VC-STaR framework. The dataset was authored by 5551z and last updated on Hugging Face in April 2026.
Point cloud data representing electronic components and their spatial relationships on printed circuit boards. The dataset was created by author Zedong Huang and published on April 4, 2026. It is a small dataset with a file size of 146.0 KB.
This geospatial dataset documents terrestrial and marine protected areas and Other Effective Area-based Conservation Measures (OECMs) within the Democratic Republic of the Congo. Managed by the UNEP World Conservation Monitoring Centre (UNEP-WCMC) in collaboration with the IUCN, the data is updated monthly to support international biodiversity reporting and policy decisions.
Uganda-specific geospatial records of terrestrial and marine protected areas and other effective area-based conservation measures (OECMs). Managed by the UN Environment Programme World Conservation Monitoring Centre (UNEP-WCMC) and IUCN, this data is updated on a monthly basis. It serves as the primary source for tracking Uganda's progress toward the Kunming-Montreal Global Biodiversity Framework Target 3.
Live birth counts from New York State stratified by the month prenatal care began and the mother's county of residence. The dataset is provided by health.data.ny.gov and covers data beginning in 2008, with the latest update recorded in March 2026. Data may differ from other Vital Statistics publications due to update schedules.
Geoscience Australia houses one of the world's largest collections of petroleum data, comprising both digital and historical hard-copy records. The collection includes well completion reports, logs, analysis reports, seismic profiles, and core photography submitted by industry under legislative requirements or gathered by government research projects. This data is available through the National Offshore Petroleum Information Management System (NOPIMS).
Bathythermograph data from 1989 captures ocean temperature and depth profiles from ships of opportunity. The dataset, submitted on a physical cassette by a National Marine Fisheries Service researcher, has been converted to the NODC C116 digital format. It provides a snapshot of marine conditions in the North Atlantic Ocean over a single year.
This 5.5 KB file describes the distribution of a rice leaf disease image dataset used to train an improved Faster-RCNN model. The dataset was uploaded by author Xiaofan Shi in March 2026. It supports research into detecting fine features of rice diseases.
Replica_OCC is a benchmark dataset for evaluating embodied occupancy prediction systems, constructed in the style of EmbodiedOcc-ScanNet and OccScanNet. It provides RGB-D sequences and scene-level occupancy ground truth, released by author 'the-masses' and last updated on May 6, 2026. The ground-truth occupancy and poses are intended for evaluation-time alignment and metric computation.
Water temperature, salinity, oxygen, dissolved inorganic carbon, pH, and methane data were collected from 293 discrete water samples during a 2017 research cruise. Measurements were taken using CTD casts and Niskin bottles along five transects in the Mid-Atlantic Bight, from near-surface to near-seafloor depths. The dataset was produced by the National Oceanic and Atmospheric Administration.
World Health Organisation data concerning the COVID-19 pandemic. The dataset is provided by the Dutch Ministry of the Interior and Kingdom Relations via the EU Open Data portal under a CC0-1.0 license. The specific temporal and geographic scope, as well as the exact data content, are not detailed in the available metadata.
A tabular dataset comparing the performance of different object detection algorithms. The dataset was authored by Zhaopeng Yuan and last updated on April 20, 2026. It is a small Excel file of 5.5 KB.
OCHA Digital Services maintains this controlled vocabulary of humanitarian organization types, updated as of March 2026. The data provides standardized categories and definitions sourced from ReliefWeb and the Grand Bargain framework. It is distributed in CSV and Google Sheet formats to support humanitarian data interoperability.