Image classification, object detection, segmentation, face recognition, OCR, image generation, video understanding
15,261 datasets
Quebec's forest cover portrait combines ecoforest maps from the southern and northern inventories with administrative boundaries and non-forest land use data. Together, these sources make it possible to calculate forest land use rates for municipalities, Indigenous territories, and administrative regions. The portrait is updated annually with the latest source data and published each December.
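The description does not give the exact formula for the forest land use rate. As a minimal sketch, assuming the rate is simply forest area divided by the territory's total area (a hypothetical definition), a per-territory calculation could look like this; the territory names and areas are made up for illustration.

```python
# Minimal sketch: forest land use rate per territory.
# ASSUMPTION: rate = forest area / total area, in percent. This is a
# hypothetical definition; the portrait's actual method is not stated.

def forest_land_use_rate(forest_km2: float, total_km2: float) -> float:
    """Return the share of a territory's area classified as forest, in percent."""
    if total_km2 <= 0:
        raise ValueError("total area must be positive")
    if not 0 <= forest_km2 <= total_km2:
        raise ValueError("forest area must be between 0 and the total area")
    return 100.0 * forest_km2 / total_km2


# Hypothetical territories with invented areas (km^2), for illustration only.
territories = {
    "Municipality A": (120.0, 200.0),
    "Region B": (4500.0, 6000.0),
}

for name, (forest, total) in territories.items():
    print(f"{name}: {forest_land_use_rate(forest, total):.1f}%")
```

In the real portrait, the non-forest land use layer would presumably be used to decide which areas count as forest before a ratio like this is computed.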
Seven highway camera systems in Victoria have recorded fines for speeding and unregistered vehicles since each system began operating. Fixed cameras on the Monash Freeway began issuing fines in January 2014, and fines are recorded by offence date. The dataset, provided by Cameras Save Lives and last updated in April 2026, notes that road works, camera maintenance, and fine withdrawals can affect the recorded numbers.
3.3 GB of standardized and processed underwater imagery for object detection tasks. The dataset was last updated on May 16, 2026, by author 嘉宇 王 and is shared under a CC-BY-4.0 license.
LC-MS data identifies host proteins interacting with the pathogenic fungus Lomentospora prolificans. The 19.3 KB XLSX file, published by Povilas Kavaliauskas on figshare in April 2026, lists protein hits from pulldown experiments on airway epithelial cells. It provides evidence that integrin β4 serves as a receptor for fungal adhesion.
This synthetic dataset contains 49,800 high-quality grayscale word glyph images (256 × 64 pixels). Curated by Khant Sint Heinn, it is officially published by DatarrX, a Myanmar Open Source Organization. It focuses on authentic word representations, unlike a sibling project that explores theoretical syllable grids.
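A quick back-of-envelope estimate of the raw pixel volume can help when planning storage or memory. This sketch assumes 8-bit grayscale (one byte per pixel) with no compression; the published files are likely compressed and therefore smaller.

```python
# Back-of-envelope raw size of the glyph images.
# ASSUMPTION: 8-bit grayscale (1 byte per pixel), stored uncompressed.

NUM_IMAGES = 49_800
WIDTH, HEIGHT = 256, 64   # pixel dimensions from the description
BYTES_PER_PIXEL = 1       # assumed 8-bit grayscale

bytes_per_image = WIDTH * HEIGHT * BYTES_PER_PIXEL
total_bytes = NUM_IMAGES * bytes_per_image

print(f"{bytes_per_image:,} bytes per image")
print(f"{total_bytes:,} bytes total (~{total_bytes / 1e6:.0f} MB)")
```

Under these assumptions each image is 16,384 bytes and the full set is about 816 MB of raw pixels.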
Trends in fines for seven highway camera systems in Victoria, Australia, recorded since each system began operation. The Peninsula Link system started in September 2013, with fines issued for speeding and unregistered vehicles. Data is provided by Cameras Save Lives and was last updated in April 2026.
An ethnographic and action research project investigates how residents in an Eindhoven district shape community life and how active residents and professionals support this process. Data was gathered through observations, conversations, and active participation in activities, including network meetings, and recorded in a logbook. The findings were analyzed and presented to residents, the involved housing corporation, the research group, and other professionals.
A multimodal dataset from DigitalUmuganda, last updated in 2026. Each data point consists of a JPEG image, a WAV audio file describing the image, and often a transcription of the audio. The description lists more than 500 hours of audio per language for six languages (Shona, Lingala, Fulani, Malagasy, Wolof, and Somali), with more than 100 transcribed hours each.
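The record structure and the stated per-language figures can be summarized in a short sketch. The `DataPoint` class and its field names are hypothetical, chosen for illustration rather than taken from the dataset's schema; the totals are lower bounds derived from the "more than 500" and "more than 100" hours per language.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record layout; field names are illustrative, not the real schema.
@dataclass
class DataPoint:
    image_path: str                      # JPEG image
    audio_path: str                      # WAV recording describing the image
    transcription: Optional[str] = None  # only part of the audio is transcribed


# Per-language lower bounds stated in the description.
LANGUAGES = ["Shona", "Lingala", "Fulani", "Malagasy", "Wolof", "Somali"]
MIN_AUDIO_HOURS = 500
MIN_TRANSCRIBED_HOURS = 100

total_audio = MIN_AUDIO_HOURS * len(LANGUAGES)
total_transcribed = MIN_TRANSCRIBED_HOURS * len(LANGUAGES)
print(f"more than {total_audio} h of audio, more than {total_transcribed} h transcribed")

# A data point without a transcription is still a valid record.
example = DataPoint("img_0001.jpg", "img_0001.wav")
print(example.transcription is None)
```

Making `transcription` optional mirrors the description's note that only some data points carry a transcription.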
39,000 synthetic instruction-following examples for generating simple code in 8 programming languages. The dataset uses the Alpaca format and is available in both Indonesian and English. It was created by Sandroeth and last updated on 2026-05-26.
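The Alpaca format stores each example as a JSON object with `instruction`, `input` (often empty), and `output` fields. The sketch below parses and checks such records; the example record is invented for illustration and is not taken from this dataset.

```python
import json

# Alpaca-style records carry these three string fields.
REQUIRED_KEYS = {"instruction", "input", "output"}


def is_alpaca_record(record: dict) -> bool:
    """Return True if the record has the three Alpaca fields and all are strings."""
    return REQUIRED_KEYS <= set(record) and all(
        isinstance(record[k], str) for k in REQUIRED_KEYS
    )


# Hypothetical example, written for illustration only.
raw = '''[
  {"instruction": "Write a Python function that returns the square of a number.",
   "input": "",
   "output": "def square(x):\\n    return x * x"}
]'''

records = json.loads(raw)
print(all(is_alpaca_record(r) for r in records))
print(records[0]["output"])
```

The check tolerates extra keys, since some Alpaca-derived datasets add metadata fields such as a language tag.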
Approximately 8,700 synthetic reasoning examples generated by Claude Opus 4.6 and 4.7. The dataset was created by angin1920 and last updated on May 22, 2026. It contains assistant responses with synthetic chain-of-thought reasoning that mimics expected 'thinking' patterns.
Sociological districts from the Government and Municipalities of Québec delineate 32 recognized Montreal neighborhoods based on history, sense of belonging, and socio-community organization. The data, last updated in 2026, is available in multiple geospatial formats, including SHP and GeoJSON. The districts represent a non-administrative territorial concept used for local consultation.
EAVSD is a large-scale dataset for subject-driven, narrative-planned multi-image generation, released as part of a CVPR 2026 paper. Each sample contains a reference product image and eight cinematic scene images depicting the same subject in diverse, narratively coherent settings. The dataset was created by author zjyao-PKU and was last updated on 2026-05-26.
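Each EAVSD sample pairs one reference product image with eight scene images. The container below is a hypothetical sketch of that structure; the class and field names are illustrative assumptions, not the dataset's actual column names.

```python
from dataclasses import dataclass, field
from typing import List

SCENES_PER_SAMPLE = 8  # from the description: eight cinematic scene images


# Hypothetical container; names are illustrative, not EAVSD's real schema.
@dataclass
class EAVSDSample:
    reference_image: str
    scene_images: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError unless the sample has exactly eight scene images."""
        if len(self.scene_images) != SCENES_PER_SAMPLE:
            raise ValueError(
                f"expected {SCENES_PER_SAMPLE} scene images, "
                f"got {len(self.scene_images)}"
            )


sample = EAVSDSample("product.png", [f"scene_{i}.png" for i in range(8)])
sample.validate()  # one reference image plus eight scenes passes
```

Validating the scene count up front catches incomplete samples before they reach a multi-image generation pipeline.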
Colombian health education data from the Ministerio de Salud y Protección Social, published on the Socrata platform. The dataset likely contains a hierarchical classification system for educational content, as suggested by columns for categories and subcategories. It was last updated on 2026-05-18.
Data from the Convective Processes Experiment (CPEX) field campaign, covering May 9 to July 16, 2017. The dataset contains Moderate Resolution Imaging Spectroradiometer (MODIS) measurements from sixteen DC-8 missions over the North Atlantic, Gulf of America, and Caribbean Sea region. It was produced by the GHRC DAAC to support research on storm initiation, organization, growth, and dissipation.
VideoTemp-o3 is a dataset for harmonizing temporal grounding and video understanding in agentic thinking-with-videos. It contains question-answer pairs collected to train the VideoTemp-o3 model, which performs on-demand temporal grounding to locate relevant video segments. The dataset was created by Kwai-Keye and last updated on May 20, 2026.
Light100K is a dataset of low-light images paired with five enhancement targets. The dataset is organized as Parquet shards with one aligned sample per row, containing the low-light input and five target images as Hugging Face image columns. It was created by ControlLight and last updated on May 26, 2026.
The FIB-SEM datasets of the Anaeramoeba flamelloides BUSSELTON2 symbiosome contain two 3D cellular volumes acquired by focused ion beam scanning electron microscopy. The volumes have voxel resolutions of 8.43 × 8.43 × 8.0 nm (1,725 slices) and 6.744 × 6.744 × 7.0 nm (1,300 slices) and were segmented using a combination of manual annotation and deep learning. The dataset was created by Jon Jerlström-Hultqvist and is available on figshare under a CC-BY-4.0 license.
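The stated voxel sizes and slice counts are enough to work out each voxel's volume and the physical depth of each stack. This sketch assumes the third dimension is the slice thickness and that slices are contiguous along z with no gaps.

```python
# Worked arithmetic for the two FIB-SEM volumes.
# Voxel sizes (x, y, z) in nanometres and slice counts come from the description.
# ASSUMPTION: z is the slice thickness and slices are contiguous along z.

volumes = {
    "volume 1": ((8.43, 8.43, 8.0), 1725),
    "volume 2": ((6.744, 6.744, 7.0), 1300),
}


def voxel_volume_nm3(size: tuple) -> float:
    """Volume of a single voxel in cubic nanometres."""
    x, y, z = size
    return x * y * z


def stack_depth_um(size: tuple, n_slices: int) -> float:
    """Depth along z in micrometres: slices * slice thickness (1 um = 1000 nm)."""
    return n_slices * size[2] / 1000.0


for name, (size, n) in volumes.items():
    print(f"{name}: voxel = {voxel_volume_nm3(size):.1f} nm^3, "
          f"depth = {stack_depth_um(size, n):.1f} um")
```

Under these assumptions the first stack spans about 13.8 µm in depth with voxels of roughly 568.5 nm³, and the second spans about 9.1 µm with voxels of roughly 318.4 nm³.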
5.5 KB of tabular data detailing the number of images allocated for model training, internal validation, and external validation across three otoscopic image datasets. The dataset was authored by Yixi Xu and last updated on May 7, 2026. It is provided under a CC-BY-4.0 license.
A study published through the Australian Ocean Data Network examines benthic nutrient and gas fluxes, water column properties, and sediment properties in St. Georges Basin, a coastal lagoon in southeastern Australia. The research investigates how diatoms control nutrient and carbon cycling, particularly the fractionation of nutrients through benthic processes. The dataset was last updated on 2026-04-10.
A dataset of Relative Inflammatory Severity Index (RISI) scores calculated for six major organs explanted from mice infected with Prototheca species. The data is provided by Angelika Proksurnicka and was last updated on May 11, 2026. The file is small, at 9.5 KB.