DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Computer Vision Datasets | DataSalon

👁️

Computer Vision

Image classification, object detection, segmentation, face recognition, OCR, image generation, video understanding

15,629 datasets

OCHA Organization Types: Controlled Vocabulary with ReliefWeb Definitions

OCHA Digital Services maintains this controlled vocabulary of humanitarian organization types, updated as of March 2026. The data provides standardized categories and definitions sourced from ReliefWeb and the Grand Bargain framework. It is distributed in CSV and Google Sheet formats to support humanitarian data interoperability.

SocratDataset: 6,803 Chinese Elementary Science Tutoring Dialogues

6,803 multi-turn Socratic dialogues covering elementary science topics for grades 1–6. This dataset was used to train SocratTeachLLM and published in the KELE paper (EMNLP 2025 Findings). An English translation is available as ulises-c/SocratDataset-EN.

Text, Science Tutoring, Question Answering, Education, Chinese Language, Philosophy, Natural Language Processing, Socratic Dialogue

AbAgym: Deep Mutational Scanning Measurements for Antibody-Antigen Complexes

68 deep mutational scanning datasets on antibody-antigen complexes contain approximately 324,000 non-redundant mutations and 36,541 non-redundant interface mutations. The dataset was curated by RosettaCommons and reorganized into Apache Parquet files for Hugging Face. The dataset page was last updated on 2026-05-04.

Tabular, Mutational Scanning, Bioinformatics, Antibody Antigen, Protein Interaction

Australian Continental Margin Seabed Biogeochemical Survey Data

Over 350 seabed sediment samples were collected from Australia's western, northern, and eastern continental margins during federal government surveys from 2007 to 2014. The dataset includes parameters for organic matter source, concentration, and bioavailability, linking sediment properties to water column productivity.

Earth sciences, Organic Carbon, Total Nitrogen, Trichodesmium, Diazotroph, Marine Geochemistry Seabed Sediments, Published External, Particulate

MOBO-DIC_MPIM: Monthly Climatology of Oceanic Dissolved Inorganic Carbon (2004-2017)

Global ocean data from the Max Planck Institute for Meteorology provides mapped, gap-filled fields of dissolved inorganic carbon (DIC) in the water column. The dataset is a monthly climatology based on observations from 2004 through 2017, produced using a self-organizing map and feed-forward network (SOM-FFN) method extended to four dimensions. An ensemble mean from ten bootstrapping runs provides the final DIC field, with an ensemble spread representing methodological uncertainty.

Time Series, Geospatial, Climatology, Oceanography, Southern Ocean, Indian Ocean, Data Synthesis Product, Atlantic Ocean, Pacific Ocean, Lat, Dic Err, Carbon cycle, Ocean Carbon And Acidification Data System Ocads P, Month, Dic, Arctic Ocean, Water Column, Lon, Depth
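
The ensemble mean and spread mentioned in the description can be illustrated with a minimal sketch. It assumes that "spread" means the sample standard deviation across runs, which the listing does not confirm. The function name `ensemble_summary` and the sample values are hypothetical and are not taken from the dataset.

```python
from statistics import mean, stdev


def ensemble_summary(runs):
    """Return (ensemble_mean, ensemble_spread), one value per grid cell.

    runs: list of equally long lists, one list per bootstrap run.
    Spread is the sample standard deviation across runs. This is an
    assumption; the dataset's own definition may differ.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate spread")
    cells = list(zip(*runs))  # regroup the values by grid cell
    return [mean(c) for c in cells], [stdev(c) for c in cells]


# Hypothetical DIC values for three grid cells from ten bootstrap runs.
runs = [[2100.0 + r, 2200.0 - r, 2300.0] for r in range(10)]
ens_mean, ens_spread = ensemble_summary(runs)
```

A cell where all ten runs agree gets a spread of zero, while cells where the runs diverge get a larger spread, which is how the spread can serve as a per-cell indicator of methodological uncertainty.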

NYC Sidewalk Violations Database with Defect and Location Details

Sidewalk Management System tracks inspections and violations for New York City sidewalks. The dataset includes columns for specific defect types like TRIP_HAZ and BROKEN, location identifiers like BBLID and ONSTNAME, and violation process dates like POST_DATE and VIssueDate. It is hosted by data.cityofnewyork.us and was last updated on 2026-04-03.

Tabular, CSV, XML, JSON, Tree, Sidewalk Management, Violations, Sidewalk Management Database, New York City, Repinspection, Inspection, Sidewalk, Urban Infrastructure

Iron Staining Risk Classification For Perth Groundwater

The dataset classifies groundwater iron staining risk as 'High risk' or 'Low risk' for the Perth region. It was developed by the Department of Water and Environmental Regulation for the 'Perth Groundwater Atlas (2nd Edition), 2004'. The data is derived from monitoring bores and delineates areas with elevated iron or manganese staining potential.

Salinity, BOUNDARIES Management, Hydrology, ECOLOGY Habitat, Groundwater, Wetlands, Water, DWER, Climate and Weather, Water Quality, Land Use, GEOSCIENCES Hydrogeology, Hydrochemistry

Claude Opus 4.6-4.7 Reasoning: 8.7K Synthetic Examples

8,706 synthetic reasoning examples generated by the Claude Opus model during its development from version 4.6 to 4.7. The dataset was created by user 'angrygiraffe' and is hosted on Hugging Face. It was last updated on May 1, 2026.

Text, LLM Reasoning, Claude Opus, Synthetic Data

Runtime Comparison of DPCNet and YOLO11n for UAV Object Detection

A 5.5 KB XLS file contains runtime comparison data for the DPCNet and YOLO11n object detection models. The dataset, authored by Linfeng Jia and updated in March 2026, reports DPCNet's performance gains, including a 45% reduction in parameter count and mAP improvements of 2.0% and 5.1% on benchmark datasets.

Box Regression Adopts, Scale Consistency, 45 Indicating, Stage Detector, Small Object Detection, Tiny Target Scales, Yolo11n Baseline, 1 Respectively, Dpcnet Improves Map, Spatial Guidance, Robust Solution, Semantic Stream, Shallow Feature Interaction, Decoupled Detection Head, Sensitive Shape, Path Interactions, Path Cross Perception, Blur Fine Details, Destabilize Multiscale Representations, Strengthens Cross

ITBench-Lite: 65 Real-World IT Automation Scenarios for AI Agent Benchmarking

ITBench-Lite is a systematic framework for benchmarking large language models and AI agents on real-world IT automation tasks. The dataset contains 65 scenarios across three critical domains, including 35 scenarios for Site Reliability Engineering. It was created by IBM Research and is associated with a research paper titled 'ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks'.

Text, IT Automation, LLM Benchmark, Site Reliability Engineering, AI Agents

The Invention of the N-Back Task: A Historical Account

Michael J. Kane from the University of North Carolina at Greensboro authored this historical account. It details the invention of the n-back task, a test of working memory, based on material cut from the original manuscript by Kane, Conway, Miura, and Colflesh (2007). The dataset is an Open Access (diamond) publication shared via the paperswithcode platform.

Text, N Back Task, Working memory, Cognitive Science, Research History

Newquay and Gannel Marine Conservation Zone Survey Data from 2013

Newquay and the Gannel Marine Conservation Zone (MCZ) survey data was collected during a single cruise (EA_sngn0213) from March 21-31, 2013. The dataset includes 62 video transects, 293 analyzed still images, 41 particle size analysis (PSA) samples, and 41 infauna samples. It was aggregated by the Government Digital Service from the eu_open_data platform.

Tabular, Geospatial, Marine conservation, Benthic Survey, Coastal Ecology

Yeast Display Macrocyclic Peptide Inhibitors of Human ACE2

2026 data from Zhanna Romanyuk details the discovery of low-nanomolar macrocyclic peptide inhibitors of human angiotensin-converting enzyme 2 (hACE2). The dataset includes results from screening millions of disulfide-cyclized peptide ligands using yeast display technology, identifying inhibitors with Ki values of 1.9 and 1.5 nM. It supports structural analysis of peptide binding modes distinct from previously reported inhibitors.

Larger Biologics Due, Converting Enzyme 2, Nanomolar Inhibitors, Quantitatively Screening Millions, Two Mps, Desired Binding Properties, Two, Valuable Molecular Formats, Favorable Pharmacological Properties, Potential Therapeutics, Encoded One, Yeast Display, Vitro, Previously Reported Inhibitors, Valid Technology, Rigid, Structurally Diverse Disulfide, Ring, Bridging Small Molecules

Plant Fossils from Mitchell River and Mount Mulligan, North Queensland

Legacy product from Geoscience Australia with no abstract available. The dataset likely contains information on fossilized plant specimens collected from two specific locations in northern Queensland. It is published as PDF and HTML documents on the data.gov.au platform.

Text, Paleobotany, Queensland Australia, Plant Fossils, Geoscience

Ontario Post-Secondary Operating Grants by Program Type

2026 data from the Ontario Ministry of Advanced Education and Skills Development details operating grants to universities and publicly assisted colleges. It includes major grant types for basic operations, enrolment, northern institutions, French/bilingual programs, Aboriginal education, students with disabilities, first-generation students, and health human resource programs.

MIQD-2.5M: 2.5 Million Degraded Images for Machine Vision Quality Assessment

A 2026 dataset from researchers at Sun Yat-sen University and Nanyang Technological University. It contains 2.5 million degraded images generated from 10,000 original images across three vision tasks: Image Classification, Object Detection, and Instance Segmentation. The dataset was created by applying 10 distortion types across 5 levels and 3 region patterns, with quality scores generated by 75 models.

Image, Computer Vision, Vision Tasks, Large Scale, Degraded Images

Bioprinted Pre-Term Infant Intestinal Stem Cells in a Defined Hydrogel

Aaron Fernandes published a dataset on 2026-04-17 from a proof-of-concept study demonstrating the bioprinting of intestinal stem cells derived from pre-term infant gut organoids. The dataset likely contains data on cell viability and phenotype retention after printing using a Reactive Jet Impingement (ReJI) technique. The cells were maintained in a collagen–alginate–fibrin (CAF) hydrogel.

Image, Tabular, Stem Cells, Hydrogel, Bioengineering, Organoids, Bioprinting

Archive 2020 Programme and Communication Calendar: Meetings and Consultations

The 2020 calendar documents meetings and consultations organized by the Archive 2020 programme. The dataset is an Excel file published by the Dutch Ministry of the Interior and Kingdom Relations on the EU Open Data portal. It also includes an overview of meetings organized by partners.

Tabular, EU Open Data, Administrative Archive, Communication Calendar

Voluntary and Compulsory Contributions to International Organizations from EU Data

An overview of financial contributions made to international organizations, likely distinguishing between voluntary and compulsory payments. The dataset originates from the Dutch Ministry of the Interior and Kingdom Relations and is published via the EU Open Data portal. The specific time range, row count, and detailed column structure are currently unknown.

Tabular, Government Spending, International Organizations, EU Data, Financial Contributions

San Francisco Family Events and Activities from Multiple City Agencies

Our415.org consolidates current and upcoming events for children, youth, and families in San Francisco. The dataset is sourced from Rec Park's activities catalog, SF Public Library's events calendar, Department of Early Childhood's family events calendar, and Support for Families' family events calendar. It is updated daily by the City of San Francisco.

Tabular, CSV, XML, JSON, City Events, Family Activities, Event Calendar, San Francisco
