Image classification, object detection, segmentation, face recognition, OCR, image generation, video understanding
15,629 datasets
OCHA Digital Services maintains this controlled vocabulary of humanitarian organization types, updated as of March 2026. The data provides standardized categories and definitions sourced from ReliefWeb and the Grand Bargain framework. It is distributed in CSV and Google Sheet formats to support humanitarian data interoperability.
6,803 multi-turn Socratic dialogues covering elementary science topics for grades 1–6. This dataset was used to train SocratTeachLLM and published in the KELE paper (EMNLP 2025 Findings). An English translation is available as ulises-c/SocratDataset-EN.
68 deep mutational scanning datasets on antibody-antigen complexes contain approximately 324,000 non-redundant mutations and 36,541 non-redundant interface mutations. The dataset was curated by RosettaCommons and reorganized into Apache Parquet files for Hugging Face. The dataset page was last updated on 2026-05-04.
Over 350 seabed sediment samples were collected from Australia's western, northern, and eastern continental margins during federal government surveys from 2007 to 2014. The dataset includes parameters for organic matter source, concentration, and bioavailability, linking sediment properties to water column productivity.
Global ocean data from the Max Planck Institute for Meteorology provides mapped, gap-filled fields of dissolved inorganic carbon (DIC) in the water column. The dataset is a monthly climatology based on observations from 2004 through 2017, produced using a self-organizing map and feed-forward network (SOM-FFN) method extended to four dimensions. An ensemble mean from ten bootstrapping runs provides the final DIC field, with an ensemble spread representing methodological uncertainty.
The Sidewalk Management System tracks inspections and violations for New York City sidewalks. The dataset includes columns for specific defect types like TRIP_HAZ and BROKEN, location identifiers like BBLID and ONSTNAME, and violation process dates like POST_DATE and VIssueDate. It is hosted by data.cityofnewyork.us and was last updated on 2026-04-03.
The dataset classifies groundwater iron staining risk as 'High risk' or 'Low risk' for the Perth region. It was developed by the Department of Water and Environmental Regulation for the 'Perth Groundwater Atlas (2nd Edition), 2004'. The data is derived from monitoring bores and delineates areas with elevated iron or manganese staining potential.
8,706 synthetic reasoning examples generated by the Claude Opus model during its development from version 4.6 to 4.7. The dataset was created by user 'angrygiraffe' and is hosted on Hugging Face. It was last updated on May 1, 2026.
A 5.5 KB XLS file contains runtime comparison data for the DPCNet and YOLO11n object detection models. The dataset, authored by Linfeng Jia and updated in March 2026, reports DPCNet's performance gains, including a 45% reduction in parameter count and mAP improvements of 2.0% and 5.1% on benchmark datasets.
ITBench-Lite is a systematic framework for benchmarking large language models and AI agents on real-world IT automation tasks. The dataset contains 65 scenarios across three critical domains, including 35 scenarios for Site Reliability Engineering. It was created by IBM Research and is associated with a research paper titled 'ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks'.
Michael J. Kane from the University of North Carolina at Greensboro authored this historical account. It details the invention of the n-back task, a test of working memory, based on material cut from the original manuscript by Kane, Conway, Miura, and Colflesh (2007). The dataset is an Open Access (diamond) publication shared via the paperswithcode platform.
Newquay and the Gannel Marine Conservation Zone (MCZ) survey data was collected during a single cruise (EA_sngn0213) from March 21 to 31, 2013. The dataset includes 62 video transects, 293 analyzed still images, 41 particle size analysis (PSA) samples, and 41 infauna samples. It was aggregated by the Government Digital Service from the eu_open_data platform.
2026 data from Zhanna Romanyuk details the discovery of low-nanomolar macrocyclic peptide inhibitors of human angiotensin-converting enzyme 2 (hACE2). The dataset includes results from screening millions of disulfide-cyclized peptide ligands using yeast display technology, identifying inhibitors with Ki values of 1.9 and 1.5 nM. It supports structural analysis of peptide binding modes distinct from previously reported inhibitors.
Legacy product from Geoscience Australia with no abstract available. The dataset likely contains information on fossilized plant specimens collected from two specific locations in northern Queensland. It is published as PDF and HTML documents on the data.gov.au platform.
2026 data from the Ontario Ministry of Advanced Education and Skills Development details operating grants to universities and publicly assisted colleges. It includes major grant types for basic operations, enrolment, northern institutions, French/bilingual programs, Aboriginal education, students with disabilities, first-generation students, and health human resource programs.
This 2026 dataset comes from researchers at Sun Yat-sen University and Nanyang Technological University. It contains 2.5 million degraded images generated from 10,000 original images across three vision tasks: Image Classification, Object Detection, and Instance Segmentation. The dataset was created by applying 10 distortion types across 5 levels and 3 region patterns, with quality scores generated by 75 models.
Aaron Fernandes published a dataset on 2026-04-17 from a proof-of-concept study demonstrating the bioprinting of intestinal stem cells derived from pre-term infant gut organoids. The dataset likely contains data on cell viability and phenotype retention after printing using a Reactive Jet Impingement (ReJI) technique. The cells were maintained in a collagen–alginate–fibrin (CAF) hydrogel.
The 2020 calendar documents meetings and consultations organized by the Archive 2020 programme. The dataset is an Excel file published by the Dutch Ministry of the Interior and Kingdom Relations on the EU Open Data portal. It also includes an overview of meetings organized by partners.
An overview of financial contributions made to international organizations, likely distinguishing between voluntary and compulsory payments. The dataset originates from the Dutch Ministry of the Interior and Kingdom Relations and is published via the EU Open Data portal. The specific time range, row count, and detailed column structure are currently unknown.
Our415.org consolidates current and upcoming events for children, youth, and families in San Francisco. The dataset is sourced from Rec Park's activities catalog, SF Public Library's events calendar, Department of Early Childhood's family events calendar, and Support for Families' family events calendar. It is updated daily by the City of San Francisco.