Image classification, object detection, segmentation, face recognition, OCR, image generation, video understanding
15,982 datasets
Four benchmark datasets contain images of chemical Markush structures from patents and their corresponding CXSMILES string representations. The largest subset, 'uspto-mol-m-54k-new', includes 54,785 training samples. The datasets were created by docling-project and were last updated in March 2026.
Datasets contain images of Markush chemical structures from patents paired with their CXSMILES string representations. The collection includes over 54,000 training samples from the USPTO-MOL-M source and multiple benchmark subsets for evaluation. The dataset was created by docling-project and was last updated in March 2026.
Water bottle cast data profiles were collected in the Arctic and North Atlantic Oceans from multiple platforms, including the R/V EVERGREEN, between July 1962 and October 1975. The dataset includes measurements of temperature, pH, and salinity, along with concentrations of nutrients, gases, and pigments such as phosphate, nitrate, oxygen, and chlorophyll a. Data were submitted by the Woods Hole Oceanographic Institution and are available in CSV format.
Worldwide Organic Soil Carbon and Nitrogen Data (1986) is a collection of soil sample analyses compiled by Paul J. Zinke of Oak Ridge National Laboratory. The dataset includes soil profile carbon and nitrogen content, bulk density, and site location data from California, Italy, Greece, Iran, Thailand, Vietnam, Amazonian areas, and U.S. forests. It was designed to estimate the size of soil organic carbon and nitrogen pools at equilibrium with natural soil-forming factors.
A dataset from NOAA and NASA EarthData supports a Nature Climate Change study on reef acidification. It contains bottom water temperature, salinity, pH, benthic cover, and dissolved inorganic carbon measurements. Data was collected during a 2014 cruise in the Northern Mariana Islands.
Natural-Language-Labeled Keypoint Graphs are designed for industrial object localization tasks. The dataset appears to combine visual keypoint detection with textual descriptions. Specific details on its size, creator, and creation date are not provided in the available metadata.
The Administrative Tribunals Support Service of Canada (ATSSC) provides tribunal support services through a single, integrated organization. Its plan for the 2026-27 fiscal period likely contains budgetary, staffing, and operational targets. The plan was published by the ATSSC on March 16, 2026.
This dataset contains diameter values from the CNN prediction data of Model 1 and Model 2. It is small (5.5 KB), was authored by Yazid Saif, and was last updated in March 2026. It is licensed under CC BY 4.0 and available on figshare.
A collection of social media posts from Twitter/X discussing the topic of 'nikah muda' (early marriage). The dataset is hosted on Kaggle, but its size, author, and update date are unknown. The content likely consists of text posts, comments, or discussions related to this cultural and social issue.
Discrete profile measurements of dissolved inorganic carbon, total alkalinity, water temperature, salinity, dissolved oxygen, and nutrients collected during the R/V Marion-Dufresne cruise in the Red Sea. The data, attributed to NOAA NCEI, were collected from October 3 to October 7, 1982.
Discrete profile measurements of dissolved inorganic carbon, total alkalinity, water temperature, salinity, dissolved oxygen, and nutrients collected during the R/V Marion-Dufresne cruise EXPOCODE 35MF19820626. The data was gathered in the Red Sea, Gulf of Aden, and Indian Ocean from June 26 to July 3, 1982, and is archived by NOAA's National Centers for Environmental Information.
Raw event logs from ERC-8004 Identity and Reputation Registry contracts, archived across all mainnet deployments. The dataset captures every IdentityRegistered and ReputationUpdated event emitted since each chain's contract deployment. It was created by author 'qntx' and last updated on March 25, 2026.
1009.9 MB of raw von Mises stress data from finite element models analyzing the bite biomechanics of the pterosaur Tupandactylus navigans. The dataset, authored by Richard Buchmann, supports the manuscript on this tapejarid species from the Lower Cretaceous Crato Formation of Brazil.
755,207 bytes of supporting data accompany the research 'Tracing Gangdese arc magmatic flare-up with Mo isotopes'. The dataset likely contains analytical methods and supplementary tables and figures for geochemical and isotopic analysis. Its primary purpose is to provide the detailed evidence underpinning a study of magmatic activity in the Gangdese arc.
The California Department of Transportation and California Energy Commission developed this map to identify eligible areas for deploying electric vehicle charging infrastructure under the federal National Electric Vehicle Infrastructure Program. It visualizes Alternative Fuel Corridors, needed charging locations, and various demographic and jurisdictional layers to guide a $384 million, five-year funding allocation.
GFPGAN_SAM is a dataset hosted on Kaggle. Its title suggests it relates to image restoration and segmentation tasks, likely involving the GFPGAN and SAM models. The dataset's specific content, size, and origin are not detailed in the available metadata.
YOLO_WITH_IMAGES is a dataset for object detection tasks, likely containing images and corresponding annotations. It is hosted on Kaggle, but its specific contents and scale are not detailed in the available metadata. The dataset's author, size, and creation date are unknown.
A June 6, 2024 briefing document prepared for the Canadian Minister of Transport, Pablo Rodriguez, by the Transportation Safety Board of Canada. The PDF presentation outlines the TSB's independent role and its cooperation with Transport Canada on investigations and safety recommendations. The dataset was last updated in March 2026.
94,428 records of EPA research products, including reports and presentations, were scraped from the Science Inventory and NEPIS databases. Produced by the EPA's Office of Research and Development, this archive includes metadata and downloaded documents organized by title and entry ID. The collection was last updated in March 2026.
GameplayQA is the first benchmark for POV-Synced Multi-Video Understanding and Multi-Agent Video tasks. The dataset contains video, text, and tabular modalities for gameplay understanding and visual question answering. It was created by researchers at the University of Southern California and published in 2026.