Image classification, object detection, segmentation, face recognition, OCR, image generation, video understanding
15,988 datasets
Corporate registration records from the District of Columbia's Department of Licensing and Consumer Protection. The dataset includes business entity details such as file number, entity status, business name, address, and report filing dates. It is maintained by the DC Corporations Division as the official Office of Corporate Registrar.
An Individual Participant Data Meta-Analysis synthesizing evidence from 67 datasets across 27 armed conflicts. The research, authored by Joan Barceló and hosted on Harvard Dataverse, investigates the association between exposure to wartime violence and religiosity. It was last updated in March 2026.
Mingda Wang's 2026 meta-analysis dataset, 28.6 MB in size, compiles evidence on how ants influence soil carbon cycling and organic matter stability. The dataset, released under a CC-BY-4.0 license, likely contains tabular data from aggregated studies for statistical synthesis. It supports research into the role of soil fauna in biogeochemical processes.
Kaggle hosts a dataset titled 'yoloswinv2tnaver12'. The title suggests it is likely related to computer vision and object detection, potentially using a YOLO and Swin Transformer V2 architecture. The dataset's author, organization, and specific contents are unknown.
DISTRACTED DRIVER.v1i.yolov11 is a dataset published on Kaggle, likely containing images for detecting distracted driving behaviors. The title suggests the data is formatted for the YOLOv11 object detection model. Metadata is minimal; actual content requires verification after download.
ckp_GuwenBert_nomnaocr_repair_stage3_e1_30_v12 is a dataset published on Kaggle. The title suggests it is a Chinese language corpus, likely intended for training or fine-tuning BERT models. The specific versioning in the title indicates it may be part of a multi-stage processing pipeline.
A dataset of images likely preprocessed for optical character recognition tasks. The title suggests images have been resized to 512x512 pixels and processed with a sliding window technique. It is published on Kaggle, but the author, organization, and specific source details are unknown.
Ocr Synthetic Rendered 200K is a dataset published on HuggingFace by Jiwon-Kang. The title suggests it contains 200,000 synthetically rendered images likely intended for optical character recognition tasks. The dataset was last updated on 2026-05-06.
A meta-analysis dataset investigating the effects of ants on soil carbon cycling and organic matter stability. The dataset, code, and figures were published by Mingda Wang in 2026 and are available under a CC-BY-4.0 license. The data package is 27.2 MB in size and was last updated on March 21, 2026.
A suite of R packages for quantitative soil profile analysis, started in 2009 and developed over 8 years. The project, authored by Dylan Beaudette, organizes concepts and source code for soil profile visualization, aggregation, and classification. It has been applied to projects involving hundreds of thousands of soil profiles and is integrated into tools like SoilWeb.
23 fine-grained sound classes, grouped into 8 coarse classes, were annotated by citizen scientists for urban noise monitoring. The dataset contains 10-second audio recordings from over 50 sensors in the SONYC network, which has collectively gathered the equivalent of 37 years of audio data. A subset of this data, split into training and validation sets, was released in March 2019 by researchers from New York University and other institutions.
Sediment analysis results from Salt River Bay National Historical Park and Ecological Preserve in St. Croix, US Virgin Islands. The dataset includes measurements of organic contaminants like hydrocarbons and pesticides, inorganic contaminants like metals, and counts of the benthic infaunal community. It was produced by the National Oceanic and Atmospheric Administration and contains data collected from September 4 to September 6, 2018.
A dataset containing text from the Al Quran and Hadith, likely intended for artificial intelligence applications. The description mentions branches of AI such as machine learning, deep learning, and natural language processing, along with popular algorithms like CNNs. Its specific size, source, and update history are unknown.
YOLOEFFV2SShopee12 is a dataset published on Kaggle. The title suggests it is likely related to object detection using the YOLO model architecture, possibly for e-commerce applications. Specific details regarding its size, contents, and creation are not provided in the available metadata.
OS MasterMap Water Network Layer offers a detailed, heighted water network for Great Britain, showing the flow and precise course of rivers, streams, lakes, and canals. The dataset is produced by the Government Digital Service and will become 'end of life' on 31 March 2026. It includes features such as aqueducts, tunnels, flow direction annotations, and inferred underground watercourses.
National Park Service Geologic Resources Inventory data layers for Big South Fork National River and Recreation Area. The dataset includes GIS files in geodatabase, geopackage, and KMZ formats, supported by ArcGIS Pro, QGIS, and Google Earth. It was compiled from source maps by the Kentucky Geological Survey and U.S. Geological Survey spanning 1972 to 2016.
A 2026 publication from the Communications Security Establishment Canada clarifies actionable information types for reporting to the Canadian Centre for Cyber Security. It provides a structured framework for organizing and sharing details during a cyber incident.
March to April 2016 discrete profile measurements of dissolved inorganic carbon, total alkalinity, pH, temperature, salinity, oxygen, nutrients, and chlorofluorocarbons collected during the R/V Roger Revelle cruise along GO-SHIP Section I09N in the Indian Ocean. Data were gathered using CTD and Niskin bottle instruments and are provided by the National Oceanic and Atmospheric Administration.
Davis Strait oceanographic profiles collected by the R/V Knorr from September 1 to September 21, 2008. The dataset includes discrete measurements of CTD temperature, salinity, oxygen, nutrients, dissolved inorganic carbon, and total alkalinity. Data are provided by the National Oceanic and Atmospheric Administration.
Animal classes from the ImageNet-1K Mini dataset, which is a 100-class subset of the original ImageNet-1K. The data is split into training and testing sets. The dataset's author, organization, and specific size are not provided in the input metadata.