Image classification, object detection, segmentation, face recognition, OCR, image generation, video understanding
15,787 datasets
Seasonal oxidant influx modifies redox conditions, transforming high-molecular-weight humic substances into low-molecular-weight polar compounds and shifting microbial phosphorus metabolism pathways. This dataset integrates field monitoring, molecular dissolved organic matter characterization, and metagenomic analyses to elucidate the coupling between geogenic phosphorus, phosphorus-containing DOM, and microbial functional pathways in alluvial-lacustrine aquifers. It links dissolved inorganic phosphorus fluctuations to a degradation gradient of P-containing DOM and concurrent adjustments in microbial metabolism.
OCHA Afghanistan maintains this 3W (Who does What Where) dataset tracking humanitarian activities across districts and clusters. Updated through March 2026, the data identifies organizational presence to facilitate coordination and identify service gaps.
A four-year field experiment investigated the impact of climate change factors on soil organic carbon in a subtropical rice paddy. Data from this study show significant declines in subsoil carbon under elevated CO2 and warming conditions. The dataset was authored by Xueli Ding and published via figshare in April 2026.
Bridge-CoT is a dataset of 35,357 samples for robot manipulation, derived from BridgeDataV2. Each sample pairs a scene image with a task description and includes structured VLM-generated annotations for object detection, spatial relations, and subgoal decomposition. The dataset was created by CliffKai and was last updated on Hugging Face in April 2026.
A cleaned and corrected version of the Tobacco3482 document image classification dataset, addressing significant labeling errors from the original source. The dataset was uploaded by user anirudh1112 to Hugging Face and was last updated on 2026-04-23. It integrates corrections from the research community to provide a higher standard for model evaluation.
Cleaned labels derived from the 'mychen76/invoices-and-receipts_ocr_v1' dataset. The dataset was created by user sharvinmalshe and was last updated on Hugging Face on 2026-05-24. It likely contains processed text extracted from scanned receipts and invoices.
Cole Lowman's study analyzes how environmental organizations in Buffalo, NY, signal gender inclusivity through pronoun usage on their websites. The dataset, last updated in March 2026, is a 675.2 KB document containing findings from a website review using Critical Signaling Theory. It reports that only 6.9% of analyzed websites included pronouns in staff bios, with inconsistent usage.
38 civic organizations across 10 U.S. states are tracked in this individual-level dataset from the Civic Power Lab. It measures participation, leadership development, and political influence over time to study the gap between civic engagement and governing power. The data were collected under agreements with Harvard Kennedy School and last updated in March 2026.
75,285 samples of images paired with multiple-choice question-answer items, forming a training dataset for the CapRL-3B image captioning model. The dataset was created by internlm and was last updated on April 16, 2026. It is designed for a two-stage training objective where caption quality is evaluated through the answerability of visual questions.
Requests for datasets on the Edmonton open data platform have been tracked since automated intake began on January 26, 2016. The dataset is updated daily at 6:30 am by the data.edmonton.ca organization. It contains records of public requests for data, including their description, status, and assigned department.
A collection of transcripts from the MSNBC news network, spanning approximately 2003 to 2022. The dataset includes about 16,000 transcripts from 2003-2014 and a more recent scrape covering 2010-2021. It was authored by Gaurav Sood and is hosted on Harvard Dataverse.
Dissolved organic carbon concentration and composition data from 26 headwater catchments across four regions in Quebec's boreal and boreal-arctic transition zone. Adrien Simonet compiled this dataset from terrestrial and aquatic compartments during summer sampling campaigns from 2021 to 2024. The data span a latitudinal gradient from 48.9°N to 59.1°N, covering the La Romaine, Eastmain, Peribonka, and George River watersheds.
100,056 rasterized page images from arXiv AI/ML papers serve as a benchmark corpus for OCR tasks. The dataset, created by obswork, contains pages rendered at 144 DPI from 4,866 source PDFs and was last updated on 2026-04-19. Images are encoded as WebP and packed into Parquet shards for automatic decoding via Hugging Face datasets.
A longitudinal panel of 503 companies covering 70 years of workforce evolution, designed to train AI systems on organizational and human capital dynamics. This sample is drawn from the full Vivameda universe of 4.2 million companies and 48 million company-year observations. The dataset was created by Vivameda and last updated in April 2026.
The International Aid Transparency Initiative (IATI) provides this CSV dataset of active humanitarian and development aid activities in the Democratic People's Republic of Korea. Updated as of March 2026, the data tracks ongoing projects and organizational involvement within the country.
A collection of qualitative data from four virtual focus groups conducted between June 2023 and March 2024 with 30 Latino men who have sex with men (LMSM) in Los Angeles, CA. The data was collected to develop community-informed recommendations for a culturally responsive long-acting injectable PrEP (LAI PrEP) awareness campaign. The file is an XLSX spreadsheet of 14.4 KB, but the specific number of rows and columns is unknown.
Culgoora Solar Observatory in Australia provides daily optical and radio observations of the sun. The dataset includes hydrogen-alpha filter images for solar flares and a radiospectrograph sweeping 18-1800 MHz every three seconds to monitor solar radio bursts. Data is collected by IPS Radio and Space Services (IPSRSS) as part of continuous, year-round operations.
A collection of qualitative interview data from a study of Peer-supported Open Dialogue (POD) practices for severe mental illness. It includes insights from 13 clients and one relative, supplemented by data from five additional client conversations. The analysis identifies core building blocks for recovery-oriented care, such as promoting self-determination and facilitating collaboration.
Replication data for the academic study 'Moral High Ground and the RIG Specification: A Check on State Propaganda'. The dataset was authored by Abdulaziz Almuslem and is hosted on the Harvard Dataverse platform. It was last updated on May 28, 2026.
Department of Youth and Community Development (DYCD) contracts detail funding amounts and registration information for youth and community service providers in New York City. Each row represents a contract-fiscal year pair, showing annual and total funding. The dataset is published by data.cityofnewyork.us and was last updated in March 2026.