Image classification, object detection, segmentation, face recognition, OCR, image generation, video understanding
15,903 datasets
Benthic organisms were collected from the Gulf of Mexico during a 1979 survey aboard the vessel EXCELLENCE. The data provide species abundance, distribution, and biomass information. Data were submitted by Texas A&M University with support from the Brine Disposal project.
Benthic organisms were collected from the Gulf of Mexico between December 1981 and August 1985. Data include species abundance, distribution, biomass, and associated environmental measurements from sediment sampler casts. Texas A&M University submitted the data with support from the Brine Disposal project.
NCEI Accession 7900332 contains data on benthic organisms collected in the Gulf of Mexico using sediment samplers and net casts. Texas A&M University submitted the data with support from the Brine Disposal project. Sampling occurred from May 22, 1978, to April 20, 1979, aboard the vessels GUS III and EXCELLENCE.
A 1997 study contains oxygen isotope analyses from ice cores drilled at the Big Dome Summit and other sites on the Collins Ice Cap, King George Island, Antarctica. The dataset includes 87 samples from a 0-13.96m depth interval and 30 additional samples from deeper intervals and other firn sites. Data collection was conducted by research groups at the University of New Hampshire and Nanjing University.
This dataset consists of images of concrete cracks, created to support the development of a YOLOv11-based model for crack identification under high background interference. The dataset is associated with a 2026 research publication in Engineering Research Express and is provided by author Lin Wang. It is distributed as a single ZIP file of approximately 27.7 MB.
Complete Data Source 100K Hours is a multilingual audio dataset containing approximately 100,000 hours of speech. It was uploaded by RidheshBhati to Hugging Face and last updated on April 17, 2026. The dataset is organized with a fixed schema to support audio playback across different language configurations.
A set of digital bathymetry, gravity, and magnetic grids for Australia's continental margin produced by the Australian Geological Survey Organisation in cooperation with Desmond Fitzgerald and Associates and the Australian Hydrographic Service. The grids have resolutions ranging from 250 to 1000 meters and represent a major upgrade of marine ship-track data for geological interpretation. Data from ship-tracks, satellites, and high-resolution onshore sources were integrated using levelling techniques to correct errors.
2,022 persons with confirmed cryptococcal meningitis were analyzed across three prospective cohorts spanning 2010 to 2022. The data, from the Uganda-Minnesota Research Collaboration, tracks changes in clinical presentation and mortality, revealing a less severely ill population in the most recent cohort with a 2-week mortality of 13% for trial participants. This dataset supports analysis of the evolving HIV and cryptococcal meningitis landscape following improvements in antiretroviral therapy access.
SRTM15_PLUS is a global elevation grid at 15 arcsecond resolution (~500 meters) fusing land, ice, and seafloor topography. The dataset integrates 494 million edited depth soundings with gravity model predictions from CryoSat-2 and Jason. It serves as the foundational bathymetry layer for Google Earth and is produced by NOAA.
Public opinion surveys from seven Asian and Western countries compare symbolic and operational ideology using both common cross-national items and country-specific items. Data was collected before and after national elections by researchers including Ikuma Ogura for a study published in Public Opinion Quarterly. The dataset supports analysis of ideological measurement stability, voter choice prediction, and attitudes toward democratic values.
10,000+ hours of cumulative robotic interaction data and over 1 million video clips for embodied intelligence research, released by genrobot2025 in early 2026. The collection features dual-arm robotic interactions with daily objects, including 35,000 manually sorted clips from the Stage 2 update. It focuses on real-world scenarios involving flexible, irregular, and various-sized objects in diverse storage environments.
Seed production and seedling survival data from a 50-year-old Corsican pine stand in southern Britain, collected over a 3-year period from February 2001 to March 2004. The study measured seed quantity and quality via seed rain and cone drop, and tracked seedling emergence and survival under different ground treatments. The data is associated with a Forestry journal article authored by Kerr, Gosling, Morgan, Stokes, Cunningham, and Parratt.
The dataset reports the number of allegations (Reasons) in investigation stages closed with specific service-related outcomes from fiscal year 2016 to 2025. It is published by the Texas Department of Family and Protective Services (DFPS) on the data.texas.gov platform. The data was last updated on February 10, 2026.
DialSeg-Ar is a multi-genre benchmark for linear semantic segmentation in Arabic, with a focus on dialectal conversational and transcribed speech. The dataset is designed to evaluate how well models can split a sequence of utterances into contiguous topic-coherent segments. It was created by MBZUAI and last updated on the platform in April 2026.
SOCCOM float deployment expedition NBP15_11 collected discrete bottle measurements of Dissolved Inorganic Carbon, Total Alkalinity, pH, temperature, salinity, oxygen, and nutrients in the Southern Ocean from 2015-12-06 to 2016-01-02. The National Oceanic and Atmospheric Administration published this dataset, which is part of the Ocean Observatories Initiative supported by NSF Cooperative Support Agreement OCE-1026342.
Geoscience Australia maintains a major collection of petroleum data from offshore wells. The collection includes well completion reports, logs, destructive analysis reports, vertical seismic profiles, and core photography. Data is sourced from industry submissions under legislative requirements and from government research projects and marine surveys.
13,371 line-level transcriptions, verified by Qwen3-VL 235B, intended as gold-standard OCR training data. The dataset includes line crop PNG images from 100 newspaper pages across 73 unique titles spanning the 1840s to the 2010s. It was created by NealCaren and is split by page into train, validation, and test sets.
Starrydata2 is a database containing experimental property data for inorganic materials. The dataset is a 51.0 MB ZIP file published on figshare by author Tomoya Mato in April 2026. It aggregates data from experiments in the field of materials science.
Kaggle hosts a dataset of over 10,000 images formatted for YOLO object detection. The images are categorized into seven distinct types of vehicle damage. The author, organization, and last update date are unknown.
Statistics Canada provides estimates of active non-profit organization counts, revenues, and employment segmented by rural and urban areas. The data is classified by the International Classification of Non-Profit Organizations (ICNPO) and geographic region. It was last updated in March 2026.