Image classification, object detection, segmentation, face recognition, OCR, image generation, video understanding
15,846 datasets
An Arabic-translated version of the PushT image dataset for robotics. The dataset provides high-quality Arabic instructions for complex manipulation tasks, enabling the training of localized robotics policies. It was created by hamzabouajila and last updated on 2026-04-21.
OCR-Markdown-Dense-200x is a synthetic dataset designed for dense document optical character recognition tasks. The dataset was created by prithivMLmods and was last updated on April 21, 2026. It focuses on extracting structured HTML or Markdown representations from densely packed document pages.
A processed and reduced medical image segmentation benchmark covering 10 human organs. The dataset is derived from the Medical Segmentation Decathlon by converting volumetric NIfTI scans into serialized 2D RGB images with segmentation masks. It is provided in multiple resolution variants (244, 512) for easier use and was last updated on 2026-04-19.
BloodshotNet-Dataset is the official, large-scale aggregated dataset designed to train a YOLO-based blood detection model. The dataset was created by petre-bit and was last updated on Hugging Face in April 2026. It contains highly graphic and sensitive imagery, including simulated and real blood, serious injury, and surgical scenes.
Infrastructure indicators for Uganda, compiled by the World Bank Group from specialized international agencies. The dataset aggregates metrics across the transport, energy, and telecommunications sectors, with the most recent update recorded in March 2026. The data is delivered in CSV format and serves as a centralized resource for national development tracking.
Health indicators for Uganda, sourced from the World Bank and aggregating metrics from the UN Population Division, WHO, UNICEF, and UNAIDS. The dataset covers health systems, disease prevention, reproductive health, nutrition, and population dynamics, with the latest update recorded in March 2026. The data is delivered in CSV format to support public sector health analysis.
Uganda environmental indicators covering forests, biodiversity, emissions, and pollution, curated by the World Bank Group. These tabular records provide a localized view of natural and man-made resource metrics, with the most recent update recorded in March 2026.
This dataset tracks energy production, use, dependency, and efficiency indicators for Uganda, compiled by the World Bank Group. It aggregates data from the International Energy Agency and the Carbon Dioxide Information Analysis Center through March 2026. The records provide a time-series view of national energy and mining development.
World Bank indicators for aid effectiveness in Uganda, focusing on poverty reduction, health, and education metrics. Maintained by the World Bank Group, the data was last updated in March 2026 and is provided in CSV format. It tracks the impact of international aid on the achievement of Millennium Development Goals within the country.
OverlayDataset is a large-scale vision-language collection of 499,249 images paired with dense object-level annotations, local prompts for objects, and global scene captions. It was created by dsrivastavv and last updated on Hugging Face in April 2026. The dataset is designed for training controllable image generation systems.
Annotation data for the Video-MME-v2 benchmark, containing 800 1080p MP4 video files and 3200 question-answer pairs stored in a Parquet file. The dataset was created by MME-Benchmarks and the repository was last updated in April 2026.
Four datasets contain simulated records for mineral requirements in different bovine groups. Each dataset holds 5,000 individual records for growing heifers, pregnant nulliparous cows, pregnant parous cows, and lactating cows. The data was created by Jean-Baptiste Daniel and published on figshare in April 2026.
4,648 annotated images of handwritten Marathi sentences in Devanagari script. The dataset is hosted on Kaggle and likely contains samples for training optical character recognition models. Its specific origin, collection method, and update history are not detailed in the provided description.
4,648 annotated Marathi printed sentence images in Devanagari script. The dataset is hosted on Kaggle. The author, organization, and last update date are unknown.
14 OCR models ranging from 0.9B to 8B parameters, provided by uv-scripts as of March 2026. The scripts convert image-based datasets into Markdown format using Hugging Face Jobs and the uv package manager.
Atha Text Dataset is a sentiment classification resource for the Indonesian language, containing three sentiment classes. The dataset is authored by Bangkah and was last updated on April 13, 2026. Its intended purpose is for learning NLP pipelines and establishing experimental baselines, not for production benchmarking.
Mid-Atlantic continental shelf data collected from November 1976 to September 1977. The dataset contains water column physical and chemical measurements, including temperature, salinity, and dissolved oxygen, alongside benthic organism surveys with species abundance and biomass. Data were submitted by the Virginia Institute of Marine Science and processed by the National Oceanographic Data Center (NODC) into standard formats F014 and F132.
Moored instrument data from the Gulf of Mexico captures time-series measurements of ocean currents, water chemistry, phytoplankton, and zooplankton. The dataset was submitted by Texas A&M University as part of the Brine Disposal project, with collection occurring from 1979-08-30 to 1981-08-01. Data were processed by the National Oceanographic Data Center into standard formats including F005 for current meters and F028 for phytoplankton.
October 1977 to August 1979 data from the Gulf of Mexico Brine Disposal project includes current direction, chemical parameters, benthic organisms, and wind wave spectra. Data were collected via moored current meter casts and other instruments by Texas A&M University and processed by the National Oceanographic Data Center (NODC) into standard formats.
Gulf of Mexico data from January 1981 to July 1982 includes current direction, water chemistry, and benthic organism measurements from moored instruments. Data were submitted by Texas A&M University for the Brine Disposal project and processed by the National Oceanographic Data Center into standard formats. The dataset contains time-series current meter data, physical and chemical water column parameters, and species-level benthic survey information.