Image classification, object detection, segmentation, face recognition, OCR, image generation, video understanding
15,996 datasets
ASTER L1T data provides calibrated at-sensor radiance, geometrically corrected and rotated to a north-up UTM projection. The data is derived from the ASTER L1A product using a single resampling step and incorporates GLS2000 digital elevation data for precision terrain correction. Each granule is provided as three separate Cloud-Optimized GeoTIFF files corresponding to the VNIR (15m), SWIR (30m), and TIR (90m) sensors, with metadata for converting digital numbers to radiance and reflectance.
A dataset of solar panel images likely processed or generated using Generative Adversarial Networks (GANs). The dataset is hosted on Kaggle, but its exact size, creation date, and authorship are unknown. Columns and specific content details require verification after download.
DCLA Cultural Development Funding amounts are allocated per fiscal year to specific organizations. The data is hosted by data.cityofnewyork.us and was last updated on 2026-02-26. Columns suggest it tracks the Organization, Total Final Award, and Application # for each funded program.
1,258,453,709 unique documents form the Dolma3 6T training mix, selected by a Bloom filter built from 1.26B deduplicated IDs. The dataset is materialized from multiple sources including Common Crawl, Stack Exchange, and scientific PDFs, and was created by HCAI-Lab. It was last updated on March 14, 2026.
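As a rough illustration of how a Bloom filter can select documents by ID, here is a minimal Python sketch. It is not the Dolma3 pipeline: the `BloomFilter` class, the `select_documents` helper, the filter size, and the `"id"` field are all assumptions made for this example. A Bloom filter never reports a false negative, but it can occasionally admit an ID that was never added.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List


class BloomFilter:
    """Toy Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive two 64-bit hashes and combine them (Kirsch-Mitzenmacher scheme).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def select_documents(docs: Iterable[Dict[str, str]], keep: BloomFilter) -> List[Dict[str, str]]:
    """Keep documents whose (hypothetical) 'id' field passes the filter."""
    return [doc for doc in docs if doc["id"] in keep]


# Build the filter from the deduplicated IDs, then stream candidates through it.
keep_ids = ["doc-001", "doc-002", "doc-003"]
bloom = BloomFilter(num_bits=1 << 16, num_hashes=4)
for doc_id in keep_ids:
    bloom.add(doc_id)

candidates = [{"id": "doc-001"}, {"id": "doc-002"}, {"id": "doc-999"}]
selected = select_documents(candidates, bloom)
```

The appeal of this design is memory: the filter stores a fixed-size bit array rather than the IDs themselves, at the cost of a small, tunable false-positive rate.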
Epstein Files OCR — Datasets 1–8 (Early Release) contains page-level OCR output in Markdown format from a public release of documents related to the Jeffrey Epstein case. The dataset is designed for question answering, information retrieval, and text classification tasks. It was created by the author 'ishumilin' and last updated on March 17, 2026.
Closed discrimination case investigations from the Seattle Office for Civil Rights (SOCR) are tracked monthly from 2017 onward. The dataset shows completed cases categorized by type, such as race or disability. It is published by the City of Seattle and was last updated in March 2026.
Replication data for a forthcoming study in Political Research Quarterly. The dataset likely contains records linking economic performance indicators to the tenure of finance ministers across different political regimes. It was authored by Jonas Willibald Schmid and is hosted on Harvard Dataverse, with a last update recorded on April 20, 2026.
1992-1993 standardized study of humpback whales across their North Atlantic breeding and feeding grounds. The YoNAH project collected photographs of natural markings, genetic samples, and behavior data. The work was undertaken by SCIOPS, representing a broad-ranging, intensive study of a marine mammal species.
YOLO Trained Weights MTMCT is a dataset of pre-trained model weights for the YOLO object detection architecture, hosted on Kaggle. The weights are likely intended for tasks involving multi-target, multi-camera tracking scenarios. The dataset's specific content, size, and creation details require verification after download due to minimal provided metadata.
US region binary classification dataset for training AI-image detectors, built by Zitacron from 17 public HuggingFace sources. The corpus contains real and AI-generated images, all with commercial licenses.
Regions for allocating federal electric vehicle charging infrastructure funding under California's NEVI program Rounds 4 and 5. The data organizes the state into project regions by county, created by the State of California and last updated in March 2026.
Garbage_Detection_Dataset is a collection of images for object detection tasks, published on Kaggle. The raw description indicates it is a YOLO-ready dataset, which suggests it contains bounding box annotations for municipal waste and litter. The dataset's specific size, origin, and update history are not detailed in the provided metadata.
Quarterly observations from 1966 to 2017 track the names, locations, and key personnel of U.S. overseas diplomatic posts. David Lindsey compiled this micro-level dataset for the Measuring Diplomacy Project. It includes identifiers for top-ranking officers and post classifications, such as embassies, consulates, and missions.
A collection of pre-trained model weights for the DAMO-YOLO object detection architecture. Published on Kaggle, the dataset likely contains the parameter files necessary to initialize or fine-tune the model. The author, organization, and specific version or training details are unknown.
A dataset hosted on Kaggle containing a CycleGAN generator model. The title 'monet-cyclegan-g-p2m' suggests it is likely a pre-trained model for translating images from the style of Monet paintings to photographs. Its specific contents, such as the number of model parameters or training images, are not detailed in the provided metadata.
A 2007 pilot study by the University of Alaska Southeast determined baseline levels of persistent organic pollutants (POPs) and total mercury in juvenile coho salmon from streams near Glacier National Park. Concentrations of POPs were relatively low (< 10 ng/g, wet weight), but salmon from streams with higher spawner density showed increased levels of banned chlorinated pesticides. A follow-up study in 2015 analyzed POPs in resident salmonids and benthic macroinvertebrates from five streams using gas chromatography/mass spectrometry.
North Atlantic Ocean and North Pacific Ocean physical and chemical data collected via bottle casts from NOAA Ship MALCOLM BALDRIGE and other platforms between June 15, 1981 and April 25, 1994. The data were submitted by Dr. Richard A. Feely of the Pacific Marine Environmental Laboratory (PMEL). They include measurements of temperature, salinity, oxygen, and various nutrients and carbon system parameters.
October 20 to November 15, 1981, saw the collection of discrete profile data during the R/V Jean Charcot MEDIPROD_IV cruise in the Western Mediterranean Sea. The dataset includes measurements of dissolved inorganic carbon, total alkalinity, temperature, salinity, dissolved oxygen, and nutrients. NOAA's National Centers for Environmental Information (NCEI) archives this data.
1964 geophysical and oceanographic observations from a cruise in the North American Basin of the Atlantic Ocean. The data includes magnetic measurements, echo sounding traverses, seismic refraction equipment checks, and studies of heat production in sediments, organic matter in sea water, and sediment samples. The cruise also occupied eight stations of the seasonal Halifax oceanographic section for the Atlantic Oceanographic Group.
1985 sediment coring activities at Davis Station, Antarctica, are documented in this scanned report. The report details the analysis of cores from the Vestfold Hills for water content, total organic content, and non-polar lipid content. It was produced by Lin Jian-ping and archived by the Australian Antarctic Data Centre.