3D models, rendered datasets, physics simulation, digital twins, synthetic data generation, game engine data
1,034 datasets
Fine-scale snow and land cover maps for two mountainous study sites in the Western U.S., derived from Maxar WorldView-2 and WorldView-3 satellite images. The dataset includes binary snow maps and fractional snow-covered area (fSCA) maps at 30 m and 465 m resolutions; it was produced by NSIDC_CPRD and released in 2019. Land cover maps classify features such as illuminated snow, shaded snow, vegetation, exposed surfaces, surface water, and clouds.
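A fractional snow-covered area value is commonly read as the share of snow pixels inside a coarser cell. The sketch below illustrates that idea by averaging a binary snow map over complete square blocks. It is an assumption-based illustration, not the NSIDC_CPRD processing: the function name `block_fsca` is hypothetical, incomplete edge blocks are simply dropped, and because 465 m is not an integer multiple of 30 m, the published product must use some other resampling.

```python
def block_fsca(snow, factor):
    """Return the snow fraction of each complete factor x factor block.

    snow: 2D list of 0/1 values (1 = snow), e.g. a fine-resolution binary map.
    factor: integer block size in pixels (hypothetical; the real 30 m -> 465 m
    ratio is 15.5, so the published product cannot use plain block averaging).
    Rows/columns that do not fill a complete block are ignored.
    """
    if factor <= 0:
        raise ValueError("factor must be a positive integer")
    n_rows = len(snow) // factor
    n_cols = len(snow[0]) // factor if snow else 0
    out = []
    for br in range(n_rows):
        row_out = []
        for bc in range(n_cols):
            # Count snow pixels inside this block
            total = 0
            for r in range(br * factor, (br + 1) * factor):
                for c in range(bc * factor, (bc + 1) * factor):
                    total += snow[r][c]
            row_out.append(total / (factor * factor))
        out.append(row_out)
    return out


if __name__ == "__main__":
    snow = [[1, 1, 0, 0],
            [1, 0, 0, 0],
            [1, 1, 1, 1],
            [1, 1, 1, 0]]
    print(block_fsca(snow, 2))  # [[0.75, 0.0], [1.0, 0.75]]
```

Pixels labeled as cloud in the land cover maps would likely need to be excluded before averaging; that masking step is left out here.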
GLDAS-2.0 provides a series of land surface parameters simulated with the Noah Model 3.6, covering the period from January 1948 to December 2014. The dataset was reprocessed by NASA's Global Land Data Assimilation System and is distributed by the GES DISC. It offers a temporally consistent series forced entirely with Princeton meteorological input data.
GLDAS-2.0 Noah Land Surface Model data provides a series of simulated land surface parameters, including soil moisture and energy fluxes, at a 1.0 x 1.0 degree spatial resolution every 3 hours. The dataset was produced by NASA's Global Land Data Assimilation System and covers a 67-year period from January 1948 to December 2014. It is archived and distributed by the NASA GES DISC.
MulSeT is a benchmark dataset designed to challenge multimodal large language models (MLLMs) on spatial reasoning tasks. It requires models to integrate information from two distinct viewpoints of a 3D scene to answer questions. The dataset was created by WanyueZhang and was last updated on the Hugging Face platform in November 2025.
Hannah Weiser provides 11 terrestrial laser scanning (TLS) tree point clouds in .LAZ format, manually labeled into leaf and wood points. The dataset covers 7 different tree species and includes additional attributes such as Reflectance and Amplitude. It is intended for training and validating semantic segmentation algorithms, as referenced in Esmorís et al. 2023.
SpatialLM provides point clouds and 3D annotations for 12,328 indoor scenes and 54,778 rooms. Created by manycore-research in 2025, this synthetic dataset was developed by professional 3D designers for production-level indoor scene understanding.
InclusionAI created this dataset for annealing training of the Ling-Coder Lite model. It is a synthetic-data subset of a larger collection that includes over 5 million SFT samples and 250k DPO samples. The dataset was last updated on Hugging Face on March 27, 2025.
GlobalBuildingAtlas provides global coverage of building polygons, heights, and LoD1 3D models at the individual building level. Created by zhu-xlab and updated in October 2025, it serves as a unified source for 2D and 3D urban structural data.
83 object meshes collected from Sketchfab for the paper 'Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models'. The meshes were manually canonicalized in location, orientation, and scale and converted to .obj format with image textures. The dataset was created by HyeonwooKim and last updated on March 20, 2025.
Medical Subject Headings (MeSH) is a hierarchically-organized terminology for indexing biomedical information, used for PubMed and other NLM databases. The U.S. Department of Health & Human Services produces and maintains the terminology in XML, ASCII, MARC 21, and RDF formats, with daily to annual update schedules for different file types.
16,900 synthetically generated prompts and responses focused on advanced AI reasoning, produced with the DeepSeek R1 0528 model. The content spans technical domains including MLOps, CUDA programming, diffusion models, and complex adaptive systems.
KodCode V1 is a fully-synthetic open-source dataset for coding tasks, containing 12 distinct subsets across domains like algorithmic and package-specific knowledge. It is designed for supervised fine-tuning and RL tuning, with difficulty levels ranging from basic exercises to competitive programming challenges.
14,202 high-quality texture images and HDRI environments sourced from ambientcg.com. The dataset includes materials such as fabric, metal, wood, stone, concrete, and nature elements, as well as HDRI skyboxes. It was uploaded by nyuuzyou on June 13, 2025.
A preliminary dataset scraped from Thingiverse and paired with English-language synthetic prompts, created by author redcathode and last updated on January 23, 2025. It aims to help in fine-tuning large language models for 3D modeling tasks using OpenSCAD. The synthetic prompts were generated by the Gemini-2.0-Flash-Exp model based on SCAD files and their descriptions.
IL3D is a large-scale dataset for indoor 3D scene generation, created by WenxuZhou. It consists of two main components: a 3D-FRONT asset library of furniture and objects and a supplementary HSSD asset library. The dataset was last updated on October 24, 2025.
51,300 unique 3D models across 55 common object categories. Each model is linked to a WordNet 3.0 synset, providing a structured semantic hierarchy for the 3D geometry.
340 community-contributed robotics datasets from 117 global contributors form this large-scale collection for embodied AI. It represents the second major release, expanding upon a previous version to support vision-language-action learning. The dataset was created by HuggingFaceVLA and last updated in November 2025.
A compilation of published and unpublished sediment texture and contaminant data covering the coastal areas of Long Island Sound and the New York Bight. The dataset provides a historical foundation, with information collected between 1956 and 1997. The report was summarized by the USGS.
SCIOP's dataset provides monthly statistical summaries of surface ocean currents in seas adjacent to Japan from 1953 to 1994. The data, derived from GEK and ADCP instruments, are aggregated into 1-degree latitude/longitude grid cells. Each cell includes mean speed, mean direction, sample count, maximum/minimum current, and stability.
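A minimal sketch of this kind of monthly 1-degree summary, assuming each observation is a (latitude, longitude, month, speed, direction) tuple with direction given as the heading the current flows toward, clockwise from north. The function name `grid_current_stats` is hypothetical. Mean direction is computed from the averaged eastward/northward components (a vector mean), and "stability" is assumed to be the ratio of vector-mean speed to scalar-mean speed in percent, a common steadiness measure; SCIOP may define these statistics differently.

```python
import math
from collections import defaultdict


def grid_current_stats(observations):
    """Summarize current observations per (lat cell, lon cell, month).

    observations: iterable of (lat, lon, month, speed, direction_deg).
    Direction convention (assumed): degrees clockwise from north, toward
    which the current flows.
    """
    bins = defaultdict(list)
    for lat, lon, month, speed, direction in observations:
        # 1-degree cell identified by the floor of its south-west corner
        key = (math.floor(lat), math.floor(lon), month)
        bins[key].append((speed, direction))

    stats = {}
    for key, samples in bins.items():
        n = len(samples)
        speeds = [s for s, _ in samples]
        # Mean eastward (u) and northward (v) components
        u = sum(s * math.sin(math.radians(d)) for s, d in samples) / n
        v = sum(s * math.cos(math.radians(d)) for s, d in samples) / n
        scalar_mean = sum(speeds) / n
        vector_mean = math.hypot(u, v)
        stats[key] = {
            "count": n,
            "mean_speed": scalar_mean,
            "mean_direction": math.degrees(math.atan2(u, v)) % 360.0,
            "max_speed": max(speeds),
            "min_speed": min(speeds),
            # 100 means every sample flowed in the same direction
            "stability": 100.0 * vector_mean / scalar_mean if scalar_mean > 0 else 0.0,
        }
    return stats
```

Averaging components rather than raw angles avoids the wrap-around problem, where headings of 350° and 10° would otherwise average to 180° instead of 0°.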
Vertical array summaries of statistical analyses of temperature data from serial station observations in seas adjacent to Japan. The data covers the period from 1906 to 1994, aggregated by month and depth on a 1-degree grid. The product includes mean values, sample counts, maximums, minimums, and standard deviations for selected months and standard oceanographic levels.
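The statistics named above (mean, sample count, maximum, minimum, standard deviation per month, depth level, and 1-degree cell) can be sketched as follows. This is an illustration under stated assumptions, not the product's actual method: the function name `grid_temperature_stats` is hypothetical, observations are assumed to be already interpolated to standard depth levels, and the sample (n - 1) standard deviation is used, which may differ from the original processing.

```python
import math
import statistics
from collections import defaultdict


def grid_temperature_stats(observations):
    """Summarize temperatures per (lat cell, lon cell, month, depth level).

    observations: iterable of (lat, lon, month, depth_m, temp_c), where
    depth_m is assumed to already be a standard oceanographic level.
    """
    bins = defaultdict(list)
    for lat, lon, month, depth, temp in observations:
        bins[(math.floor(lat), math.floor(lon), month, depth)].append(temp)

    summary = {}
    for key, temps in bins.items():
        summary[key] = {
            "count": len(temps),
            "mean": statistics.fmean(temps),
            "max": max(temps),
            "min": min(temps),
            # Sample standard deviation is undefined for a single value
            "std": statistics.stdev(temps) if len(temps) > 1 else float("nan"),
        }
    return summary
```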