DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Computer Graphics & Simulation Datasets | DataSalon

Computer Graphics & Simulation

3D models, rendered datasets, physics simulation, digital twins, synthetic data generation, game engine data

1,034 datasets

WorldView Satellite Land Cover and Snow Maps for Western U.S. Mountains

Two mountainous study sites in the Western U.S. provide fine-scale snow and land cover maps derived from Maxar WorldView-2 and WorldView-3 satellite images. The dataset includes binary snow maps and fractional snow-covered area (fSCA) maps at 30 m and 465 m resolutions, produced by the NSIDC_CPRD and released in 2019. Land cover maps classify features such as illuminated snow, shaded snow, vegetation, exposed surfaces, surface water, and clouds.

Image, Geospatial, Satellite Imagery, Snow Cover, Mountainous Regions, Land Cover Classification, +1
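The relationship between a binary snow map and a fractional snow-covered area (fSCA) map can be illustrated with a small sketch. The listing does not say how the fSCA products were derived, so the block-averaging approach, the function name `fractional_snow_cover`, and the aggregation factor below are all assumptions made for illustration only.

```python
import numpy as np

def fractional_snow_cover(binary_snow: np.ndarray, factor: int) -> np.ndarray:
    """Aggregate a binary snow map (1 = snow, 0 = no snow) into fSCA.

    Each output cell holds the fraction of snow pixels inside one
    factor x factor block of the input grid. This is a hypothetical
    illustration, not the dataset producer's documented method.
    """
    rows, cols = binary_snow.shape
    # Trim trailing rows/columns so the grid divides evenly into blocks.
    rows -= rows % factor
    cols -= cols % factor
    trimmed = binary_snow[:rows, :cols].astype(float)
    # Split into blocks, then average within each block.
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Toy 4 x 4 binary snow map aggregated by a factor of 2.
snow = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
])
fsca = fractional_snow_cover(snow, 2)
# The top-left block has 3 snow pixels out of 4, so its fSCA is 0.75.
```

In practice the factor would be chosen from the ratio of the coarse cell size to the native pixel size, which is not stated in the listing.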

Global Land Surface Parameters from Noah Model at 3-Hourly Resolution

GLDAS-2.0 provides a series of land surface parameters simulated from the Noah Model 3.6, covering a period from January 1948 to December 2014. The dataset was reprocessed by NASA's Global Land Data Assimilation System and distributed by the GES DISC. It offers a temporally consistent series forced entirely with Princeton meteorological input data.

Time Series, Geospatial, Land Surface Model, Hydrology, Earth Science, Climate Modeling, Synthetic, +1

Global Land Surface Parameters from Noah Model at 1-Degree Resolution

GLDAS-2.0 Noah Land Surface Model data provides a series of simulated land surface parameters, including soil moisture and energy fluxes, at a 1.0 x 1.0 degree spatial resolution every 3 hours. The dataset was produced by NASA's Global Land Data Assimilation System and covers a 67-year period from January 1948 to December 2014. It is archived and distributed by the NASA GES DISC.

Time Series, Geospatial, Land Surface Model, Hydrology, Climate Simulation, Synthetic, +1

MulSeT: A Multi-view Spatial Understanding Benchmark

MulSeT is a benchmark dataset designed to challenge multimodal large language models (MLLMs) on spatial reasoning tasks. It requires models to integrate information from two distinct viewpoints of a 3D scene to answer questions. The dataset was created by WanyueZhang and was last updated on the Hugging Face platform in November 2025.

Multimodal, Spatial Reasoning, Language: en, Modality: image, Benchmark, Computer Vision, arXiv: 2509.02359, Region: us, License: apache-2.0, Synthetic Data, Visual Question Answering, +1

Manually Labeled TLS Tree Point Clouds for Leaf-Wood Separation, 11 Scans

Hannah Weiser provides 11 terrestrial laser scanning (TLS) tree point clouds in .LAZ format, manually labeled into leaf and wood points. The dataset covers 7 different tree species and includes additional attributes such as Reflectance and Amplitude. It is intended for training and validation of semantic segmentation algorithms, as referenced in Esmorís et al. 2023.

Point Cloud, Semantic Segmentation, Forestry, Computer Vision, Terrestrial Laser Scanning, Leaf Wood Separation, +1

SpatialLM: 12,328 Indoor Scenes and 54,778 Annotated Rooms

SpatialLM provides point clouds and 3D annotations for 12,328 indoor scenes and 54,778 rooms. Created by manycore-research in 2025, this synthetic dataset was developed by professional 3D designers for production-level indoor scene understanding.

CSV, Modality: 3d, Library: polars, Modality: text, Size Categories: 100K<n<1M, Modality: tabular, Library: mlcroissant, Library: datasets, Library: pandas, License: cc-by-nc-4.0, arXiv: 2506.07491, Region: us, +1

Ling-Coder-SyntheticQA: Synthetic Code Generation Questions and Answers

InclusionAI created this dataset for annealing training of the Ling-Coder Lite model. It is a subset of synthetic data, part of a larger collection that includes over 5 million SFT samples and 250k DPO samples. The dataset was last updated on Hugging Face on March 27, 2025.

Text, AI Training, Code Generation, Large Scale, Synthetic Data, Synthetic, +1

GlobalBuildingAtlas: Global 2D Polygons and 3D LoD1 Building Models

GlobalBuildingAtlas provides global coverage of building polygons, heights, and LoD1 3D models at the individual building level. Created by zhu-xlab and updated in October 2025, it serves as a unified source for 2D and 3D urban structural data.

Language: en, License: odbl, Region: us, DOI: 10.57967/hf/6771, +1

ComAsset: 83 Canonicalized 3D Object Meshes for Affordance Research

ComAsset contains 83 object meshes collected from SketchFab for the paper 'Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models'. The meshes were manually canonicalized in location, orientation, and scale, then converted to .obj format with image textures. The dataset was created by HyeonwooKim and last updated on March 20, 2025.

Point Cloud, Affordance, 3D Objects, Computer Vision, Sketchfab, Mesh Data, +1

Medical Subject Headings for Biomedical Indexing and Cataloging

Medical Subject Headings (MeSH) is a hierarchically-organized terminology for indexing biomedical information, used for PubMed and other NLM databases. The U.S. Department of Health & Human Services produces and maintains the terminology in XML, ASCII, MARC 21, and RDF formats, with daily to annual update schedules for different file types.

API, Health Data Standards, Terminologies, +1

Mitakihara Deepseek R1 0528

16,900 synthetically generated prompts and responses focused on advanced AI reasoning, specifically utilizing the DeepSeek R1 0528 model. The content spans technical domains including MLOps, CUDA programming, diffusion models, and complex adaptive systems.

CSV, Size Categories: 10K<n<100K, Task Categories: text-generation, Machine Learning, Library: polars, Language: en, Conversational, Chat, Mitakihara, Modality: text, Library: mlcroissant, Compsci, DOI: 10.57967/hf/6135, Library: datasets, Chat Instruct, Library: pandas, Artificial Intelligence, Region: us, License: apache-2.0, Synthetic, +1

Synthetic Coding Tasks with Verifiable Solutions and Tests

KodCode V1 is a fully-synthetic open-source dataset for coding tasks, containing 12 distinct subsets across domains like algorithmic and package-specific knowledge. It is designed for supervised fine-tuning and RL tuning, with difficulty levels ranging from basic exercises to competitive programming challenges.

Parquet, Library: polars, Library: dask, Language: en, Modality: text, Size Categories: 100K<n<1M, Code, Modality: tabular, Library: mlcroissant, Library: datasets, License: cc-by-nc-4.0, Region: us, arXiv: 2503.02951, +1

AmbientCG: 14,202 High-Quality Texture Images and HDRI Environments

14,202 high-quality texture images and HDRI environments sourced from ambientcg.com. The dataset includes materials such as fabric, metal, wood, stone, concrete, and nature elements, as well as HDRI skyboxes. It was uploaded by nyuuzyou on June 13, 2025.

Image, HDRI, Materials, Textures, 3D Rendering, Computer Graphics, +1

Thingiverse OpenSCAD: 3D Model Files Paired with Synthetic Prompts

A preliminary dataset scraped from Thingiverse and paired with English-language synthetic prompts, created by author redcathode and last updated on January 23, 2025. It aims to help in fine-tuning large language models for 3D modeling tasks using OpenSCAD. The synthetic prompts were generated by the Gemini-2.0-Flash-Exp model based on SCAD files and their descriptions.

Text, 3D Modeling, Computer Aided Design, OpenSCAD, LLM Fine Tuning, Synthetic Data, Synthetic, +1

IL3D: Indoor Layout Dataset for LLM-Driven 3D Scene Generation

IL3D is a large-scale dataset for indoor 3D scene generation, created by WenxuZhou. It consists of two main components: a 3D-FRONT asset library of furniture and objects and a supplementary HSSD asset library. The dataset was last updated on October 24, 2025.

Multimodal, Asset Library, Indoor Layouts, Large Scale, 3D Scene Generation, Computer Graphics, +1

ShapeNetCore: Densely Annotated 3D Object Models

51,300 unique 3D models across 55 common object categories. Each model is linked to a WordNet 3.0 synset, providing a structured semantic hierarchy for the 3D geometry.

3D Shapes, License: other, Language: en, Region: us, arXiv: 1512.03012, +1

Community Dataset V2: 340 Robotics Datasets for Vision-Language-Action Learning

340 community-contributed robotics datasets from 117 global contributors form this large-scale collection for embodied AI. It represents the second major release, expanding upon a previous version to support vision-language-action learning. The dataset was created by HuggingFaceVLA and last updated in November 2025.

Image, Text, Multimodal, Multilingual, Community Sourced, Robotics, Computer Vision, Large Scale, Vision Language Action, +1

Contaminated Sediment Records for Long Island Sound and New York Bight

A compilation of published and unpublished sediment texture and contaminant data covers Long Island Sound and the New York Bight coastal areas. The dataset provides a historical foundation with information collected between 1956 and 1997. The report was summarized by the USGS.

Tabular, Audio, Geospatial, Coastal Ecosystems, Environmental History, Sediment Analysis, Marine Pollution, +1

Monthly Ocean Current Statistics for Japan Seas, 1953-1994

SCIOP's dataset provides monthly statistical summaries of surface ocean currents in seas adjacent to Japan from 1953 to 1994. The data, derived from GEK and ADCP instruments, is aggregated into 1-degree latitude/longitude grids. Each grid includes mean speed, mean direction, sample count, maximum/minimum current, and stability.

Tabular, Audio, Time Series, Geospatial, Marine Data, Marine Science, Surface Currents, Historical Data, Statistics, Geospatial Statistics, Ocean Currents, +1
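As a rough illustration of the per-grid statistics named in this entry, the sketch below summarizes a handful of current observations for a single grid cell and month. The listing does not define how mean direction or stability are computed, so the vector-averaging approach, the stability formula (vector-mean speed divided by scalar-mean speed, in percent), and the function name `current_stats` are assumptions, not the dataset's documented method.

```python
import math

def current_stats(speeds, directions_deg):
    """Summarize current observations for one grid cell and month.

    speeds: current speeds in any consistent unit.
    directions_deg: directions in degrees clockwise from north.
    All definitions here are illustrative assumptions.
    """
    n = len(speeds)
    # Resolve each observation into east (u) and north (v) components.
    u = [s * math.sin(math.radians(d)) for s, d in zip(speeds, directions_deg)]
    v = [s * math.cos(math.radians(d)) for s, d in zip(speeds, directions_deg)]
    mean_u, mean_v = sum(u) / n, sum(v) / n
    vector_speed = math.hypot(mean_u, mean_v)
    scalar_speed = sum(speeds) / n
    return {
        "count": n,
        "mean_speed": scalar_speed,
        # Direction of the mean vector, wrapped into [0, 360).
        "mean_direction": math.degrees(math.atan2(mean_u, mean_v)) % 360.0,
        "max_speed": max(speeds),
        "min_speed": min(speeds),
        # Assumed definition: vector-mean speed over scalar-mean speed, percent.
        "stability": 100.0 * vector_speed / scalar_speed if scalar_speed else 0.0,
    }

# Two equal-speed observations, one flowing north and one flowing east.
stats = current_stats([1.0, 1.0], [0.0, 90.0])
```

Averaging the components rather than the raw angles avoids the wrap-around problem at 0/360 degrees, and a stability near 100 would indicate a current that rarely changes direction.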

Monthly Temperature Statistics for Japan Seas, 1906-1994

Vertical array summaries of statistical analyses of temperature data from serial station observations in seas adjacent to Japan. The data covers the period from 1906 to 1994, aggregated by month and depth on a 1-degree grid. The product includes mean values, sample counts, maximums, minimums, and standard deviations for selected months and standard oceanographic levels.

Tabular, Time Series, Geospatial, Oceanography, Temperature, Statistics, +1
