3D models, rendered datasets, physics simulation, digital twins, synthetic data generation, game engine data
1,034 datasets
NVIDIA's PhysicalAI DigitalCousin Assets provide a collection of 3D meshes, textures, and object metadata for simulated tabletop manipulation environments. These digital assets, including mugs, bottles, bowls, and containers, populate virtual scenes for the GR1 robot. The dataset was published by NVIDIA and last updated in June 2025.
Data from 289 computer science students supports the article 'Are Virtual Reality Serious Video Games More Effective Than Web Video Games?'. The dataset includes pre-test and post-test scores and questionnaire results for two groups: 110 students using VR and 179 using a web format. Daniel López Fernández published the data via e-cienciaDatos Harvested Dataverse, and it was last updated in October 2025.
TransFrag27K is a large-scale dataset containing 27,000 images and masks at 640×480 resolution. It covers fragments of common everyday glassware, incorporating over 150 background textures and 100 HDRI environment lightings. The dataset was created by chenbr7 and last updated in August 2025.
The Habitat Synthetic Scenes Dataset (HSSD) contains 211 human-authored 3D scenes and over 18,000 models of real-world objects. It is designed to mirror real interiors more closely than prior synthetic datasets for consumption in the Habitat simulation platform.
An evaluation dataset for HTML generation models, covering diverse web development scenarios. The dataset was created by nex-agi and last updated on 2025-11-19. It includes examples for landing pages, e-commerce sites, responsive layouts, financial dashboards, WebGL scenes, Three.js applications, and browser-based games.
Datasets used in the paper 'Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis'. The repository includes ModelNet and SONN datasets, organized into subfolders for different corruption types. The dataset was uploaded by author 'auniquesun' and last updated on 2025-06-19.
Project Aria's Aria Everyday Activities (AEA) dataset provides recordings of daily activities from a first-person perspective. According to its description, it includes high-frequency 6DoF trajectories, observed point clouds, and synchronized RGB and monochrome camera views. The dataset was last updated on Hugging Face by projectaria on September 17, 2024.
Vertical and stratified zooplankton sampling was conducted during July-August 2001 aboard the Swedish icebreaker Oden. The FAMIZ program studied zooplankton distribution and abundance in the Amundsen and Nansen basins. The data was collected by SCIOPS.
SmolTalk is a synthetic dataset containing 1 million samples created for supervised finetuning of large language models. It was developed by HuggingFaceTB to address performance gaps with public SFT datasets and was used to build the SmolLM2-Instruct model family. The dataset's methodology and details are documented in a research paper.
500 3D models in URDF format, split into 235 textured and 265 untextured versions, are provided by Behavision. The dataset is designed to support research in robotics simulation, grasping, and physics simulation. It was last updated on August 7, 2025.
KodCode is a fully-synthetic open-source dataset for coding tasks, created by KodCode and last updated on March 17, 2025. It contains 12 distinct subsets spanning domains from algorithmic to package-specific knowledge and difficulty levels from basic exercises to competitive programming. The dataset is designed for supervised fine-tuning and RL tuning.
SUM Parts is a benchmark dataset for part-level semantic segmentation of urban textured meshes. It covers 2.5 square kilometers and includes annotations for 21 classes such as terrain, vegetation, water, and building components. The dataset was created by author gwxgrxhyz and last updated on June 21, 2025.
MultiCamVideo is a synthetic dataset of synchronized multi-camera videos and corresponding camera trajectories rendered in Unreal Engine 5. Created by KlingTeam and released in 2025 alongside the ReCamMaster paper, it provides ground-truth spatial data for multi-view video research.
PartNet Archive contains 3D object data with part-level annotations, derived from the ShapeNet repository. The prerelease v0 from March 2019 includes meshes, point clouds, and HDF5 files for semantic and instance segmentation tasks. ShapeNet assembled this collection for research in fine-grained 3D shape understanding.
More than 5 million samples of supervised fine-tuning data used to train the Ling-Coder Lite model. The dataset is part of a larger collection that also includes DPO and synthetic QA subsets, created by inclusionAI and last updated on March 27, 2025.
2,516 synthetic question-and-answer sets focused on analog electronics, created from seven different perspectives. The dataset was generated by the author 'theprint' using a prompt designed to provide useful, inspiring, and appropriately detailed assistance. It was last updated on Hugging Face on August 12, 2025.
From May 10 to August 24, 1958, Canadian exploratory fishing vessels collected data on salmonids caught via gillnetting in the Northeastern Pacific Ocean. The dataset likely contains tabulated records for each fish, including length, weight, sex, maturity, and stomach contents, alongside fishing position data showing gear, depth, surface temperature, salinity, and oceanographic domain. The data was gathered by NOAA_NCEI.
Objaverse-Rand6View provides 1024x1024 multi-view renders including RGB, depth, and normal maps derived from a high-quality subset of the Objaverse repository. Created by huanngzh and released in late 2024, the collection features randomized orthographic and perspective views designed for 3D generative modeling.
Data from Christine Mayer's study applies a Geometric Morphometric Image Analysis (GMIA) method to larval and juvenile rainbow trout tail fin images. The dataset enables the joint quantitative analysis of embryo shape and spatial patterns of cellular activity. It was last updated in June 2020.
KodCode is a fully-synthetic open-source dataset providing verifiable solutions and tests for coding tasks. It contains 12 distinct subsets spanning domains from algorithmic to package-specific knowledge and difficulty levels from basic exercises to competitive programming. The dataset was created by KodCode and last updated in April 2025.