3D models, rendered datasets, physics simulation, digital twins, synthetic data generation, game engine data
1,034 datasets
A processed version of the 3D-Front dataset, organized into 3D scenes paired with rendered multi-view images and surfaces for the MIDI-3D project. Each scene contains 3D models (.glb), point clouds (.npy), and rendered images in RGB, depth, and normal formats with camera information. The dataset was uploaded by author 'huanngzh' and last updated on 2025-06-25.
5,354 industrial product images for anomaly detection, originally from MVTec and provided by Voxel51. The dataset is formatted for loading with the FiftyOne library, facilitating computer vision tasks in manufacturing inspection.
A multi-camera synchronized video dataset rendered using Unreal Engine 5. It includes synchronized multi-camera videos and their corresponding camera poses. The dataset was released by KlingTeam and updated on April 15, 2025.
Multi-view orthographic renderings of a high-quality subset of the Objaverse collection at 1024x1024 resolution. Each object is rendered from 10 views, and each entry includes four distinct modalities: RGB, depth, normal maps, and camera parameters.
A 10,000-prompt subset of the UFB dataset, translated into 9 languages, with completions generated by 5 different teacher models and 2 aggregations. The dataset was created by CohereLabs and last updated on October 2, 2025. Completions were sampled from models including GEMMA3-27B-IT, kimik2, qwen3, deepseek-v3, and command-a.
Over 57 million simulated grasps computed for 8,515 objects from the Objaverse XL dataset, covering three gripper types: the Franka Panda, the Robotiq-2f-140, and a single-contact suction gripper. It was created by NVIDIA and released on Hugging Face in June 2025.
Northern Alaska ground surface roughness point clouds collected during the NASA SnowEx 2023 field campaign between 23 and 25 October 2022. The data are compiled from digital camera images acquired from 13 snow pits at the Upper Kuparuk and Toolik study site. The raw imagery is available as a separate dataset.
Data on the structural and physical features of Ferrar dolerite dykes and sills in the Allan Hills, Victoria Land, Antarctica. The dataset was created by SCIOPS, likely from field mapping, aerial photography, and sample collection conducted to understand magma flow dynamics. It was last updated on 2007-01-12.
Ground surface photographs from 13 snow pits at the Upper Kuparuk and Toolik site in northern Alaska. The images were captured with a digital camera during the NASA SnowEx 2023 field campaign from 23 to 25 October 2022 and were used to derive point cloud data for ground surface roughness analysis.
Over one million AI-generated images paired with high-quality text captions, primarily from DALL-E 3 with contributions from Stable Diffusion and Midjourney v5+. Created by ProGamerGov, the dataset was last updated in October 2024.
ShapeSplats provides a dataset of Gaussian splats derived from ModelNet40. The ModelNet_Splats subset contains 12 objects spanning 40 categories. Data is distributed as PLY files with Gaussian information encoded in custom vertex attributes.
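Because a PLY header is plain ASCII even when the vertex data is binary, the custom Gaussian attributes can be listed by reading the header alone. The sketch below is a minimal, standard-library-only reader. The helper names and the sample header (modeled on the attribute layout of the original 3D Gaussian Splatting code) are assumptions for illustration, not ShapeSplats' documented schema.

```python
from typing import List, Tuple


def read_ply_header(path: str) -> str:
    """Read a PLY file's ASCII header, stopping at 'end_header' (works for binary bodies too)."""
    lines: List[str] = []
    with open(path, "rb") as f:
        for raw in f:
            line = raw.decode("ascii").rstrip("\r\n")
            lines.append(line)
            if line == "end_header":
                break  # everything after this line is vertex data, not header
    return "\n".join(lines)


def vertex_properties(header: str) -> Tuple[int, List[Tuple[str, str]]]:
    """Return the vertex count and the (type, name) of each scalar 'vertex' property."""
    count = 0
    props: List[Tuple[str, str]] = []
    in_vertex = False
    for line in header.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "end_header":
            break
        if tokens[0] == "element":
            # Properties belong to the most recent 'element' line.
            in_vertex = tokens[1] == "vertex"
            if in_vertex:
                count = int(tokens[2])
        elif tokens[0] == "property" and in_vertex and tokens[1] != "list":
            # Scalar property: "property <type> <name>"; list properties are skipped.
            props.append((tokens[1], tokens[2]))
    return count, props


# Hypothetical header; ShapeSplats' actual attribute names may differ.
SAMPLE_HEADER = """ply
format binary_little_endian 1.0
element vertex 2048
property float x
property float y
property float z
property float opacity
property float scale_0
property float rot_0
end_header"""

if __name__ == "__main__":
    n, props = vertex_properties(SAMPLE_HEADER)
    custom = [name for _, name in props if name not in ("x", "y", "z")]
    print(n, custom)  # 2048 ['opacity', 'scale_0', 'rot_0']
```

For a real file, `vertex_properties(read_ply_header("object.ply"))` would list whatever attributes the file declares; the filename here is a placeholder.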
InternRobotics released Scene-N1 in August 2025, providing a collection of 3D scene assets and Matterport3D scans for Vision-Language Navigation (VLN) research. The repository contains specialized environments for the VLN-N1, VLN-PE, and VLN-CE benchmarks, including residential and commercial scene categories.
CohereLabs provides synthetic text completions for the s1K-X training split prompts, generated by five different large language models. The dataset includes outputs from models like GEMMA3-27B-IT, KIMI-K2-INSTRUCT, and QWEN3-235B, sampled at a temperature of 0.3. This collection, last updated in October 2025, is designed for research into model aggregation and training data synthesis.
This curated repository, maintained by davanstrien and updated as of January 2026, serves as a central index for synthetic text datasets and generation tools. It aggregates resources specifically designed for training and evaluating large language models (LLMs) using artificially generated data. The collection is organized as an 'awesome-list' on GitHub, providing a directory of external links rather than a single unified file.
MARVEL-40M+ is a dataset for high-fidelity text-to-3D content creation, introduced in a CVPR 2025 paper. It appears to be a multi-level visual elaboration dataset, likely containing 3D assets and associated data. The dataset was released by authors including Sankalp Sinha and Mohammad Sadil Khan.
Objaverse contains over 800,000 annotated 3D objects and was released by the Allen Institute for AI in late 2022. It serves as a large-scale repository for 3D computer vision and generative modeling, aggregating diverse assets with associated metadata and Creative Commons licensing.
The grammar-correction dataset is a refined subset of the liweili/c4_200m dataset, derived from Google's C4_200M Synthetic Dataset for Grammatical Error Correction. It contains 100,000 training and 25,000 validation entries of sentence pairs where the input is ungrammatical and the output is grammatical. The dataset was authored by agentlans and last updated on 2024-12-29.
Juan José Lastra-Díaz released the Half-Edge Semantic Measures Library (HESML) V1R5 in May 2024. This Java library implements numerous ontology-based semantic similarity measures and Information Content models for WordNet, SNOMED-CT, MeSH, and Gene Ontology. It enables reproducible word and concept similarity experiments without coding via an XML-based input format.
A 3D digital model and geo-narrative materials for the town of Westport, Washington, co-created by the University of Washington and the local community. The project includes imagery collected by RAPID staff in 2020 and visualizations for coastal hazards like tsunamis, sea level rise, and erosion. It is part of an ongoing partnership to inform local emergency preparedness and long-term planning.
PhysicalAI DigitalCousin Assets from NVIDIA provide a collection of 3D meshes, textures, and object metadata for simulated tabletop manipulation environments. These assets, including items like mugs, bottles, bowls, and containers, populate virtual scenes for robotic interaction and benchmarking. The dataset was last updated by NVIDIA in June 2025.