3D models, rendered datasets, physics simulation, digital twins, synthetic data generation, game engine data
1,034 datasets
This SCIOPS dataset provides monthly statistical summaries of surface ocean currents in seas adjacent to Japan from 1953 to 1994. The data, derived from GEK and ADCP instruments, are aggregated into 1-degree latitude/longitude grid cells. Each cell includes mean speed, mean direction, sample count, maximum and minimum current, and stability.
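As a rough illustration of how per-cell statistics like these could be computed, the sketch below bins observations into 1-degree cells. It is not the dataset's documented method: the function name `grid_stats`, the input tuple layout, the use of a scalar mean for "mean speed", and the definition of stability as the ratio of vector-mean speed to scalar-mean speed (in percent) are all assumptions made for the example.

```python
import math
from collections import defaultdict


def grid_stats(observations):
    """Summarize current observations per 1-degree cell (hypothetical helper).

    observations: iterable of (lat, lon, speed, direction_deg) tuples, where
    direction_deg is the compass bearing the current flows toward (assumed).
    """
    cells = defaultdict(list)
    for lat, lon, speed, direction in observations:
        # A cell is identified by the integer degree of its south-west corner.
        cells[(math.floor(lat), math.floor(lon))].append((speed, direction))

    summary = {}
    for key, samples in cells.items():
        n = len(samples)
        speeds = [s for s, _ in samples]
        # Average the east and north velocity components separately.
        east = sum(s * math.sin(math.radians(d)) for s, d in samples) / n
        north = sum(s * math.cos(math.radians(d)) for s, d in samples) / n
        scalar_mean = sum(speeds) / n
        vector_mean = math.hypot(east, north)
        summary[key] = {
            "count": n,
            "mean_speed": scalar_mean,
            "mean_direction": math.degrees(math.atan2(east, north)) % 360.0,
            "max_speed": max(speeds),
            "min_speed": min(speeds),
            # Assumed definition: 100% means every sample points the same way.
            "stability": 100.0 * vector_mean / scalar_mean if scalar_mean > 0 else 0.0,
        }
    return summary
```

Under this assumed definition, a cell whose samples all flow in the same direction has a stability of 100, while opposing flows pull the value toward 0.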
BlockGen-3D is a large-scale dataset of voxelized 3D models with accompanying text descriptions, designed for text-to-3D generation tasks. The dataset was created by PeterAM4, who processed and voxelized models from the Objaverse dataset to create a standardized representation suitable for training 3D diffusion models. It was last updated on January 9.
PartObjaverse-Tiny is a 3D part segmentation dataset providing detailed semantic-level and instance-level part annotations for 200 complex 3D objects. It was created by yhyang-myron and was last updated on December 13, 2024. The dataset includes mesh files and corresponding ground truth annotation files.
3dlg-hcvc provides mesh, point cloud, and metadata for two datasets used in the S2O research project. The PM-Openable subset contains 648 openable objects from PartNet-Mobility, with a train/val/test split of 460/95/93 objects. The Articulated Container Dataset (ACD) contains openable container objects sourced from HSSD.
The Describable Textures Dataset (DTD) is an evolving collection of textural images annotated with human-centric perceptual attributes. It is made available to the computer vision community for research purposes by the Visual Geometry Group at the University of Oxford. The dataset was last updated on the Hugging Face platform on 2023-05-11.
Over 10 million 3D objects form this open dataset, which is more than an order of magnitude larger than its predecessor. AllenAI released Objaverse-XL in 2023 to train the Zero123-XL foundation model for 3D tasks. The dataset is described as being much more diverse than the earlier 800K-object Objaverse 1.0.
Ling-Coder-DPO is a subset of 250,000 samples used for Direct Preference Optimization (DPO) training of the Ling-Coder Lite model. The dataset was created by inclusionAI and last updated on Hugging Face on March 27, 2025. It is part of a larger collection that also includes a supervised fine-tuning (SFT) subset with over 5 million samples and a synthetic question-answering subset.
A 2024 geospatial dataset from the German Federal Agency for Cartography and Geodesy. It contains development plans and surrounding areas for the Blumenstrasse Im Almeshofen site in the Herchenbach district of Puettlingen, Saarland. The data is provided as a Web Map Service (WMS) layer under a CC0-1.0 license.
An image dataset for training models in anime super-resolution, created by HikariDawn. The dataset is associated with a research paper and a Gradio demo. It was last updated on October 24, 2025.
Weighted average salinity outputs from two 31-day Delft3D Flexible Mesh simulations representing low and high discharge seasons in the Mississippi River Delta. The dataset, produced by ORNL_CLOUD and published via NASA EarthData, models conditions from fall and spring 2021. Data is provided in netCDF format, with each model's contribution weighted by the probability density function of Atchafalaya River discharge.
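The weighting described in this entry can be pictured as a normalized weighted average of the two simulation outputs. The sketch below is only an illustration under stated assumptions: `weighted_salinity` is a hypothetical helper, the salinity fields are plain lists on a shared grid, and `p_low` and `p_high` stand in for the discharge probability densities associated with each simulation. The dataset's actual processing may differ.

```python
def weighted_salinity(low_run, high_run, p_low, p_high):
    """Combine two salinity fields using probability-density weights.

    low_run, high_run: equal-length sequences of salinity values, one per
    grid point (assumed layout). p_low, p_high: non-negative weights taken
    from the discharge probability density function (assumed inputs).
    """
    if len(low_run) != len(high_run):
        raise ValueError("both simulations must share the same grid")
    total = p_low + p_high
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize so the two weights sum to 1.
    w_low, w_high = p_low / total, p_high / total
    return [w_low * a + w_high * b for a, b in zip(low_run, high_run)]
```

For example, with weights 0.25 and 0.75, a grid point with salinity 20.0 in the low-discharge run and 10.0 in the high-discharge run would combine to 12.5.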
16,000 single-turn conversations form this synthetic dataset of instruction and refusal pairs. The dataset was created by mrfakename and last updated on 2024-04-26. Human prompts are sourced from the Capybara dataset, with refusals generated synthetically.
Synthetic Models for Advanced, Realistic Testing: Distribution systems and Scenarios (SMART-DS) provides realistic large-scale U.S. electrical distribution models for three metropolitan areas: San Francisco (SFO), Greensboro (GSO), and Austin (AUS). The dataset contains detailed network models and connected time-series loads, validated against thousands of utility feeders for operational similarity. It is intended for powerflow simulations under various scenarios.
100,000 procedurally-generated indoor scenes comprise this synthetic dataset. It was created by projectaria for research on 3D scene understanding, object detection, and tracking, with a last update in September 2024. The dataset simulates sensor data matching the characteristics of Project Aria glasses.
ShapeNetSem is a subset of the ShapeNet repository, containing 3D models with rich physical attribute annotations. The archive is hosted by ShapeNet and was last updated on Hugging Face in September 2023. Users must agree to specific terms of use, restricting redistribution to research associates who also agree to the terms.
WebSight contains between 1 and 10 million pairs of synthetic website screenshots and their corresponding HTML/CSS code, released by HuggingFaceM4 in March 2024. The collection features two distinct versions covering standard HTML/CSS and modern HTML/Tailwind CSS implementations for English-language websites.
ShapeSplatsV1 is a dataset of 52,000 3D objects across 55 categories, derived from the ShapeNetCore repository. The data is distributed as PLY files where each Gaussian splat's information is encoded in custom vertex attributes. The dataset was created by ShapeNet and last updated on Hugging Face in September 2024.
MOCNESS trawl data captures zooplankton species abundance and biomass in the Gulf of Alaska from 1997 to 2004. The dataset, part of the Gulf of Alaska Long-Term Observation Program, was collected by SCIOPS using 1-square-meter nets with 5 mm mesh on oblique hauls. It provides a multi-year record for ecological analysis.
Plankton counts and taxonomic data were collected over a 59-year period from 1928 to 1987 using nets and traps on vessels worldwide. The Smithsonian Oceanographic Sorting Center compiled these records, which include gear specifications like net mouth diameter and mesh size. NOAA's National Centers for Environmental Information (NCEI) holds the dataset, which was submitted for archival in 1994.
A collection of RGB-D camera captures from 92 subjects changing their pose based on 10 markers. The dataset includes images, depth maps, rotation and translation matrices for registration, reconstructed 2K-point clouds, high-definition initial point clouds, and subject characterizations by age, gender, and ethnicity. It was authored by Marcos Quintana González and last updated in May 2024.
This dataset contains experimental data from a study examining depth perception in images of real scenes. The study manipulated pictorial depth cues, simulated dioptric blur, and binocular disparity, using light field photographs captured with a Lytro plenoptic camera capable of capturing images at up to 12 focal planes. Observers performed two-alternative forced-choice (2AFC) tasks to indicate which of two patches extracted from these images was farther away.