3D models, rendered datasets, physics simulation, digital twins, synthetic data generation, game engine data
1,028 datasets
Synthetic e-commerce behavior data facilitates machine learning classification and feature engineering practice within a tabular format. Hosted on Kaggle, the dataset is structured for intermediate-level data cleaning and storytelling tasks. It lacks public documentation regarding specific record counts or column definitions.
Synthetic point cloud data across multimodal industrial categories. This dataset serves as the first installment of a series for 3D computer vision research in manufacturing environments.
Synthetic data designed for practicing people analytics workflows. The dataset contains artificial employee records for modeling HR scenarios without privacy concerns. Specific row counts, column details, and authorship are not provided.
This dataset is described as a security mesh for AI agents and is tagged for text classification tasks. The number of rows, columns, and specific data fields are unknown.
A synthetic dataset of 150,000 video frames annotated by GPT-4o for training frame sampling models. It features dense coverage, annotating approximately 20% of all frames with relevance scores, and provides fine-grained confidence assessments on a 1 to 5 scale. The dataset was created by author yaolily and last updated on September 4, 2025.
Synthetic Dataset is a dataset published on Kaggle. The dataset's content, size, and specific features are not described in the available metadata. Its creation method and intended application are inferred from its title and platform tag.
Proobjaverse 300K is a dataset published on Hugging Face by Stable-X, last updated on 2026-01-27. The title and platform tags suggest it contains a large collection of images, likely 300,000 items, for tasks related to image-to-image and image-to-3D processing. Its specific content, columns, and file formats are not detailed in the provided metadata.
A dataset of 2,000 Uzbek text messages labeled as 'spam' or 'normal'. It is designed for training spam detection models, with a split of 1,800 training and 200 test samples.
MICS-Lab released a full dataset for the Novae project on December 10, 2025. The collection includes spatial transcriptomics samples used to train the Novae model, protein samples referenced in the associated article, and some Visium and Visium HD samples. It also contains synthetic data samples.
Snow depth maps and validation measurements from a spring 2018 intercomparison study of photogrammetric platforms in the Dischma valley, Switzerland. ENVIDAT provides this dataset, which includes products from satellite, airplane, UAS, and terrestrial platforms.
This August 2002 dataset contains zooplankton samples from 10 stations in the Canada Basin, collected with 53 and 236 µm mesh nets. The database includes 1164 rows documenting 30 species, with analysis of numerical dominance and biomass contributions. It was collected by the organization SCIOPS for ocean exploration research.
Zooplankton samples were collected using a neuston net during four juvenile salmonid trawling cruises off the coasts of Oregon and California. The dataset, created by SCIOPS for the GLOBEC NEP Process Study, covers two sampling years, 2000 and 2002. Data includes genus/species-level identification with life stage and abundance information.
DensePose-COCO is a large-scale ground-truth dataset with image-to-surface correspondences manually annotated on COCO images. It contains 33,929 samples and was created by Voxel51. The dataset was last updated on the Hugging Face platform in June 2024.
A synthetic dataset likely related to mass customization processes. The dataset is hosted on Kaggle and is tagged as 'Synthetic'. Specific details on volume, features, creation method, and authorship are not provided in the metadata.
Water bottle samples collected from 14 stations in Florida during a November 1998 cruise provide counts and biochemical analysis of the harmful algae Karenia brevis. Coulter counts for the 14-28 µm size class were determined, and isolated algae pellets were analyzed for total lipid, neutral lipid, free amino acids, protein, RNA, chlorophyll, and nitrate. The dataset was produced by Kamykowski's NCSU laboratory for NOAA NCEI.
Digital Twin PTB-XL is a dataset published on Kaggle. The dataset likely contains data for physics-based modeling and simulation, given the 'Digital Twin' concept in its title. Specific details regarding its size, origin, and creation date are not provided in the available metadata.
The dataset title 'NSD S1 train val NC selected voxels 70' suggests it contains data from the Natural Scenes Dataset (NSD) project. It likely includes selected voxel data for one subject (S1), split for training and validation; what '70' refers to is not documented. The data is hosted on Kaggle, but detailed metadata is unavailable.
NSD S2 likely contains processed neuroimaging data from the Natural Scenes Dataset. The dataset appears to be a subset of voxel data curated for machine learning training and validation purposes. Published on Kaggle, its specific content and scale require verification after download.
A synthetic dataset likely modeling customer interactions with add-on items in a food delivery cart, sourced from Kaggle. The dataset's specific size, creator, and temporal coverage are not provided in the metadata. Its content and structure must be verified after download.
NSD S6 val train NC selected voxels 70 is a dataset hosted on Kaggle. The title suggests it contains selected voxel data, likely from functional magnetic resonance imaging (fMRI), for training models related to the Natural Scenes Dataset. The specific content, scale, and origin require verification after download.