Description
1,000,000 rows of persona embeddings derived from the first million entries of the nvidia/Nemotron-Personas-USA dataset. The embeddings were generated with the Qwen/Qwen3-Embedding-0.6B model, which produces 1024-dimensional vectors, and cover persona categories such as professional, sports, and culinary. The dataset was created by user 'tantara' and was last updated on Hugging Face in May 2026.
Use Cases
Training or evaluating semantic search models based on persona attribute embeddings.
Fine-tuning language models for persona-specific text generation based on the embedded persona categories.
Conducting similarity analysis or clustering of demographic and interest-based profiles.
Building recommendation systems that leverage encoded cultural and professional background information.
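The semantic search and similarity use cases above can be sketched as a cosine-similarity lookup. This is a minimal illustration under stated assumptions, not the dataset author's method: the vectors are random stand-ins rather than real persona embeddings, and the names normalize, top_k, and EMBEDDING_DIM are hypothetical. Only the 1024 dimension comes from the description.

```python
import math
import random

EMBEDDING_DIM = 1024  # dimension stated in the dataset description


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query, corpus, k=3):
    """Return (index, cosine similarity) for the k corpus vectors closest to query."""
    q = normalize(query)
    scores = []
    for i, vec in enumerate(corpus):
        v = normalize(vec)
        scores.append((i, sum(a * b for a, b in zip(q, v))))
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Synthetic stand-in data: random vectors, NOT real persona embeddings.
rng = random.Random(0)
corpus = [[rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)] for _ in range(50)]

# The query is a slightly perturbed copy of row 7, so row 7 should rank first.
query = [x + rng.gauss(0.0, 0.05) for x in corpus[7]]
results = top_k(query, corpus, k=3)
print(results)
```

A brute-force scan like this is only practical for small samples. At the full 1,000,000 rows, a vectorized or approximate nearest-neighbor index would likely be needed.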
Strengths
Contains 1,000,000 pre-computed embedding vectors, so similarity search and clustering can start without running the embedding model.
All vectors come from a single named embedding model (Qwen/Qwen3-Embedding-0.6B) with a fixed dimension of 1024.
Derived from a named source dataset (nvidia/Nemotron-Personas-USA), so the origin of the underlying persona records is known.
Limitations
Description metadata is limited; data quality must be checked by manual inspection after download.
Represents only a subset (first 1M rows) of a larger 7M-row source dataset, which may limit completeness.
Column-level documentation is absent; field semantics for columns like 'persona' and 'cultural_background' must be inferred.
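Because the limitations above leave column semantics to manual inspection, a first pass over a few downloaded rows might look like the sketch below. It assumes rows are available as Python dictionaries. The column name 'embedding' and the sample values are hypothetical placeholders, since the real schema is undocumented; 'persona' and 'cultural_background' are the only column names taken from this page.

```python
from collections import defaultdict

EXPECTED_DIM = 1024  # dimension stated in the dataset description


def summarize_columns(rows):
    """Collect the value types and list lengths seen in each column."""
    summary = defaultdict(lambda: {"types": set(), "lengths": set()})
    for row in rows:
        for name, value in row.items():
            info = summary[name]
            info["types"].add(type(value).__name__)
            if isinstance(value, (list, tuple)):
                info["lengths"].add(len(value))
    return dict(summary)


def find_bad_vector_columns(summary, expected_dim=EXPECTED_DIM):
    """Return names of list-valued columns whose lengths are not all expected_dim."""
    return sorted(
        name
        for name, info in summary.items()
        if info["lengths"] and info["lengths"] != {expected_dim}
    )


# Hypothetical sample rows; 'embedding' is an assumed column name.
rows = [
    {"persona": "example text", "cultural_background": "example text",
     "embedding": [0.0] * EXPECTED_DIM},
    {"persona": "another example", "cultural_background": "another example",
     "embedding": [0.1] * EXPECTED_DIM},
]

summary = summarize_columns(rows)
for name, info in summary.items():
    print(name, sorted(info["types"]), sorted(info["lengths"]))
print("columns with unexpected vector length:", find_bad_vector_columns(summary))
```

A summary like this shows which columns hold text and which hold vectors, and flags any vector column that does not match the stated 1024 dimensions. It does not reveal what a column means, which still has to be inferred from the values.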
Provenance
Source
nvidia/Nemotron-Personas-USA
Collection Method
Embeddings computed with Qwen/Qwen3-Embedding-0.6B model.
Freshness
Last updated 2026-05-17 16:15:01; freshness should be verified.
Geography
USA (inferred from source dataset name)
License is unknown; users must verify the terms of use before using the data.