Description
512 million synthetic persona records across 77+ countries and 39 languages, unified from 18 open-source datasets. The database is partitioned, compressed, and stored in a Parquet warehouse, optimized for querying. It was created by Kasher13 and last updated on March 29, 2026.
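The listing does not say how the warehouse is partitioned. As a minimal sketch, assuming a Hive-style directory layout (`key=value` folders), the snippet below shows how partition values could be read from file paths so that only the relevant files are opened. The `country` and `language` keys, the example paths, and the helper names `partition_values` and `select_files` are all hypothetical; check the actual layout after download.

```python
from pathlib import PurePosixPath


def partition_values(path: str) -> dict[str, str]:
    """Extract key=value directory segments from a Hive-style path.

    ASSUMPTION: the warehouse uses key=value folder names. The listing
    does not confirm this.
    """
    values = {}
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep and key:
            values[key] = value
    return values


def select_files(paths, **wanted):
    """Keep only the paths whose partition values match every requested key."""
    return [
        p for p in paths
        if all(partition_values(p).get(k) == v for k, v in wanted.items())
    ]


# Hypothetical paths, for illustration only.
example_paths = [
    "personas/country=BR/language=pt/part-0000.parquet",
    "personas/country=JP/language=ja/part-0000.parquet",
    "personas/country=BR/language=en/part-0001.parquet",
]

brazil_files = select_files(example_paths, country="BR")
```

Filtering on paths before reading any data is what makes a partitioned layout cheap to query: files for other countries are never opened.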
Use Cases
Simulate AI agent populations based on synthetic demographic and behavioral profiles.
Conduct market research analysis using a large-scale, culturally diverse persona database.
Perform cultural analysis across 77+ countries and 39 languages represented in the unified data.
Strengths
Contains 512 million records, which places it among the larger synthetic persona collections.
Integrates data from 18 distinct open-source datasets into a single, query-ready Parquet warehouse.
Covers a broad international scope with personas from 77+ countries and 39 languages.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The exact row count is not reported in the metadata; the 512 million figure comes from the description and should be verified after download.
Last updated 2026-03-29 07:27:05 (time zone not stated); check for a newer release before relying on it.
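Because column documentation is missing, a first pass at inferring field semantics could be a simple profile of a small sample of records. The sketch below assumes the sample has already been loaded as a list of dictionaries (for example with `pyarrow.parquet.read_table(path).to_pylist()` on a single file). The field names in `sample_rows` and the function name `profile_columns` are hypothetical and are not taken from the dataset.

```python
from collections import Counter, defaultdict


def profile_columns(rows, max_examples=3):
    """Summarize value types, null fraction, and example values per field."""
    total = 0
    non_null = Counter()
    types = defaultdict(Counter)
    examples = defaultdict(list)
    seen = {}  # field name -> True; a dict keeps first-seen order
    for row in rows:
        total += 1
        for name, value in row.items():
            seen[name] = True
            if value is None:
                continue
            non_null[name] += 1
            types[name][type(value).__name__] += 1
            if len(examples[name]) < max_examples and value not in examples[name]:
                examples[name].append(value)
    return {
        name: {
            "types": dict(types[name]),
            # A field missing from a row counts as null for that row.
            "null_fraction": 1 - non_null[name] / total,
            "examples": examples[name],
        }
        for name in seen
    }


# Hypothetical records, for illustration only.
sample_rows = [
    {"persona_id": 1, "country": "BR", "age": 34},
    {"persona_id": 2, "country": "JP", "age": None},
    {"persona_id": 3, "country": "BR"},
]

summary = profile_columns(sample_rows)
```

A profile like this only suggests what a field holds. Mixed types or a high null fraction in a column are signs that the 18 merged sources may encode that field differently.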
Provenance
Source
Unified from 18 open-source datasets.
Collection Method
Merged into a single coherent Parquet warehouse.
Freshness
Last updated 2026-03-29 07:27:05.
Geography
77+ countries
License
Unknown; verify the license before use.