Synthetic Household Populations for Agent-Based Models, Generated via LLM
by Michael Jones · Updated 3d ago
5.5 KB · 1 file
Available on 1 platform
Description
A sample-free framework uses a large language model to generate complete, household-structured populations from aggregate demographic data. The framework was evaluated across 109 countries and in case studies for Newcastle upon Tyne, UK, and Dar es Salaam, Tanzania. The dataset, shared by Michael Jones on figshare, is a 5.5 KB XLS file last updated in June 2026.
Use Cases
Calibrating agent-based transport models with synthetic household demographics.
Simulating public health interventions using plausible household compositions and age structures.
Stress-testing disaster management plans against statistically aligned synthetic populations.
Benchmarking population synthesis methods against the LLM-generated attribute combinations.
Strengths
Framework is LLM-agnostic and requires no model fine-tuning, lowering data requirements.
Achieved close statistical alignment on gender and household size marginals in a global evaluation covering 109 countries.
Generated populations are household-structured, providing coherence for agent-based modeling.
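Alignment on gender and household size marginals can be checked by comparing category shares in the synthetic population against target shares. The sketch below is illustrative only: the record layout (`household_id`, `gender`), the sample values, the target shares, and the use of total variation distance are all assumptions, not the schema or metric documented for this dataset.

```python
from collections import Counter

def marginal(values):
    """Return the share of each category in a list of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

def total_variation(p, q):
    """Half the L1 distance between two categorical distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical synthetic persons grouped by household_id (assumed schema).
persons = [
    {"household_id": 1, "gender": "F"},
    {"household_id": 1, "gender": "M"},
    {"household_id": 2, "gender": "F"},
    {"household_id": 3, "gender": "M"},
    {"household_id": 3, "gender": "F"},
    {"household_id": 3, "gender": "F"},
]

# Person-level marginal: share of each gender.
gender_synth = marginal([p["gender"] for p in persons])

# Household-level marginal: count members per household, then tabulate sizes.
members_per_household = Counter(p["household_id"] for p in persons)
size_synth = marginal(list(members_per_household.values()))

# Made-up target marginals, for illustration only.
gender_target = {"F": 0.5, "M": 0.5}
size_target = {1: 0.25, 2: 0.5, 3: 0.25}

gender_tvd = total_variation(gender_synth, gender_target)
size_tvd = total_variation(size_synth, size_target)
```

A distance of 0 means the synthetic and target marginals match exactly; values closer to 1 indicate larger mismatch.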
Limitations
Row count is not reported, which makes it hard to judge suitability before download.
Column-level documentation is absent; field semantics must be inferred after download.
At 5.5 KB, the file is probably too small to hold a full synthetic population; it more likely contains summary variables.
Provenance
Source
Michael Jones via figshare.
Collection Method
Generated by an LLM-based framework using iterative prompting guided by discrepancies between synthetic and target demographic distributions.
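The discrepancy-guided loop described above can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: `stub_llm` is a hypothetical stand-in for a real LLM call, and the feedback wording, the tolerance, the stopping rule, and the household-size-only population are assumptions made for the example.

```python
import re
from collections import Counter

def marginal(values):
    """Return the share of each category in a list of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

def gaps(synthetic, target):
    """Signed discrepancy (synthetic share minus target share) per category."""
    keys = set(synthetic) | set(target)
    return {k: synthetic.get(k, 0.0) - target.get(k, 0.0) for k in keys}

def build_feedback(gap):
    """Describe the largest surplus and largest deficit in plain language."""
    over = max(sorted(gap), key=lambda k: gap[k])
    under = min(sorted(gap), key=lambda k: gap[k])
    return (
        f"Households of size {over} are over-represented by {gap[over]:.3f}; "
        f"households of size {under} are under-represented by {-gap[under]:.3f}. "
        "Revise the population accordingly."
    )

def stub_llm(prompt, sizes):
    """Hypothetical stand-in for an LLM call: converts one household from the
    over-represented size to the under-represented size named in the prompt."""
    over, under = (int(s) for s in re.findall(r"size (\d+)", prompt))
    revised = list(sizes)
    revised[revised.index(over)] = under
    return revised

def synthesize(target, initial_sizes, generate=stub_llm, tol=0.01, max_rounds=50):
    """Re-prompt until every marginal gap is within tol or rounds run out."""
    sizes = list(initial_sizes)
    for round_no in range(max_rounds):
        gap = gaps(marginal(sizes), target)
        if max(abs(g) for g in gap.values()) <= tol:
            return sizes, round_no
        sizes = generate(build_feedback(gap), sizes)
    return sizes, max_rounds

# Made-up target household-size marginal and starting population.
target = {1: 0.25, 2: 0.5, 3: 0.25}
final_sizes, rounds = synthesize(target, [1, 1, 1, 1, 3, 3, 3, 3])
```

Passing the generator as a parameter keeps the loop independent of any particular model, in the spirit of the LLM-agnostic design noted under Strengths.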
Freshness
Last updated 2026-06-02 17:33:59; freshness should be verified.
Geography
Global evaluation (109 countries), with specific case studies for Newcastle upon Tyne, UK, and Dar es Salaam, Tanzania.