Description
A synthetic Chinese persona-memory dialogue dataset for the Xinhe project, which researches memory emergence in small Transformers. The dataset was created by flufy3d and last updated on Hugging Face in May 2026. Each sample is a multi-turn Chinese dialogue in which the user states, corrects, and queries facts about a persona profile; the assistant responds, and unrelated daily topics are interleaved as distractors.
Use Cases
Training models to recall persona facts across turns, including after the user corrects an earlier statement.
Evaluating model robustness to conversational distraction, using the unrelated daily topics included in each dialogue.
Researching memory emergence in unified state architectures, the dataset's stated purpose within the Xinhe project.
Developing token-level weighted loss functions based on the parser-generated auxiliary fields (value, value_span, value_tier, weight_per_span).
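The token-level weighted loss mentioned above can be sketched as follows. This is a minimal illustration, not the Xinhe project's implementation. Because the dataset's field semantics are undocumented, the sketch assumes that value_span holds half-open (start, end) character offsets and that weight_per_span holds one float per span; both assumptions must be checked against the downloaded data. The function names token_weights and weighted_nll, and the token_offsets input (per-token character offsets, such as a tokenizer's offset mapping), are hypothetical.

```python
import math


def token_weights(token_offsets, spans, span_weights, default=1.0):
    """Give each token the weight of the first span it overlaps, else `default`.

    ASSUMPTION: spans are half-open (start, end) character offsets and
    span_weights is a parallel list of floats. The dataset does not
    document this layout.
    """
    weights = []
    for tok_start, tok_end in token_offsets:
        weight = default
        for (span_start, span_end), span_weight in zip(spans, span_weights):
            # Two half-open intervals overlap when each starts before the other ends.
            if tok_start < span_end and tok_end > span_start:
                weight = span_weight
                break
        weights.append(weight)
    return weights


def weighted_nll(token_logprobs, weights):
    """Weighted mean negative log-likelihood over the target tokens."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return -sum(w * lp for w, lp in zip(weights, token_logprobs)) / total


# Hypothetical example: four tokens, with the third covering a persona value.
offsets = [(0, 1), (1, 2), (2, 4), (4, 5)]
weights = token_weights(offsets, spans=[(2, 4)], span_weights=[3.0])
# weights == [1.0, 1.0, 3.0, 1.0]
loss = weighted_nll([math.log(0.5)] * 4, weights)
```

Normalizing by the sum of the weights keeps the loss scale comparable between samples with many and few weighted spans; whether that normalization suits a given training setup is a design choice the dataset does not prescribe.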
Strengths
Includes parser-generated auxiliary fields (value, value_span, value_tier, weight_per_span) for constructing token-level weighted loss.
Designed for a specific research goal: studying memory emergence in small Transformers within the Xinhe project.
Data generation process is described, involving LLMs (DeepSeek/OpenRouter) and a post-processing parser.
Limitations
Row count and file formats are unknown, which limits suitability assessment.
Column-level documentation is absent; field semantics must be inferred after download.
Data is entirely synthetic, generated by LLMs, which may introduce artifacts not present in human conversations.
Provenance
Source
Hugging Face, author flufy3d.
Collection Method
Synthesized by LLMs (DeepSeek/OpenRouter) following a specified skeleton, with post-processing by a parser.
Time Range
Not specified.
Freshness
Last updated 2026-05-05 13:41:48; freshness should be verified.
Geography
Not specified.
License
Unknown; terms of use must be verified before use.