Name: Xinhe Dataset: Synthetic Chinese Persona Memory Dialogue for Transformer Research
Creator: flufy3d
Published: 2026-04-26T23:17:34
Keywords: Transformer Research, LLM Synthetic, Persona Memory, Text, Chinese Dialogue

Description

A synthetic Chinese persona memory dialogue dataset built for the Xinhe project, which studies how memory emerges in small Transformers. The dataset was created by flufy3d and last updated on Hugging Face in May 2026. Each sample is a multi-turn Chinese dialogue in which the user states, corrects, and queries facts about a persona profile. The assistant responds to each turn, and unrelated daily topics are interleaved as distractions.
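To make the memory task concrete, the sketch below models one dialogue as a list of turns and derives the answer a model should give to each query: the most recent value the user stated or corrected, with distractor turns ignored. The turn kinds, the `Turn` class, the `slot` field, and `expected_answers` are all hypothetical illustrations; the dataset's actual schema is undocumented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical turn kinds; the real dataset's labels (if any) are undocumented.
STATEMENT, CORRECTION, QUERY, DISTRACTOR = "statement", "correction", "query", "distractor"


@dataclass
class Turn:
    kind: str             # one of the hypothetical kinds above
    slot: Optional[str]   # persona attribute the turn refers to, e.g. "city"
    value: Optional[str]  # value stated or corrected by the user, if any


def expected_answers(turns: List[Turn]) -> List[Optional[str]]:
    """For each query turn, return the latest value stated or corrected for the
    queried slot (None if it was never mentioned). Distractor turns are skipped."""
    memory: Dict[str, Optional[str]] = {}
    answers: List[Optional[str]] = []
    for turn in turns:
        if turn.kind in (STATEMENT, CORRECTION) and turn.slot is not None:
            memory[turn.slot] = turn.value  # a correction overwrites the earlier value
        elif turn.kind == QUERY and turn.slot is not None:
            answers.append(memory.get(turn.slot))
    return answers


# Example: the user says they live in Hangzhou, chats about something unrelated,
# then corrects the city to Suzhou before asking about it.
dialogue = [
    Turn(STATEMENT, "city", "杭州"),
    Turn(DISTRACTOR, None, None),
    Turn(CORRECTION, "city", "苏州"),
    Turn(QUERY, "city", None),
    Turn(QUERY, "pet", None),
]
answers = expected_answers(dialogue)
```

Here `answers` is `["苏州", None]`: the correction wins, and a never-mentioned slot yields no answer.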

Use Cases

Training models to recall persona details, using the dataset's profile queries and corrections.
Evaluating how robust a model's recall is to conversational distractions, using the interleaved unrelated daily topics.
Researching memory emergence in unified state architectures, in line with the dataset's stated purpose for the Xinhe project.
Developing token-level weighted loss functions from the parser-generated auxiliary fields (value, value_span, value_tier, weight_per_span).
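A token-level weighted loss can be sketched as follows, under loud assumptions: that `value_span` is a list of half-open `(start, end)` token index pairs into the assistant response and that `weight_per_span` holds one weight per span. Neither semantics is documented, and the helper names `token_weights` and `weighted_nll` are hypothetical. Tokens outside any span keep a base weight, and the loss is the weighted mean negative log-likelihood.

```python
import math
from typing import List, Sequence, Tuple


def token_weights(num_tokens: int,
                  value_span: Sequence[Tuple[int, int]],
                  weight_per_span: Sequence[float],
                  base_weight: float = 1.0) -> List[float]:
    """Per-token weights. ASSUMPTION: value_span holds half-open token index
    pairs and weight_per_span holds one weight per span."""
    if len(value_span) != len(weight_per_span):
        raise ValueError("expected one weight per span")
    weights = [base_weight] * num_tokens
    for (start, end), w in zip(value_span, weight_per_span):
        # Clip the span to the sequence so malformed offsets cannot overflow.
        for i in range(max(start, 0), min(end, num_tokens)):
            weights[i] = w
    return weights


def weighted_nll(token_log_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean negative log-likelihood of the target tokens."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all weights are zero")
    return -sum(lp * w for lp, w in zip(token_log_probs, weights)) / total_weight


# Example: a 5-token response where tokens 2 and 3 carry the remembered value.
log_probs = [math.log(p) for p in (0.9, 0.8, 0.5, 0.4, 0.95)]
weights = token_weights(5, [(2, 4)], [3.0])
loss = weighted_nll(log_probs, weights)
```

Upweighting the value tokens makes errors on the remembered fact cost more than errors on filler text; whether `value_tier` should further scale the weight is left open here.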

Strengths

Includes parser-generated auxiliary fields (value, value_span, value_tier, weight_per_span) for constructing token-level weighted loss.
Designed for a specific research goal: studying memory emergence in small Transformers within the Xinhe project.
The data generation process is documented: LLMs (DeepSeek/OpenRouter) synthesize the dialogues, and a parser post-processes the output.

Limitations

Row count, file formats, and column details are unknown, which makes it hard to judge suitability before downloading.
Column-level documentation is absent; field semantics must be inferred after download.
Data is entirely synthetic, generated by LLMs, which may introduce artifacts not present in human conversations.

Provenance

Source: Hugging Face, author flufy3d.
Collection Method: Synthesized by LLMs (DeepSeek/OpenRouter) following a specified skeleton, with post-processing by a parser.
Time Range: Not specified
Freshness: Last updated 2026-05-05 13:41:48; freshness should be verified.
Geography: Not specified
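The post-processing parser is not published, but one plausible step is locating the remembered value inside an assistant response and converting that character range into token indices. The sketch below assumes exact substring matching and tokenizer-supplied `(char_start, char_end)` offsets; `find_value_span` and `char_span_to_token_span` are hypothetical helpers, not the project's code.

```python
from typing import List, Optional, Sequence, Tuple

Span = Tuple[int, int]


def find_value_span(response: str, value: str) -> Optional[Span]:
    """Half-open character span of the first exact occurrence of value, or None.
    ASSUMPTION: the parser uses exact matching; the real parser may differ."""
    if not value:
        return None
    start = response.find(value)
    if start < 0:
        return None
    return (start, start + len(value))


def char_span_to_token_span(span: Span, token_offsets: Sequence[Span]) -> Optional[Span]:
    """Map a character span to a half-open token index range. Every token whose
    (char_start, char_end) offsets overlap the span is included."""
    start, end = span
    hits: List[int] = [i for i, (s, e) in enumerate(token_offsets) if s < end and e > start]
    if not hits:
        return None
    return (hits[0], hits[-1] + 1)


# Example with a made-up tokenization of "你住在苏州。" into ["你", "住在", "苏州", "。"].
response = "你住在苏州。"
offsets = [(0, 1), (1, 3), (3, 5), (5, 6)]
char_span = find_value_span(response, "苏州")
tok_span = char_span_to_token_span(char_span, offsets)
```

Here `char_span` is `(3, 5)` and `tok_span` is `(2, 3)`, which is the form a token-level weighted loss would consume.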

License is unknown; terms of use must be verified before application.
