Sign in to view source links and access this dataset
Description
64,164 Chinese-language question-and-answer pairs formatted for supervised fine-tuning (SFT), created by user callofthenight1 and last updated on Hugging Face in April 2026. The dataset is split into 62,920 training and 1,244 validation samples, covering multiple subjects from the Chinese national college entrance exam (Gaokao).
Use Cases
Supervised fine-tuning (SFT) of Chinese LLMs based on structured instruction-output pairs.
Benchmarking model performance on Chinese academic exam questions across subjects like chemistry.
Training models for educational question-answering and tutoring systems.
Analyzing the reasoning patterns in model responses to complex, domain-specific prompts.
Strengths
64,164 total samples provide a substantial corpus for training.
Explicit 62,920/1,244 train-validation split supports model evaluation.