Gaokao SFT Chinese Large: 64,164 Chinese Exam Q&A Pairs for Instruction Tuning

Name: Gaokao SFT Chinese Large: 64,164 Chinese Exam Q&A Pairs for Instruction Tuning
Creator: callofthenight1
Published: 2026-04-16T22:20:48
Keywords: Chinese, Education, Text, Exams, Gaokao

by callofthenight1Updated 2mo ago

Available on 1 platform

Sign in to view source links and access this dataset

Description

64,164 Chinese-language question-and-answer pairs formatted for supervised fine-tuning (SFT), created by user callofthenight1 and last updated on Hugging Face in April 2026. The dataset is split into 62,920 training and 1,244 validation samples, covering multiple subjects from the Chinese national college entrance exam (Gaokao).

Use Cases

Supervised fine-tuning (SFT) of Chinese LLMs based on structured instruction-output pairs.
Benchmarking model performance on Chinese academic exam questions across subjects like chemistry.
Training models for educational question-answering and tutoring systems.
Analyzing the reasoning patterns in model responses to complex, domain-specific prompts.

Strengths

64,164 total samples provide a substantial corpus for training.
Explicit 62,920/1,244 train-validation split supports model evaluation.
Structured fields (instruction, output, subject) suggest clear formatting for SFT workflows.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality requires manual inspection after download.
Data may reflect subject or source bias inherent to the Gaokao exam structure.

Provenance

Source: Hugging Face user callofthenight1
Collection Method: Likely compiled from Chinese Gaokao exam materials.
Time Range: null
Freshness: Last updated 2026-04-16 22:35:30; freshness should be verified.
Geography: China

License is unknown; terms of use must be verified before application.

Text Chinese Education Exams Gaokao

Related Datasets

Quality Score

D40

Description

42

Source

42

Reputation

41

Access

26

Community

40 downloads

1 likes

0 views

Dataset Info

Author: callofthenight1
Created: Apr 16, 2026
Updated: Apr 16, 2026
Last synced: May 4, 2026

Access

26

Community

40 downloads

1 likes

0 views

Dataset Info

Author: callofthenight1
Created: Apr 16, 2026
Updated: Apr 16, 2026
Last synced: May 4, 2026

Gaokao SFT Chinese Large: 64,164 Chinese Exam Q&A Pairs for Instruction Tuning

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info