GSM8K_zh_tw is a dataset for mathematical reasoning in Traditional Chinese, derived from the GSM8K_zh dataset. It contains 7,473 training samples and 1,319 testing samples, converted to Traditional Chinese and regionally adapted for Traditional Chinese users. The dataset was created by DoggiAI and last updated on January 30, 2025.
Use Cases
- Fine-tuning language models to solve math word problems, using the question-answer pairs as training data.
- Benchmarking mathematical reasoning in Traditional Chinese on the held-out test set.
- Studying how language conversion and regional terminology changes affect model performance.
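For benchmarking, a common way to score GSM8K-style outputs is exact match on the final numeric answer. The sketch below assumes the answers keep the original GSM8K convention of ending with `#### <number>`; this is not confirmed for GSM8K_zh_tw, so check a few records first. The helpers `extract_final_answer` and `exact_match_accuracy` are hypothetical names, not part of any published tooling for this dataset.

```python
import re
from typing import List, Optional

# Assumption: answers end with "#### <number>" as in the original GSM8K.
# Not confirmed for GSM8K_zh_tw; inspect the data before relying on this.
_FINAL = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_final_answer(answer: str) -> Optional[str]:
    """Return the number after '####' with thousands separators removed, or None."""
    match = _FINAL.search(answer)
    if match is None:
        return None
    return match.group(1).replace(",", "")


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions whose final number string equals the reference's."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        return 0.0
    hits = 0
    for pred, ref in zip(predictions, references):
        gold = extract_final_answer(ref)
        # Plain string comparison: "72" and "72.0" count as different answers.
        if gold is not None and extract_final_answer(pred) == gold:
            hits += 1
    return hits / len(references)
```

Comparing strings keeps the sketch simple; a stricter evaluator might parse both sides as numbers so that formatting differences such as trailing zeros do not count as errors.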
Strengths
- Ships with a predefined split of 7,473 training and 1,319 test samples, keeping model development separate from evaluation.
- Includes modifications for regional adaptation, such as replacing some China-specific terms with those more suitable for Traditional Chinese users.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Description metadata is limited; actual data quality requires manual inspection after download.
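Because the fields are undocumented, a quick first step after download is to tally which keys appear and what value types they hold. The sketch below assumes the files are JSON Lines (one JSON object per line), which is an assumption about the distribution format; `infer_schema` is a hypothetical helper.

```python
import json
from collections import Counter, defaultdict
from typing import Dict, Iterable


def infer_schema(lines: Iterable[str]) -> Dict[str, Counter]:
    """Map each field name to a Counter of the JSON value types seen for it.

    Assumes JSON Lines input: one JSON object per line, blank lines ignored.
    """
    schema: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for key, value in record.items():
            schema[key][type(value).__name__] += 1
    return dict(schema)
```

For example, `with open("train.jsonl", encoding="utf-8") as f: print(infer_schema(f))`, where `train.jsonl` is a placeholder file name. Fields with mixed types, or counts lower than the number of records, point to inconsistencies worth inspecting manually.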
Provenance
- Source
- DoggiAI
- Collection Method
- Derived from the GSM8K_zh dataset by converting its question-answer pairs from Simplified to Traditional Chinese with OpenCC.
- Freshness
- Last updated 2025-01-30 02:41:05 (time zone not stated); check the source for newer revisions.