Description
This dataset contains 7,473 training and 1,319 test samples of Chinese mathematical word problems, translated from the English GSM8K dataset. It was created by the author 'meta-math' using GPT-3.5-Turbo with few-shot prompting and was last updated on December 4, 2023. It is intended for supervised fine-tuning and for evaluating models on mathematical reasoning in Chinese.
Use Cases
Fine-tuning language models for mathematical problem solving on the Chinese question-answer pairs.
Benchmarking model performance on grade-school-level math in Chinese using the provided test split.
Studying the effectiveness of machine translation for creating educational datasets.
Developing educational tools for Chinese-language mathematics instruction.
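For the benchmarking use case, a minimal scoring sketch might compare the final number in a model's output with the final number in the reference answer. It assumes the reference answers keep the original GSM8K convention of ending with a "####" marker followed by the final number; whether the Chinese translation preserves that marker is not documented and should be checked after download. The function names are hypothetical.

```python
import re

# Matches integers and decimals, with optional sign and thousands separators.
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text):
    """Return the last number in `text` as a string without commas, or None."""
    # Assumption: a '####' marker, if present, precedes the final answer
    # (the original GSM8K convention). Otherwise fall back to the whole text.
    if "####" in text:
        text = text.rsplit("####", 1)[1]
    matches = NUMBER_RE.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def exact_match_accuracy(predictions, references):
    """Fraction of predictions whose final number equals the reference's."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = 0
    for prediction, reference in zip(predictions, references):
        predicted = extract_final_number(prediction)
        expected = extract_final_number(reference)
        if predicted is not None and expected is not None:
            # Compare numerically so that "18" and "18.0" count as equal.
            if float(predicted) == float(expected):
                correct += 1
    return correct / len(references)


if __name__ == "__main__":
    # Made-up strings for illustration only; not taken from the dataset.
    preds = ["答案是 18", "共 5 个"]
    refs = ["16 + 2 = 18 #### 18", "#### 6"]
    print(exact_match_accuracy(preds, refs))
```

Comparing only the final number is a deliberately lenient choice: it ignores the reasoning steps, so it measures answer correctness rather than the quality of the solution.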
Strengths
Provides a clear split of 7,473 samples for training and 1,319 for testing, facilitating model development and evaluation.
The dataset is a direct translation of the established GSM8K benchmark, offering a parallel resource for Chinese-language reasoning.
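Since the split sizes are stated explicitly, a downloaded copy can be sanity-checked against them. The sketch below assumes the splits are named "train" and "test", which this listing does not confirm; the helper name is hypothetical.

```python
# Sizes stated in this listing. The split names are an assumption.
EXPECTED_SPLIT_SIZES = {"train": 7473, "test": 1319}


def check_split_sizes(actual_sizes, expected_sizes=EXPECTED_SPLIT_SIZES):
    """Return a list of mismatch messages; an empty list means all sizes match."""
    problems = []
    for split, expected in expected_sizes.items():
        actual = actual_sizes.get(split)
        if actual is None:
            problems.append(f"missing split: {split}")
        elif actual != expected:
            problems.append(f"{split}: expected {expected}, found {actual}")
    return problems


if __name__ == "__main__":
    # Replace the dictionary with the row counts of the downloaded copy.
    print(check_split_sizes({"train": 7473, "test": 1319}))
```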
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Last updated 2023-12-04 04:02:01; check whether a newer revision exists before relying on it.
Because the translation was produced automatically by GPT-3.5-Turbo, the data may contain translation errors or bias.
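Because the columns are undocumented, a first step after download could be to list the field names and the value types that actually occur. The following is a minimal sketch that assumes the records have already been loaded as a list of dictionaries; the field names in the example are hypothetical placeholders, not the dataset's real columns.

```python
from collections import Counter, defaultdict


def summarize_fields(records):
    """Map each field name to a count of the Python types seen for its values."""
    summary = defaultdict(Counter)
    for record in records:
        for key, value in record.items():
            summary[key][type(value).__name__] += 1
    return {key: dict(counts) for key, counts in summary.items()}


if __name__ == "__main__":
    # Hypothetical records for illustration; real field names must be
    # read from the downloaded files.
    sample = [
        {"question": "小明有 3 个苹果……", "answer": "#### 5"},
        {"question": "一辆车每小时行驶……", "answer": None},
    ]
    for field, types in summarize_fields(sample).items():
        print(field, types)
```

Fields that show mixed types or unexpected None values are good candidates for manual inspection before fine-tuning.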
Provenance
Source
Translated from the GSM8K dataset (https://github.com/openai/grade-school-math/tree/master).
Collection Method
Translated by GPT-3.5-Turbo with few-shot prompting.
Freshness
Last updated 2023-12-04 04:02:01.
License
Unknown; verify before use.