Description
LLM-jp, a collaborative project in Japan, provides this dataset, oasst1-21k-ja. It is a Japanese translation of an English subset of the OASST1 dataset, produced with the DeepL translation service. The "21k" in the name suggests roughly 21,000 entries, but the source does not state an exact count. The dataset was last updated on February 6, 2024.
Use Cases
Fine-tuning Japanese language models for instruction-following based on the translated instruction-response pairs.
Evaluating the performance of LLMs on Japanese conversational tasks.
Studying the effects of machine translation on instruction-tuning data quality.
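For the fine-tuning use case, each instruction-response pair usually has to be flattened into a single training string. The following is a minimal sketch only: the field names `instruction` and `response` and the prompt template are hypothetical, since this card does not document the dataset's columns, and the template should be replaced with whatever format the base model expects.

```python
from typing import Dict


def to_training_text(
    example: Dict[str, str],
    instruction_key: str = "instruction",  # hypothetical column name
    response_key: str = "response",        # hypothetical column name
) -> str:
    """Join one instruction-response pair into a single training string."""
    instruction = example[instruction_key].strip()
    response = example[response_key].strip()
    # Hypothetical template ("指示" = instruction, "応答" = response).
    return f"### 指示:\n{instruction}\n\n### 応答:\n{response}"


# Made-up sample record, not taken from the dataset.
sample = {"instruction": "日本の首都はどこですか？", "response": "東京です。"}
print(to_training_text(sample))
```

Passing the key names as parameters keeps the helper usable once the real column names are known.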
Strengths
Dataset is specifically designed for Japanese language model instruction tuning.
Translation was performed with a single service (DeepL), which should keep translation style consistent, though no human review is documented.
The source is a known instruction dataset (OASST1), providing a foundation for the content.
Limitations
Exact row count is not documented; the "21k" in the dataset name suggests roughly 21,000 entries, but this should be confirmed after download.
Column-level documentation is absent; field semantics must be inferred after download.
Data may reflect translation bias or artifacts introduced by the machine translation process.
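The limitations above can be checked, at least partly, once the data is downloaded. The sketch below assumes the records have already been loaded as a list of dictionaries; the sample rows and the `instruction` and `response` field names are hypothetical. It counts rows, infers field types, and flags text that contains no Japanese characters, which may indicate a record the translation step left untouched.

```python
import re
from typing import Any, Dict, List

# Hiragana and katakana (U+3040-U+30FF) plus CJK unified ideographs (U+4E00-U+9FFF).
JAPANESE_CHARS = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def summarize_fields(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Collect the observed value types and non-null count for each field."""
    summary: Dict[str, Dict[str, Any]] = {}
    for record in records:
        for key, value in record.items():
            entry = summary.setdefault(key, {"types": set(), "non_null": 0})
            if value is not None:
                entry["types"].add(type(value).__name__)
                entry["non_null"] += 1
    return summary


def lacks_japanese(text: str) -> bool:
    """Return True when the text has no hiragana, katakana, or kanji."""
    return JAPANESE_CHARS.search(text) is None


# Made-up rows standing in for the downloaded data.
rows = [
    {"instruction": "こんにちは", "response": "はい、こんにちは。"},
    {"instruction": "Hello", "response": None},
]

print("rows:", len(rows))
print("fields:", summarize_fields(rows))
print("possibly untranslated:",
      [i for i, r in enumerate(rows) if lacks_japanese(r["instruction"])])
```

A flagged record is not necessarily an error: responses that consist only of code, numbers, or proper nouns can legitimately contain no Japanese characters, so flagged rows are candidates for manual review rather than automatic removal.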
Provenance
Source
LLM-jp collaborative project.
Collection Method
Machine translation of an English subset from the OASST1 dataset using DeepL.
Time Range
Not specified.
Freshness
Last updated 2024-02-06 04:06:04; freshness should be verified.
Geography
Japan (by project origin and language focus).
License is unknown; users must verify terms before use.