Name: Oasst1-21k-Ja: Japanese Instruction Tuning Dataset for LLMs
Creator: llm-jp
Published: 2024-01-28T23:27:03
Keywords: Japanese NLP, Text, LLM Training, Translated Text

Description

LLM-jp, a collaborative project in Japan, provides this dataset. It is a Japanese translation of an English subset of the OASST1 dataset, created using the DeepL translation service. The name suggests roughly 21,000 instructions, although no explicit row count is given. The dataset was last updated on February 6, 2024.

Use Cases

Fine-tuning Japanese language models for instruction-following based on the translated instruction-response pairs.
Evaluating the performance of LLMs on Japanese conversational tasks.
Studying the effects of machine translation on instruction-tuning data quality.
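As a starting point for the translation-quality use case, one simple check is to flag entries that contain little or no Japanese script, which can indicate text the translation step left untouched. The sketch below is a heuristic only: the function names and the 0.3 threshold are assumptions chosen for illustration, not part of the dataset or its tooling.

```python
def japanese_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are hiragana, katakana, or CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0

    def is_japanese(c: str) -> bool:
        code = ord(c)
        return (
            0x3040 <= code <= 0x309F      # hiragana
            or 0x30A0 <= code <= 0x30FF   # katakana
            or 0x4E00 <= code <= 0x9FFF   # CJK unified ideographs
        )

    return sum(is_japanese(c) for c in chars) / len(chars)


def flag_possibly_untranslated(texts, threshold=0.3):
    """Return indices of texts whose Japanese-character ratio is below the threshold.

    The threshold is an arbitrary illustrative value, not a recommended setting.
    """
    return [i for i, t in enumerate(texts) if japanese_ratio(t) < threshold]
```

Entries that consist mostly of source code, URLs, or numbers will also score low, so flagged items are candidates for manual review rather than confirmed translation failures.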

Strengths

Dataset is specifically designed for Japanese language model instruction tuning.
Translation was performed using a single service (DeepL), which suggests consistent translation behavior across the dataset, though no human review is documented.
The source is a known instruction dataset (OASST1), providing a foundation for the content.

Limitations

Exact row count is not stated (the name suggests roughly 21,000 examples), which may limit suitability assessment.
Column-level documentation is absent; field semantics must be inferred after download.
Data may reflect translation bias or artifacts introduced by the machine translation process.
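Because column-level documentation is absent, a first step after download is to summarize which fields appear and what types they hold. The sketch below assumes the data has been exported to a JSON Lines file (one JSON object per line); that file format and the helper names `read_jsonl` and `summarize_fields` are assumptions for illustration, since the listing does not describe the distribution format.

```python
import json
from collections import Counter, defaultdict


def read_jsonl(path):
    """Load a JSON Lines file into a list of dicts (the file format is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize_fields(records):
    """Count how often each field appears and which Python types its values take."""
    summary = defaultdict(lambda: {"count": 0, "types": Counter()})
    for rec in records:
        for key, value in rec.items():
            summary[key]["count"] += 1
            summary[key]["types"][type(value).__name__] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {
        key: {"count": info["count"], "types": dict(info["types"])}
        for key, info in summary.items()
    }
```

Calling `len(read_jsonl(path))` on the downloaded file would also settle the row count, and fields whose count is lower than the number of records are optional and worth inspecting by hand.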

Provenance

Source: LLM-jp collaborative project.
Collection Method: Machine translation of an English subset from the OASST1 dataset using DeepL.
Time Range: Not specified.
Freshness: Last updated 2024-02-06 04:06:04; freshness should be verified.
Geography: Japan (by project origin and language focus).

License is unknown; users must verify terms before use.
