DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

ProductSearch Datasets Browse Topics Rankings Community API / MCP

ResourcesDocumentation Blog Changelog Status

LegalPrivacy Policy Terms of Service Cookie Policy

Oasst2 33K Ja: Japanese Instruction-Tuning Dataset for LLMs | DataSalon

Home Multimodal & LLMOasst2 33K Ja: Japanese Instruction-Tuning Dataset for LLMs

Multimodal & LLM

Oasst2 33K Ja: Japanese Instruction-Tuning Dataset for LLMs

Name: Oasst2 33K Ja: Japanese Instruction-Tuning Dataset for LLMs
Creator: llm-jp
Published: 2024-04-28T16:24:00
Keywords: Japanese Nlp, Text, Llm Training, Translated Text

by llm-jp·Updated 2y ago

Available on 1 platform

Description

LLM-jp provides a Japanese instruction-tuning dataset containing 33,000 entries. The dataset is a Japanese translation of a subset from the English OASST2 dataset, processed using DeepL. It was created by the LLM-jp collaborative project and last updated on April 28, 2024.

Use Cases

Fine-tuning Japanese language models for instruction-following based on the translated instruction-response pairs.
Benchmarking model performance on Japanese conversational tasks using the structured prompts.
Studying the effects of machine translation on instruction-tuning data quality for non-English languages.

Strengths

Contains 33,000 Japanese instruction-response pairs for model training.
Data provenance is documented, being a translation of a known English subset (OASST2).
Created by a named collaborative project (LLM-jp), suggesting organized development.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is known but specific data features and structure are not detailed in the provided metadata.
Data may reflect translation bias inherent to the use of DeepL for processing.

Provenance

Source: Translated from an English subset of the OASST2 dataset.
Collection Method: Machine translation using DeepL, processed from kunishou/oasst2-135k-ja.
Freshness: Last updated 2024-04-28 16:39:03.

License is unknown; terms of use must be verified before application.

Text Japanese Nlp Llm Training Translated Text

Related Datasets

Quality Score

D35

Description

Source

Reputation

Quality Score

D35

Description

Source

Reputation

Access

Community

336 downloads

13 likes

0 views

Dataset Info

Author: llm-jp
Created: Apr 28, 2024
Updated: Apr 28, 2024
Last synced: May 27, 2026

Access

Community

336 downloads

13 likes

0 views

Dataset Info

Author: llm-jp
Created: Apr 28, 2024
Updated: Apr 28, 2024
Last synced: May 27, 2026

Oasst2 33K Ja: Japanese Instruction-Tuning Dataset for LLMs

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info