1 dataset file named mulberry_sft.json containing conversational data for fine-tuning the Mulberry model series. The data follows the ShareGPT format and is structured with a messages column to support multi-turn dialogue training via the LLaMA-Factory framework.
Use Cases
- Fine-tune large language models using the messages column to improve conversational instruction following.
- Train conversational agents by mapping the sharegpt formatted data to instruction-tuning pipelines.
- Benchmark model response generation against the provided dialogue sequences in the messages field.
- Standardize custom datasets for LLaMA-Factory by using the provided dataset_info.json configuration template.
Strengths
- Formatted in the sharegpt style for multi-turn conversational modeling.
- Contains a messages column representing dialogue history between users and assistants.
- Includes configuration metadata for direct use in LLaMA-Factory's dataset_info.json.
- Provides specific training instructions and configs for the Mulberry model series.