Name: Llava Instruction Tuning Data for Vision Language Models
Creator: HuggingFaceH4
Published: 2024-04-10T14:15:23
Keywords: Librarypolars, Librarydask, Modalitytext, Size Categories100 Kn1 M, Librarymlcroissant, Modalityimage, Librarydatasets, Parquet, Regionus

Description

Presenting a reformatted version of theblackcat102/llava-instruct-mix, prepared for Vision Supervised Fine-Tuning (VSFT) with the TRL SFT Trainer. It is designed for instruction tuning of multimodal vision-language models. The dataset's author is HuggingFaceH4, and it was last updated in April 2024.

Use Cases

Fine-tune a vision-language model for instruction following using the reformatted instruction tuning data.
Apply Vision Supervised Fine-Tuning (VSFT) with the TRL SFT Trainer on the multimodal instruction data.
Benchmark instruction-following performance in multimodal models using the provided English instruction tuning format.

Strengths

Dataset is specifically formatted for use with the TRL library's SFT Trainer, a standard tool for fine-tuning.
Last update was in April 2024, indicating recent maintenance.

Limitations

The dataset size, row count, and specific column structure are unknown, making it difficult to assess scale and structure.
The original data source and its collection methodology are not detailed, limiting understanding of data provenance.

Provenance

Source: Reformatted from theblackcat102/llava-instruct-mix dataset.
Collection Method: Reformatted for Vision Supervised Fine-Tuning (VSFT) with TRL's SFT Trainer.
Freshness: Last updated on 2024-04-11.

Users should be familiar with the TRL library and the specific VSFT script (vsft_llava.py) referenced in the description to utilize this dataset effectively. The license and exact file formats are unknown.

Parquet Librarypolars Librarydask Modalitytext Size Categories100 Kn1 M Librarymlcroissant Modalityimage Librarydatasets Regionus

Llava Instruction Tuning Data for Vision Language Models

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info