Name: Llava Instruct 9K: A Multimodal Instruction Dataset for Vision and Voice Tasks
Creator: dreyn74
Published: 2026-06-02T13:09:11
Keywords: Vision Language, Multimodal Llm, Computer Vision, Audio, Instruct Dataset, Synthetic Data, Synthetic, Multimodal

Description

A multimodal dataset derived from the LLaVA-Instruct-150K source, containing synthetic annotations for tasks involving text, images, and speech. It is licensed under CC-BY-4.0 and was uploaded by author dreyn74. The dataset's size is indicated to be between 10,000 and 100,000 samples.

Use Cases

Fine-tuning models for visual question answering based on image-text pairs.
Training text-to-speech or speech-to-speech systems using synthetic speech data.
Developing multimodal assistants capable of image-to-text and image-to-speech generation.
Benchmarking automatic speech recognition models on synthetic instruction data.

Strengths

Covers multiple multimodal task categories, including text-to-text, image-to-text, and speech-related tasks.
Built upon the established LLaVA-Instruct-150K source dataset.
Uses a permissive CC-BY-4.0 license for open use and redistribution.

Limitations

Row count is unknown, which may limit suitability assessment.
Column-level documentation is absent; field semantics must be inferred after download.
Data is synthetically generated, which may not reflect real-world distributions or complexities.

Provenance

Source: liuhaotian/LLaVA-Instruct-150K on Hugging Face
Collection Method: Synthetic annotations derived from a source dataset.
Freshness: Last updated 2026-06-03 01:22:12; freshness should be verified.

The full description and data structure are available only on the Hugging Face dataset page; users must inspect the page for complete details.

Audio Multimodal Vision Language Multimodal Llm Computer Vision Instruct Dataset Synthetic Data Synthetic

Llava Instruct 9K: A Multimodal Instruction Dataset for Vision and Voice Tasks

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info