DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

ProductSearch Datasets Browse Topics Rankings Community API / MCP

ResourcesDocumentation Blog Changelog Status

LegalPrivacy Policy Terms of Service Cookie Policy

PVIT-3M: Personalized Visual Instruction Tuning Dataset | DataSalon

Home Multimodal & LLMPVIT-3M: Personalized Visual Instruction Tuning Dataset

Multimodal & LLM

PVIT-3M: Personalized Visual Instruction Tuning Dataset

Name: PVIT-3M: Personalized Visual Instruction Tuning Dataset
Creator: Sterzhang
Published: 2024-10-07T09:28:17
Keywords: Image Text Pairs, Personalization, Multimodal Llm, Computer Vision, Large Scale, Visual Instruction Tuning, Multimodal

by Sterzhang·Updated 1y ago

Available on 1 platform

Description

PVIT-3M is a dataset of 3 million image-text pairs designed for tuning Multimodal Large Language Models (MLLMs) on personalized visual instruction tasks. It was created by Sterzhang and introduced in the paper "Personalized Visual Instruction Tuning". The dataset was last updated on November 2, 2024.

Use Cases

Fine-tuning MLLMs to generate responses based on personalized visual inputs.
Improving model adaptability to individual user needs and preferences using image-text pairs.
Benchmarking MLLM performance on tasks requiring tailored visual instruction understanding.

Strengths

Contains 3 million image-text pairs, providing a substantial scale for model training.
Specifically designed for the novel task of personalized visual instruction tuning.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.

Provenance

Source: Sterzhang via Hugging Face.
Collection Method: Introduced in the academic paper "Personalized Visual Instruction Tuning"; specific collection method is not detailed in the provided input.
Freshness: Last updated 2024-11-02 07:41:57; freshness should be verified.

License is unknown; users should verify permissions before use.

Multimodal Image Text Pairs Personalization Multimodal Llm Computer Vision Large Scale Visual Instruction Tuning

Related Datasets

Quality Score

D40

Description

Source

Reputation

Quality Score

D40

Description

Source

Reputation

Access

Community

3.0K downloads

19 likes

0 views

Dataset Info

Author: Sterzhang
Created: Oct 7, 2024
Updated: Nov 2, 2024
Last synced: Jul 3, 2026

Access

Community

3.0K downloads

19 likes

0 views

Dataset Info

Author: Sterzhang
Created: Oct 7, 2024
Updated: Nov 2, 2024
Last synced: Jul 3, 2026

PVIT-3M: Personalized Visual Instruction Tuning Dataset

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info