PLM-Image Auto: Synthetic Image Captions and Question-Answer Pairs

Name: PLM-Image Auto: Synthetic Image Captions and Question-Answer Pairs
Creator: facebook
Published: 2025-03-28T23:10:01
Keywords: Image Captions, Multimodal QA, Computer Vision, LLM Training, Synthetic Annotations, Multimodal

Description

Synthetic annotations for images and documents, created by Facebook for the PLM model. The dataset provides generated captions for images from SA1B, OpenImages, and Object365, and question-answer pairs for documents from ArXivQA, UCSF, and PDFAcc. It was last updated on April 21, 2025.

Use Cases

Training vision-language models on the synthetic image captions
Fine-tuning visual question answering models on the synthetic QA pairs
Benchmarking model performance on generated annotations for datasets like SA1B and OpenImages
Augmenting training data for document understanding tasks using the ArXivQA and PDFAcc annotations
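
As a rough illustration of the training and fine-tuning use cases above, the sketch below maps one annotation record to a prompt/response pair. The field names (`image`, `caption`, `question`, `answer`) and the fixed caption prompt are assumptions made for this example; the dataset's real columns are not documented on this page and should be checked after download.

```python
def to_training_example(record):
    """Map one annotation record to an image/prompt/response triple.

    Field names here are hypothetical; adjust them to the real schema.
    """
    if "question" in record and "answer" in record:
        # QA-style record: the question is the prompt.
        prompt = record["question"]
        response = record["answer"]
    elif "caption" in record:
        # Caption-style record: use a fixed, made-up captioning prompt.
        prompt = "Describe the image."
        response = record["caption"]
    else:
        raise ValueError("record has neither a QA pair nor a caption")
    return {"image": record["image"], "prompt": prompt, "response": response}


# Made-up records, for illustration only.
qa_example = to_training_example(
    {"image": "doc_001.png", "question": "What does the x-axis show?", "answer": "Time in seconds."}
)
caption_example = to_training_example(
    {"image": "img_001.jpg", "caption": "A dog running on a beach."}
)
print(qa_example)
print(caption_example)
```

Routing both record types through one function keeps a downstream training loop independent of whether a sample came from a captioned image or a document QA pair.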

Strengths

Covers multiple established datasets including SA1B, OpenImages, and Object365 for image annotations
Includes annotations for document datasets like ArXivQA and PDFAcc, suggesting a multimodal scope
Last updated on 2025-04-21, indicating recent maintenance

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Row count is not reported, which makes it harder to assess suitability before download
The synthetic nature of the data may introduce biases or artifacts not present in human-annotated sources
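
Because column documentation and row counts are missing, a first step after download is usually to inspect the records directly. The sketch below is one minimal way to do that, assuming the annotations can be read as JSON lines; the sample records and their field names are invented for illustration and do not reflect the dataset's actual schema.

```python
import json
from collections import Counter, defaultdict


def summarize_fields(records):
    """Count rows and tally the Python type of each field's values."""
    type_counts = defaultdict(Counter)
    n_rows = 0
    for rec in records:
        n_rows += 1
        for key, value in rec.items():
            type_counts[key][type(value).__name__] += 1
    return n_rows, {key: dict(counts) for key, counts in type_counts.items()}


# Hypothetical JSON-lines content standing in for a downloaded file.
sample_lines = [
    '{"image": "a.jpg", "caption": "A dog on a beach."}',
    '{"image": "b.jpg", "caption": null}',
]
records = [json.loads(line) for line in sample_lines]
n_rows, schema = summarize_fields(records)
print(n_rows)
print(schema)
```

A field that shows more than one type, such as a mix of `str` and `NoneType`, is a hint that some rows lack an annotation and may need filtering before training.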

Provenance

Source: facebook
Collection Method: Synthetic generation for the PLM model, as referenced in the associated paper.
Time Range: not specified
Freshness: Last updated 2025-04-21 18:03:31
Geography: not specified

License is unknown; terms of use must be verified before application.

Quality Score

Overall: D40
Description: 42
Source: 36
Reputation: 51
Access: 26

Community

430 downloads
18 likes
0 views

Dataset Info

Author: facebook
Created: Mar 28, 2025
Updated: Apr 21, 2025
Last synced: Jun 4, 2026
