Name: UniWorld V1: Up to 10,000 Geneval-Style Image-Text Pairs for Semantic Encoding
Creator: LanguageBind
Published: 2025-05-21T06:09:37
Keywords: size category: 1K<n<10K, library: webdataset, modality: text, library: mlcroissant, modality: image, WebDataset, library: datasets, region: US, license: MIT, arXiv:2506.03147

Description

UniWorld V1 provides between 1,000 and 10,000 image-text pairs sourced from the BLIP3o-60k collection. LanguageBind published it on 2025-05-21 and last updated it in June 2025. The pairs carry Geneval-style annotations and are intended for training high-resolution semantic encoders for unified visual understanding and generation.

Use Cases

Training high-resolution semantic encoders, using the annotation JSON files to link each image to its text
Benchmarking visual understanding models on Geneval-style prompts, resolved against the image root paths
Developing unified generation models on the image-text pairs defined by the annotations and source images
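
All three use cases start by linking annotation records to image files. The following is a minimal sketch of that step, not the authors' loader. It assumes each annotation file holds a JSON list of records; the keys `image` and `prompt` are hypothetical placeholders, so check the actual annotation files for the real schema.

```python
import json
from pathlib import Path


def load_pairs(image_root, annotation_json, image_key="image", text_key="prompt"):
    """Return (image_path, text) tuples for one annotation file.

    ASSUMPTION: the file is a JSON list of dicts, and `image_key` / `text_key`
    name the fields holding the relative image path and the text. Both key
    names are hypothetical defaults, not the dataset's documented schema.
    """
    records = json.loads(Path(annotation_json).read_text(encoding="utf-8"))
    pairs = []
    for rec in records:
        # Resolve the relative image path against the image root directory.
        pairs.append((Path(image_root) / rec[image_key], rec[text_key]))
    return pairs
```

Passing the key names as parameters keeps the sketch usable once the real field names are known.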

Strengths

MIT license allows for broad research and commercial application
Sourced from the established BLIP3o-60k pipeline for high-quality image-text alignment
Specifically formatted for Geneval-style evaluation and training

Limitations

Small sample: 1,000 to 10,000 records, a fraction of the original 60k source
Requires manual preparation of a data.txt file to map image root paths to annotations
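
The manual data.txt step can be scripted. The sketch below writes one line per image root and annotation file pair. The comma-separated, one-pair-per-line layout is an assumption made for illustration, and `build_data_txt` is a hypothetical helper; the authors' repository defines the format that is actually required.

```python
from pathlib import Path


def build_data_txt(pairs, out_path="data.txt", sep=","):
    """Write one `image_root<sep>annotation_json` line per pair.

    ASSUMPTION: the separator and column order are illustrative only;
    adjust both to match the format documented by the dataset authors.
    """
    lines = []
    for image_root, annotation_json in pairs:
        # Normalise to forward slashes so the file is portable across OSes.
        lines.append(
            f"{Path(image_root).as_posix()}{sep}{Path(annotation_json).as_posix()}"
        )
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

For example, `build_data_txt([("images/blip3o", "json/blip3o.json")])` would write a single line to data.txt; both paths here are placeholders rather than real file names from the repository.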

Provenance

Source: LanguageBind, based on the paper 'UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation' (arXiv:2506.03147)
Collection Method: Sourced from BLIP3o-60k and reformatted into Geneval-style annotations
Freshness: Last updated June 2025, the month the accompanying paper appeared on arXiv.

Users must download the source images and the annotation JSON files separately from the LanguageBind/UniWorld-V1 repository, then write a data.txt file in the format the authors specify.
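
The download and file discovery can be sketched as follows. This assumes that LanguageBind/UniWorld-V1 is a dataset repository on the Hugging Face Hub and that the `huggingface_hub` package is installed. The helper names and the default `local_dir` are hypothetical, and the directory layout inside the repository is not assumed.

```python
from pathlib import Path

REPO_ID = "LanguageBind/UniWorld-V1"  # repository named in this card


def download_dataset(local_dir="UniWorld-V1"):
    """Download a snapshot of the dataset repository and return its local path.

    ASSUMPTION: the repository is hosted on the Hugging Face Hub as a
    dataset repo. Requires `pip install huggingface_hub`.
    """
    # Imported lazily so the helper below works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, repo_type="dataset", local_dir=local_dir)


def list_annotation_jsons(root):
    """Return every .json file under `root` as a sorted list of relative POSIX paths."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.json"))
```

Running `list_annotation_jsons(download_dataset())` would list the annotation files to reference from data.txt, without hard-coding any file names.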
