Commoncatalog Cc By Recap

Name: Commoncatalog Cc By Recap
Creator: alfredplpl
Published: 2024-05-25T23:43:31
Keywords: library: polars, task category: image-to-text, language: en, task category: text-to-image, modality: text, size category: 100K<n<1M, format: CSV, library: mlcroissant, library: datasets, library: pandas, license: cc-by-4.0, region: us

Description

Dense English captions for the CommonCatalog CC-BY image collection, generated with the Phi-3 Vision model. The data is stored as a CSV file in which each row is linked to the original image repository through a unique photoid primary key.
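A minimal sketch of treating photoid as a primary key when joining captions back to image records. It assumes the CSV has a header row containing a photoid column; the caption column name "caption", the image record fields, and all sample values are hypothetical placeholders, not taken from the dataset. Only the Python standard library is used.

```python
import csv
import io
from typing import Dict, Iterable, List


def index_by_photoid(rows: Iterable[dict], key: str = "photoid") -> Dict[str, dict]:
    """Index rows by photoid, rejecting duplicates so the key behaves as a primary key."""
    index: Dict[str, dict] = {}
    for row in rows:
        # Normalise to str: CSV values are read as text, image metadata may use ints.
        pid = str(row[key])
        if pid in index:
            raise ValueError(f"duplicate {key}: {pid}")
        index[pid] = row
    return index


def join_captions(caption_rows: Iterable[dict], image_rows: Iterable[dict],
                  caption_col: str = "caption") -> List[dict]:
    """Inner-join image records to their caption via photoid ("caption" is a hypothetical column name)."""
    captions = index_by_photoid(caption_rows)
    joined = []
    for img in image_rows:
        cap = captions.get(str(img["photoid"]))
        if cap is not None:  # images without a caption row are dropped
            joined.append({**img, "caption": cap[caption_col]})
    return joined


# Tiny in-memory stand-in for commoncatalog-cc-by-phi3.csv (values are made up).
sample_csv = io.StringIO(
    "photoid,caption\n"
    "101,A red barn under a cloudy sky.\n"
    "102,Two dogs running on a beach.\n"
)
caption_rows = list(csv.DictReader(sample_csv))
images = [{"photoid": "102", "url": "img/102.jpg"}, {"photoid": "999", "url": "img/999.jpg"}]
pairs = join_captions(caption_rows, images)
print(pairs)
```

Raising on duplicate photoids surfaces data problems early instead of silently keeping whichever caption was read last.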

Use Cases

Fine-tune text-to-image generative models using the dense captions and images linked via the photoid column.
Develop image retrieval systems by indexing the Phi-3 Vision-generated captions associated with each photoid.
Train captioning models by using the dense captions as target labels for images identified by photoid.
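As an illustration of the retrieval use case, the sketch below builds a simple keyword inverted index from caption tokens to photoids and ranks results by the number of matching query tokens. This is a deliberately basic stand-in (a real system might use embeddings or BM25 instead), and the captions shown are made-up examples, not dataset rows.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on runs of non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(captions: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each caption token to the set of photoids whose caption contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for photoid, caption in captions.items():
        for token in tokenize(caption):
            index[token].add(photoid)
    return index


def search(index: Dict[str, Set[str]], query: str, k: int = 5) -> List[str]:
    """Rank photoids by how many distinct query tokens their caption contains."""
    scores: Dict[str, int] = defaultdict(int)
    for token in set(tokenize(query)):
        for photoid in index.get(token, ()):
            scores[photoid] += 1
    # Highest score first; ties broken by photoid for deterministic output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [photoid for photoid, _ in ranked[:k]]


captions = {
    "101": "A red barn under a cloudy sky.",
    "102": "Two dogs running on a sandy beach.",
    "103": "A dog sleeping next to a red door.",
}
index = build_index(captions)
print(search(index, "red dog"))
```

Because results are photoids, they can be resolved to images in the CommonCatalog CC-BY source through the same key.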

Strengths

Includes dense English captions generated by the Phi-3 Vision model.
Uses photoid as the primary key for relational mapping to the CommonCatalog CC-BY dataset.
Provided as a CSV file (commoncatalog-cc-by-phi3.csv) that loads directly with pandas.
Supports loading with streaming=True while keeping rows aligned in sequence with the source image dataset.
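A minimal sketch of verifying that alignment when two streams are consumed side by side. Plain generators stand in for the streamed caption and image datasets here; in practice each could come from datasets.load_dataset(..., streaming=True), but the repository IDs and the field names other than photoid are assumptions not given on this page. The check fails fast if the photoids drift apart or one stream ends early.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

_MISSING = object()  # sentinel marking an exhausted stream


def aligned_pairs(images: Iterable[dict], captions: Iterable[dict],
                  key: str = "photoid") -> Iterator[Tuple[dict, dict]]:
    """Lazily pair two streams row by row, failing fast if photoids drift apart."""
    for position, (img, cap) in enumerate(zip_longest(images, captions, fillvalue=_MISSING)):
        if img is _MISSING or cap is _MISSING:
            raise ValueError(f"streams have different lengths (diverged at row {position})")
        # Compare as strings: CSV-derived photoids are text, image metadata may hold ints.
        if str(img[key]) != str(cap[key]):
            raise ValueError(f"misaligned at row {position}: {img[key]!r} != {cap[key]!r}")
        yield img, cap


# Generators stand in for streamed datasets; nothing is materialised up front.
image_stream = ({"photoid": pid, "url": f"img/{pid}.jpg"} for pid in ("101", "102"))
caption_stream = ({"photoid": pid, "caption": f"caption {pid}"} for pid in ("101", "102"))
for img, cap in aligned_pairs(image_stream, caption_stream):
    print(img["url"], "->", cap["caption"])
```

Checking every row costs little compared with decoding images, and it catches a shuffled or filtered stream before mismatched caption/image pairs reach training.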

Quality Score

D34

Description: 39
Source: 36
Reputation: 29
Access: 22

Community

14 downloads
3 likes
0 views

Dataset Info

Author: alfredplpl
Created: May 25, 2024
Updated: Jun 27, 2024
