Dense English captions, generated with the Phi-3 Vision model, for the CommonCatalog CC-BY image collection. The data is provided as a CSV file in which each row is linked to the original image dataset through a unique photoid primary key.
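The sketch below shows one way to load the caption CSV with pandas and index it by its photoid primary key. It is a minimal illustration, not a reference loader. Only the photoid column and the file name commoncatalog-cc-by-phi3.csv come from this card. The "caption" column name and the sample rows are hypothetical stand-ins, so check the real file's header before relying on them.

```python
import io

import pandas as pd

# Hypothetical stand-in for commoncatalog-cc-by-phi3.csv: the "caption" column
# name and these rows are assumptions for illustration only.
SAMPLE_CSV = """photoid,caption
101,A red bicycle leaning against a brick wall.
102,Two dogs running on a sandy beach at sunset.
"""

def load_captions(source) -> pd.DataFrame:
    """Load the caption CSV and index it by the photoid primary key."""
    df = pd.read_csv(source)
    # A primary key must be unique; fail loudly if any photoid repeats.
    if df["photoid"].duplicated().any():
        raise ValueError("photoid values are not unique")
    return df.set_index("photoid")

# In practice, pass the path to the real CSV file instead of the in-memory sample.
captions = load_captions(io.StringIO(SAMPLE_CSV))
print(captions.loc[102, "caption"])
```

Indexing by photoid makes per-image lookups and later joins against the source image dataset direct, and the uniqueness check catches a broken key early.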
Use Cases
- Fine-tune text-to-image generative models using the dense captions and images linked via the photoid column.
- Develop image retrieval systems by indexing the Phi-3 Vision-generated captions associated with each photoid.
- Train captioning models by using the dense captions as target labels for images identified by photoid.
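For the fine-tuning and captioning use cases, captions must first be paired with their images. The following is a sketch of that join under stated assumptions. Both DataFrames are small hypothetical stand-ins: "captions" for the CSV and "images" for the CommonCatalog CC-BY image metadata. The "caption" and "image_path" column names are illustrative, and only photoid comes from this card.

```python
import pandas as pd

# Hypothetical stand-ins; only the photoid column name comes from the card.
captions = pd.DataFrame({
    "photoid": [101, 102, 103],
    "caption": [
        "A red bicycle leaning against a brick wall.",
        "Two dogs running on a sandy beach at sunset.",
        "A snowy mountain peak under a clear sky.",
    ],
})
images = pd.DataFrame({
    "photoid": [102, 101, 104],
    "image_path": ["img/102.jpg", "img/101.jpg", "img/104.jpg"],
})

def build_pairs(captions: pd.DataFrame, images: pd.DataFrame) -> pd.DataFrame:
    """Join captions to images on photoid, keeping only rows present in both."""
    pairs = images.merge(
        captions,
        on="photoid",
        how="inner",
        validate="one_to_one",  # raises pandas.errors.MergeError on duplicate keys
    )
    # Sort for a stable, reproducible order of training examples.
    return pairs.sort_values("photoid", ignore_index=True)

pairs = build_pairs(captions, images)
print(pairs)
```

An inner join drops images without a caption (104 here) and captions without an image (103). The validate="one_to_one" argument enforces that photoid behaves as a primary key on both sides.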
Strengths
- Includes dense English captions generated by the Phi-3 Vision model.
- Uses photoid as the primary key for relational mapping to the CommonCatalog CC-BY dataset.
- Provided as a CSV file (commoncatalog-cc-by-phi3.csv) for easy integration with pandas.
- Supports loading with streaming=True while keeping rows in sequence with the source image dataset.
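When the image dataset is streamed (for example, Hugging Face datasets' load_dataset(..., streaming=True) returns an iterable of dict records), pairing by photoid instead of by row position keeps captions correct even if the two sources differ in order or coverage. The sketch below assumes the caption lookup fits in memory as a dict. The align_by_photoid helper and the sample records, including the "url" field, are hypothetical and only illustrate the idea.

```python
from typing import Dict, Iterable, Iterator, Tuple

def align_by_photoid(
    image_stream: Iterable[dict],
    captions: Dict[int, str],
) -> Iterator[Tuple[dict, str]]:
    """Hypothetical helper: yield (image_record, caption) pairs matched on photoid.

    Matching on the key rather than on row position keeps pairs correct even
    if the two sources are ordered differently or one of them has gaps.
    """
    for record in image_stream:
        caption = captions.get(record["photoid"])
        if caption is not None:  # skip images that have no caption row
            yield record, caption

# Hypothetical records standing in for a streamed image dataset.
image_stream = iter([
    {"photoid": 102, "url": "https://example.com/102.jpg"},
    {"photoid": 999, "url": "https://example.com/999.jpg"},
    {"photoid": 101, "url": "https://example.com/101.jpg"},
])
captions = {
    101: "A red bicycle leaning against a brick wall.",
    102: "Two dogs running on a sandy beach at sunset.",
}

for record, caption in align_by_photoid(image_stream, captions):
    print(record["photoid"], caption)
```

Because the helper is a generator, it consumes the image stream lazily and never materializes the full image dataset. Only the caption dict is held in memory.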