Description
PolyCap is a dataset of image-grounded captions for the MindSemantix project. The dataset, created by ziqiren, was last updated on HuggingFace on 2026-05-12. It contains caption files for subjects sub01, sub02, sub05, and sub07. The corresponding COCO captions are not included and must be obtained separately from the NSD dataset.
Use Cases
Training or evaluating vision-language models on image captioning, using the provided caption files.
Research on aligning neuroimaging data with descriptive text, given the dataset's grounding in the MindSemantix project.
Comparing the outputs of different captioning models (e.g., BLIP-2 vs. Shikra) on the same stimuli, using the multiple caption file versions provided.
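The model-comparison use case could be approached with a simple token-overlap score. The following is a minimal sketch, not the project's own method: it assumes each caption file has already been loaded into a dictionary mapping a stimulus identifier to a caption string, which the dataset card does not confirm. The identifiers (`img_001`, etc.), the toy captions, and the function names are all hypothetical.

```python
import re


def tokenize(caption: str) -> set[str]:
    """Lowercase a caption and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", caption.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two captions."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0  # two empty captions are treated as identical
    return len(ta & tb) / len(ta | tb)


def compare_captions(first: dict[str, str], second: dict[str, str]) -> dict[str, float]:
    """Score every stimulus id that appears in both caption sets."""
    shared = first.keys() & second.keys()
    return {sid: jaccard(first[sid], second[sid]) for sid in sorted(shared)}


# Hypothetical toy data standing in for two caption file versions.
blip2 = {"img_001": "A dog runs on the beach", "img_002": "Two people riding bikes"}
shikra = {"img_001": "A dog running on a beach", "img_003": "A plate of food"}

scores = compare_captions(blip2, shikra)
for stimulus_id, score in scores.items():
    print(f"{stimulus_id}: {score:.3f}")
```

Jaccard overlap is only a rough lexical proxy. An embedding-based or reference-based metric may suit a real comparison better once the actual file layout is known.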
Strengths
Provides multiple caption versions (BLIP-2, Shikra v1, v2) for subject sub01, allowing for comparative analysis.
Explicitly references the established COCO and NSD datasets for broader context.
Limitations
Description metadata is limited; data quality can only be assessed by manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which makes it harder to assess suitability before download.
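Since row counts and field semantics are undocumented, a first inspection pass after download might look like the sketch below. It assumes the caption files are JSON, holding either a list of records or an id-to-caption mapping. The card does not state the file format, so this is a guess to adjust once the files are in hand. The function names and sample records are hypothetical.

```python
import json
from collections import Counter


def normalize(data) -> list[dict]:
    """Turn parsed JSON into a list of dict records.

    Assumption: the data is either a mapping (id -> value) or a list
    whose items may or may not already be dicts.
    """
    if isinstance(data, dict):
        return [{"id": key, "value": value} for key, value in data.items()]
    return [item if isinstance(item, dict) else {"value": item} for item in data]


def load_records(path: str) -> list[dict]:
    """Read a JSON file and normalize it into a list of dict records."""
    with open(path, encoding="utf-8") as fh:
        return normalize(json.load(fh))


def summarize_records(records: list[dict]) -> dict:
    """Count rows and tally the value types observed for each field."""
    field_types: dict[str, Counter] = {}
    for record in records:
        for field, value in record.items():
            field_types.setdefault(field, Counter())[type(value).__name__] += 1
    return {
        "rows": len(records),
        "fields": {field: dict(counts) for field, counts in field_types.items()},
    }


# Hypothetical in-memory sample; replace with load_records("<downloaded file>").
sample = [
    {"image_id": 1, "caption": "a cat"},
    {"image_id": 2, "caption": "a dog", "score": 0.5},
]
summary = summarize_records(sample)
print(summary)
```

A field that appears in fewer records than the row count, or with mixed value types, is a useful early signal of optional or inconsistently populated columns.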
Provenance
Source
ziqiren on HuggingFace
Collection Method
Likely generated by vision-language models (BLIP-2, Shikra) for images used in a neuroscience context.
Freshness
Last updated 2026-05-12 16:04:48; freshness should be verified.
Corresponding COCO captions are not included and must be obtained separately from the NSD dataset.