The dataset contains 13,003 images of 11,003 identities, accompanied by 80,440 natural language descriptions. It supports cross-modal person search by linking pedestrian images captured by surveillance cameras with detailed textual descriptions of their attributes.
Use Cases
- Train cross-modal retrieval models that match textual descriptions to pedestrian images, using the identity labels as supervision
- Develop natural language processing models for fine-grained attribute extraction from the text descriptions
- Benchmark text-based person re-identification systems, which use natural language queries instead of image crops as the probe
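Text-to-image retrieval benchmarks of this kind are commonly scored with Rank-k accuracy. A query counts as a hit if at least one of the top-k retrieved images shares its identity label. The sketch below shows that computation under stated assumptions: a model has already produced a text-by-image similarity matrix, and the function name `rank_k_accuracy` is hypothetical. This is an illustration of the metric, not the dataset's official evaluation protocol.

```python
from typing import Sequence


def rank_k_accuracy(
    similarity: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
    k: int = 1,
) -> float:
    """Hypothetical helper: fraction of text queries whose top-k retrieved
    gallery images include at least one image of the same identity.

    similarity[i][j] is the (assumed, model-produced) score between
    description i and gallery image j.
    """
    hits = 0
    for row, qid in zip(similarity, query_ids):
        # Rank gallery images by descending similarity to this description.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        # A hit means any of the top-k images carries the query's identity label.
        if any(gallery_ids[j] == qid for j in ranked[:k]):
            hits += 1
    return hits / len(query_ids)


# Toy example: two descriptions scored against three gallery images.
sim = [[0.9, 0.1, 0.3],
       [0.2, 0.4, 0.8]]
print(rank_k_accuracy(sim, query_ids=[7, 5], gallery_ids=[7, 5, 9], k=1))  # 0.5
print(rank_k_accuracy(sim, query_ids=[7, 5], gallery_ids=[7, 5, 9], k=2))  # 1.0
```

Because the metric matches identity labels rather than specific images, any image of the correct person counts as a hit. This is why identity annotations matter for this kind of benchmark.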
Strengths
- 80,440 natural language descriptions paired with 13,003 pedestrian images
- 11,003 unique person identities across multiple camera views
- Each image is annotated with at least two distinct textual descriptions