GuideDog is a real-world egocentric multimodal dataset for accessibility-aware guidance for blind and low-vision users. It contains 22,084 image-description pairs, including 2,106 human-verified gold and 19,978 VLM-generated silver annotations, collected from real walking videos across diverse cities. The dataset accompanies an ACL 2026 paper and includes derived multiple-choice subsets.
Use Cases
- Train vision-language models for scene description based on egocentric imagery.
- Develop navigation aids for blind and low-vision users based on real-world walking video data.
- Benchmark AI models on accessibility-aware guidance tasks using the human-verified gold subset.
- Use the VLM-generated silver annotations as large-scale, machine-labeled training data for assistive technologies.
- Build multiple-choice question-answering systems for spatial reasoning from the derived subsets.
Strengths
- Contains 22,084 image-description pairs, providing a substantial multimodal corpus.
- Includes 2,106 human-verified gold annotations, which can serve as a higher-quality evaluation subset.
- Data is collected from real walking videos across diverse cities, which supports ecological validity.
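The gold and silver counts quoted above can be checked for consistency, and the relative size of each tier computed, with a short sketch. The counts come from the dataset description; the function name `tier_shares` is only an illustrative helper, not part of any dataset tooling.

```python
# Counts as stated in the dataset description.
TOTAL_PAIRS = 22_084
GOLD_PAIRS = 2_106
SILVER_PAIRS = 19_978


def tier_shares(gold: int, silver: int) -> dict[str, float]:
    """Return each tier's fraction of the combined pair count."""
    total = gold + silver
    return {"gold": gold / total, "silver": silver / total}


# The two tiers should account for every pair in the corpus.
assert GOLD_PAIRS + SILVER_PAIRS == TOTAL_PAIRS

shares = tier_shares(GOLD_PAIRS, SILVER_PAIRS)
print(f"gold: {shares['gold']:.1%}, silver: {shares['silver']:.1%}")
```

Under these counts the human-verified tier is roughly a tenth of the corpus, so most training signal would come from the VLM-generated annotations.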
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The dataset page indicates a last update of 2026-04-28; freshness should be verified.
- The description does not specify the exact geographic or temporal coverage of the walking videos.
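Because column-level documentation is absent, one way to infer field semantics after download is to scan a sample of records and tabulate the value types and missing counts per field. The sketch below assumes the records are already available as plain dictionaries; the field names `image`, `description`, and `tier` in the sample are hypothetical placeholders, not the dataset's actual columns.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping


def summarize_fields(
    records: Iterable[Mapping[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Report the observed value types and missing counts for each field."""
    seen_types: dict[str, set[str]] = defaultdict(set)
    non_null: dict[str, int] = defaultdict(int)
    total = 0
    for record in records:
        total += 1
        for name, value in record.items():
            seen_types[name]  # register the field even if the value is None
            if value is not None:
                seen_types[name].add(type(value).__name__)
                non_null[name] += 1
    return {
        name: {"types": sorted(kinds), "missing": total - non_null[name]}
        for name, kinds in seen_types.items()
    }


# Hypothetical sample rows; the real column names must be checked after download.
sample = [
    {"image": "frame_0001.jpg", "description": "Crosswalk ahead.", "tier": "gold"},
    {"image": "frame_0002.jpg", "description": None, "tier": "silver"},
]

summary = summarize_fields(sample)
for field, info in summary.items():
    print(field, info)
```

A field that appears with more than one type, or with many missing values, is a good candidate for closer manual inspection before training or evaluation.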
Provenance
- Source: Hugging Face user kjunh
- Collection Method: Collected from real walking videos across diverse cities; annotations include human-verified and VLM-generated pairs.
- Freshness: Last updated 2026-04-28 16:19:06.
- Geography: Diverse cities (specific locations not stated).