4,000 multimodal instruction-tuning samples designed to instill Evidence-of-Thought (EoT) reasoning into Vision-Language Models for remote sensing. The dataset utilizes a Socratic questioning approach to guide models through logical, step-by-step interpretation of satellite and aerial imagery.
Use Cases
- Fine-tune Vision-Language Models (VLMs) to perform logical spatial reasoning using the instruction and EoT response pairs
- Develop interpretable satellite imagery analysis systems by training on intermediate reasoning evidence
- Benchmark the zero-shot reasoning capabilities of remote sensing models using Socratic-style prompts
Strengths
- 4,000 instruction-tuning pairs specifically for remote sensing (RS) tasks
- Includes Evidence-of-Thought (EoT) reasoning chains for complex spatial analysis
- Multimodal format pairing high-resolution remote sensing images with structured text prompts
- Based on the 'Asking like Socrates' methodology for hierarchical visual reasoning