This dataset contains 120 pages from the U.S. Department of War's May 8, 2024 declassified UFO/UAP report. Each page is rendered from the source PDF as a 200 DPI JPEG image and paired with metadata extracted from the official release manifest.
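As a rough guide to image size, the pixel dimensions of a page follow from its physical size multiplied by the render resolution. The sketch below assumes US Letter pages (8.5 x 11 inches), which is not stated in the release; the function name `pixel_dims` is illustrative only.

```python
def pixel_dims(width_in: float, height_in: float, dpi: int = 200) -> tuple[int, int]:
    """Return (width_px, height_px) for a page rendered at the given DPI."""
    # Pixels = inches * dots per inch, rounded to the nearest whole pixel.
    return round(width_in * dpi), round(height_in * dpi)


# Assumption: US Letter page size. Actual pages in the report may differ.
letter_px = pixel_dims(8.5, 11.0, dpi=200)
print(letter_px)  # (1700, 2200)
```

If the real images differ from this size, the source pages were likely not Letter-sized or were cropped during rendering.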
Use Cases
- Training OCR models on scanned government documents
- Analyzing document layout patterns across a multi-page report
- Extracting structured metadata from page headers and footers
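For any of these use cases, a first step is usually to pair each page image with its metadata record. The following is a minimal sketch under stated assumptions: the images are `.jpg` files in one directory, the metadata is available as a CSV file, and a column (here called `file_name`) holds the image file name. None of these names come from the dataset itself, so adjust them to the actual layout.

```python
import csv
from pathlib import Path


def pair_pages_with_metadata(image_dir, manifest_csv, key="file_name"):
    """Return a list of (image_path, metadata_row) pairs.

    `key` is the manifest column holding the image file name.
    The column name and the CSV format are assumptions, not part of the release.
    """
    # Index manifest rows by file name for constant-time lookup.
    with open(manifest_csv, newline="", encoding="utf-8") as f:
        rows = {row[key]: row for row in csv.DictReader(f)}

    pairs = []
    # Sort so pages come back in a stable, name-based order.
    for image_path in sorted(Path(image_dir).glob("*.jpg")):
        row = rows.get(image_path.name)
        if row is not None:  # skip images with no manifest entry
            pairs.append((image_path, row))
    return pairs
```

Images without a matching manifest row are skipped rather than raising an error, which keeps the sketch tolerant of partial downloads; a stricter pipeline might prefer to fail loudly instead.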
Strengths
- 120 page images rendered at a consistent 200 DPI
- Paired metadata from the official manifest
- Clean extraction from the original PDFs
Limitations
- Limited to 120 pages from one report
- No ground-truth transcriptions are provided, so OCR training or evaluation requires separately produced labels
- Images are grayscale scans; color information is absent