Description
UFOCR is an open dataset of declassified records from the FBI and U.S. Department of War on UFOs, UAPs, and extraterrestrial investigations. The original government archives, which include scanned typewriter pages and handwritten notes, have been parsed into structured text using Reducto. The dataset was last updated on Hugging Face by the user 'reducto' on May 11, 2026.
Use Cases
Train language models on historical government text drawn from declassified records.
Evaluate document-structure analysis and information extraction on poor-quality scans, such as the typewritten pages and handwritten notes noted in the description.
Study trends and terminology in official U.S. government UFO/UAP investigations.
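As a rough illustration of the terminology use case, the sketch below counts how often a few terms appear per year across a list of records. The field names `year` and `text`, the term list, and the sample records are all hypothetical; the dataset's actual columns are undocumented and would need to be mapped in after download.

```python
import re
from collections import Counter, defaultdict

# Hypothetical terms of interest; adjust after inspecting the real text.
TERMS = ("flying saucer", "flying disc", "ufo", "uap")

def term_counts_by_year(records, terms=TERMS):
    """Count case-insensitive occurrences of each term, grouped by year."""
    # Whole-word match, allowing an optional plural "s".
    patterns = {
        t: re.compile(r"\b" + re.escape(t) + r"s?\b", re.IGNORECASE)
        for t in terms
    }
    counts = defaultdict(Counter)
    for rec in records:
        for term, pat in patterns.items():
            n = len(pat.findall(rec["text"]))
            if n:
                counts[rec["year"]][term] += n
    return counts

# Made-up sample records standing in for parsed pages (not real data).
sample = [
    {"year": 1947, "text": "Report of a flying disc near the field. Flying discs were seen."},
    {"year": 1952, "text": "Witness described a flying saucer."},
]
result = term_counts_by_year(sample)
```

Grouping by year is only one possible choice; the same function could group by agency or document type if such fields turn out to exist.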
Strengths
Source documents come from authoritative U.S. government agencies: the FBI and the Department of War.
The description notes the data has been processed to address specific OCR challenges like faded copies and handwritten notes.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Freshness should be verified as the last update timestamp is in the future (2026-05-11).
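Because column documentation and row count are missing, a first step after download is usually to profile the records directly. The following is a minimal sketch, assuming the data has already been loaded as a list of dictionaries; the helper `profile_records` and the field names `doc_id`, `page`, and `text` are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

def profile_records(records):
    """Return the row count and, per field, the observed value types
    and the number of non-empty values."""
    fields = defaultdict(lambda: {"types": set(), "non_empty": 0})
    rows = 0
    for rec in records:
        rows += 1
        for name, value in rec.items():
            info = fields[name]
            info["types"].add(type(value).__name__)
            # Treat None and empty containers/strings as missing.
            if value not in (None, "", [], {}):
                info["non_empty"] += 1
    return rows, dict(fields)

# Made-up sample rows (not real data) with hypothetical field names.
sample = [
    {"doc_id": "a1", "page": 1, "text": "MEMORANDUM"},
    {"doc_id": "a1", "page": 2, "text": ""},
    {"doc_id": "b7", "page": 1, "text": None},
]
rows, fields = profile_records(sample)
for name, info in fields.items():
    print(name, sorted(info["types"]), info["non_empty"], "of", rows)
```

Fields with mixed types or many empty values are the ones most worth checking by hand before relying on them.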
Provenance
Source
FBI and U.S. Department of War declassified archives.
Collection Method
Parsed from original scanned archives using Reducto.
Freshness
Last updated 2026-05-11 21:52:49
License is unknown; users should verify terms of use before downloading.