This dataset contains 100 real U.S. federal civil cases drawn from PACER records. Each case archive includes the original PDF documents and derived JSON artifacts such as opinions, hypothesis trees, and fact DAGs. The dataset was released anonymously for a NeurIPS 2026 Evaluations & Datasets Track submission.
Use Cases
- Benchmarking legal reasoning models against derived ground-truth artifacts such as hypothesis trees.
- Analyzing the structure of civil litigation using fact DAGs (directed acyclic graphs).
- Training NLP models for legal document parsing on the PACER PDFs and derived JSON opinions.
- Evaluating multi-step inference systems using the case hypothesis trees.
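Because the fact graphs are described as DAGs, a reasonable first sanity check before any structural analysis is to confirm that a loaded graph really is acyclic. The sketch below uses Kahn's algorithm for this. The input shape (a list of node IDs plus a list of (source, target) pairs) and the names `nodes`, `edges`, and `topological_order` are assumptions for illustration only; the dataset does not document the actual JSON schema of its fact DAGs, so the real files will likely need an adapter.

```python
from collections import deque


def topological_order(nodes, edges):
    """Return node IDs in a topological order; raise ValueError if a cycle exists.

    `nodes` and `edges` are a hypothetical in-memory shape, not the
    dataset's documented format.
    """
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for src, dst in edges:
        if src not in indegree or dst not in indegree:
            raise KeyError(f"edge ({src!r}, {dst!r}) references an unknown node")
        children[src].append(dst)
        indegree[dst] += 1

    # Start from nodes with no incoming edges and peel the graph layer by layer.
    queue = deque(n for n in indegree if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Any node never reaching indegree 0 sits on (or behind) a cycle.
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle; it is not a DAG")
    return order


# Made-up node IDs, used only to show the call shape.
example_nodes = ["f1", "f2", "f3"]
example_edges = [("f1", "f2"), ("f2", "f3")]
print(topological_order(example_nodes, example_edges))
```

The returned order can also serve as an evaluation order for multi-step inference, since every fact appears after the facts it depends on.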
Strengths
- Contains 100 real U.S. federal civil cases.
- Provides derived ground-truth artifacts including opinions, hypothesis trees, and fact DAGs.
- Includes original PACER PDF documents for each case.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not reported, which makes suitability harder to assess before download.
- Description metadata is limited; actual data quality requires manual inspection after download.
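Since field semantics have to be inferred after download, one way to start is to tabulate which top-level keys appear in the JSON artifacts and what value types they hold. The following is a minimal sketch using only the Python standard library. The function name `summarize_json_fields` and the assumption that artifacts are `.json` files somewhere under an extracted case directory are hypothetical; the actual archive layout is not documented.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path


def summarize_json_fields(root):
    """Map each top-level JSON key under `root` to counts of the value types seen.

    Assumes (hypothetically) that artifacts are stored as *.json files;
    files that fail to parse are tallied under a sentinel key.
    """
    summary = defaultdict(Counter)
    for path in sorted(Path(root).rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            summary["<unparseable file>"][path.name] += 1
            continue
        # Treat a top-level list as a list of records; wrap a single object.
        records = data if isinstance(data, list) else [data]
        for record in records:
            if isinstance(record, dict):
                for key, value in record.items():
                    summary[key][type(value).__name__] += 1
    return {key: dict(counts) for key, counts in summary.items()}
```

Calling `summarize_json_fields` on the directory of one extracted case gives a quick inventory of field names. Keys whose values show mixed types are good candidates for the manual inspection noted above.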
Provenance
- Source: PACER (Public Access to Court Electronic Records)
- Collection Method: Downloaded from PACER and processed to derive JSON artifacts.
- Freshness: Last updated 2026-05-07 12:01:57; freshness should be verified.
- Geography: United States