Description
A curated subset of 5,444 rows from the reglab/legal_hallucinations dataset, containing up to 1,000 randomly sampled rows for each of six specific legal reasoning tasks. The original dataset was created for the paper 'Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models' by Dahl et al., forthcoming in the Journal of Legal Analysis. The dataset is hosted by author nguha on Hugging Face and was last updated on January 5, 2026.
Use Cases
Benchmarking LLM performance on each of the six legal reasoning tasks.
Analyzing patterns of legal hallucination in model outputs.
Training or fine-tuning models to detect or reduce legal hallucinations.
Studying how LLMs handle legal reasoning, at the intersection of AI and legal analysis.
Strengths
Contains 5,444 total rows, providing a substantial sample for analysis.
Includes up to 1,000 rows for each of six distinct legal reasoning tasks, enabling task-specific evaluation.
Derived from a dataset created for a peer-reviewed, forthcoming academic paper, suggesting a research-grade foundation.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count for the full original dataset is unknown, which may limit suitability assessment for larger-scale studies.
The description metadata is limited; actual data quality and task definitions require manual inspection after download.
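Since the fields are undocumented, a first pass after download could simply tabulate which columns exist, what types they hold, and a few sample values. The following is a minimal sketch, assuming the rows have already been loaded as a list of dictionaries; the field names `task`, `query`, and `correct` in the toy data are hypothetical placeholders, not the dataset's actual columns.

```python
from collections import Counter

def summarize_fields(rows, max_examples=3):
    """Collect observed value types and a few distinct example values per field."""
    summary = {}
    for row in rows:
        for name, value in row.items():
            info = summary.setdefault(name, {"types": Counter(), "examples": []})
            info["types"][type(value).__name__] += 1
            # Keep only a handful of distinct examples per field.
            if value not in info["examples"] and len(info["examples"]) < max_examples:
                info["examples"].append(value)
    return summary

# Toy rows with hypothetical field names (assumption: real columns will differ).
toy_rows = [
    {"task": "task_a", "query": "example question 1", "correct": True},
    {"task": "task_b", "query": "example question 2", "correct": False},
]

summary = summarize_fields(toy_rows)
for name, info in summary.items():
    print(name, dict(info["types"]), info["examples"])
```

Running the same function on the downloaded rows gives a quick starting point for inferring field semantics and task definitions by hand.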
Provenance
Source
Subset of the reglab/legal_hallucinations dataset, originally created for the paper by Dahl et al. (2024, forthcoming).
Collection Method
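Curated subset containing up to 1,000 randomly sampled rows for each of 6 legal reasoning tasks. The total of 5,444 rows is below the 6,000-row maximum, so at least one task contributes fewer than 1,000 rows.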
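The capped per-task sampling described here can be sketched as follows. This is an illustration under stated assumptions, not the curator's actual procedure: the `task` field name, the seed, and the toy data are all hypothetical. It shows how a cap of 1,000 leaves smaller tasks untouched, which is how a six-task subset can total fewer than 6,000 rows.

```python
import random
from collections import defaultdict

def sample_per_task(rows, task_key="task", cap=1000, seed=0):
    """Return up to `cap` randomly chosen rows for each distinct task value."""
    by_task = defaultdict(list)
    for row in rows:
        by_task[row[task_key]].append(row)

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    subset = []
    for task in sorted(by_task):
        group = by_task[task]
        k = min(cap, len(group))  # tasks with fewer rows than the cap are kept whole
        subset.extend(rng.sample(group, k))
    return subset

# Toy data: one task above the cap, one below it (hypothetical task names).
rows = [{"task": "task_a", "id": i} for i in range(1500)]
rows += [{"task": "task_b", "id": i} for i in range(444)]

subset = sample_per_task(rows)
print(len(subset))  # 1000 from task_a + 444 from task_b = 1444
```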
Freshness
Last updated 2026-01-05 20:15:13; freshness should be verified.
License
Unknown; users should verify terms of use before downloading.