This dataset contains 251,038 legal case records extracted and processed from the Open Legal Data dump as of 2022-10-18. It is an independent, cleaned derivative of that source data, provided in JSONL format by author harshildarji. The dataset page was last updated on 2026-04-18.
Use Cases
- Legal document classification based on case record content.
- Named entity recognition for extracting parties, judges, and locations from case text.
- Temporal analysis of legal trends based on the 2022 data snapshot.
- Building search or recommendation systems for legal case databases.
Strengths
- 251,038 records provide a substantial corpus for analysis.
- Data is explicitly described as a cleaned derivative of the source dump.
- Records are in a structured JSONL format suitable for programmatic use.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The record count is documented, but the field names and file sizes are not.
- The source data is a single snapshot taken on 2022-10-18, so cases added to the source after that date are not covered.
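Because the fields are undocumented, a first step after download is usually to sample some records and list the keys they contain. The following is a minimal sketch of that inspection, assuming each line of the file is one JSON object. The function name `summarize_fields`, the `sample_size` default, and the file name `cases.jsonl` are all hypothetical; the dataset description does not name the file or its fields.

```python
import json
from collections import Counter, defaultdict
from itertools import islice


def summarize_fields(path, sample_size=1000):
    """Map each field name to a count of the Python types seen for its values.

    Only the first `sample_size` lines of the file are read, so the result
    is a sample-based guess at the schema, not a guarantee.
    """
    field_types = defaultdict(Counter)
    with open(path, encoding="utf-8") as handle:
        for line in islice(handle, sample_size):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            if not isinstance(record, dict):
                continue  # assumption: records are JSON objects
            for key, value in record.items():
                field_types[key][type(value).__name__] += 1
    return {field: dict(counts) for field, counts in field_types.items()}


# Hypothetical usage; replace the path with the downloaded file:
# for field, types in summarize_fields("cases.jsonl").items():
#     print(field, types)
```

A field that shows up with several types, or with `NoneType` alongside `str`, is a hint that it is optional or inconsistently populated and needs handling before classification or entity extraction.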
Provenance
- Source: Open Legal Data dump.
- Collection Method: Extracted and processed to create a cleaned derivative.
- Time Range: Source data snapshot as of 2022-10-18.
- Freshness: Dataset page last updated 2026-04-18; data source snapshot is from 2022-10-18.
- Geography: Not specified.