A collection of 33,000+ OCR-cleaned judgments from the Supreme Court of India, spanning 1950 to 2023. The dataset is described as ready for Named Entity Recognition (NER) tasks. The author, organization, and license are not specified.
Use Cases
- Named Entity Recognition (NER) based on OCR-cleaned legal text.
- Analyzing trends in Indian Supreme Court jurisprudence based on judgment text.
- Training language models on domain-specific legal vocabulary.
- Extracting legal citations and case references from judgment text.
- Studying the evolution of legal language and concepts over a 73-year period.
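As an illustration of the citation-extraction use case, here is a minimal Python sketch. The regular expressions cover only three common Indian reporter formats (AIR, SCC, SCR) and are assumptions about how citations appear in the text; the sample sentence is written for this example and is not taken from the dataset. Real judgments will likely need more patterns and some tolerance for residual OCR noise.

```python
import re

# Assumed patterns for a few common reporter formats; not exhaustive.
CITATION_PATTERNS = [
    r"AIR\s+\d{4}\s+SC\s+\d+",         # e.g. AIR 1973 SC 1461
    r"\(\d{4}\)\s+\d+\s+SCC\s+\d+",    # e.g. (1973) 4 SCC 225
    r"\[\d{4}\]\s+\d+\s+SCR\s+\d+",    # e.g. [1973] 1 SCR 1
]
CITATION_RE = re.compile("|".join(CITATION_PATTERNS))


def extract_citations(text: str) -> list[str]:
    """Return citation strings in order of appearance, whitespace-normalized."""
    return [" ".join(m.group(0).split()) for m in CITATION_RE.finditer(text)]


# Illustrative sentence, not a row from the dataset.
sample = (
    "As held in Kesavananda Bharati v. State of Kerala, (1973) 4 SCC 225, "
    "also reported at AIR 1973 SC 1461, the amending power is limited."
)
print(extract_citations(sample))
```

Normalizing whitespace inside each match keeps citations comparable when the OCR step has left irregular spacing or line breaks between tokens.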
Strengths
- 33,000+ judgments provide a substantial corpus for analysis.
- 73-year time range (1950–2023) offers long-term historical coverage.
- The text is described as OCR-cleaned, which suggests it has already been preprocessed for readability and analysis.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The exact row count is not stated beyond the "33,000+" figure, which may limit suitability assessment.
- Description metadata is limited; actual data quality requires manual inspection after download.
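Because the fields are undocumented, a first inspection pass after download could look like the sketch below. It assumes the data is distributed as CSV, which the description does not confirm, and the column names in the demo (`case_name`, `year`, `text`) are hypothetical stand-ins. It uses only the Python standard library.

```python
import csv
import io
from collections import Counter

# Judgment texts can exceed the csv module's default field size limit.
csv.field_size_limit(10**9)


def profile_columns(fileobj, max_rows=1000):
    """Count non-empty values and average text length per column.

    Reads at most max_rows data rows and returns (stats, rows_read).
    """
    reader = csv.DictReader(fileobj)
    filled = Counter()
    total_len = Counter()
    rows_read = 0
    for row in reader:
        if rows_read >= max_rows:
            break
        rows_read += 1
        for name, value in row.items():
            if value:
                filled[name] += 1
                total_len[name] += len(value)
    stats = {
        name: {
            "non_empty": filled[name],
            "avg_len": total_len[name] / filled[name] if filled[name] else 0.0,
        }
        for name in (reader.fieldnames or [])
    }
    return stats, rows_read


# Demo on an in-memory file with hypothetical column names.
demo = io.StringIO(
    "case_name,year,text\n"
    "A v. B,1950,Some judgment text\n"
    "C v. D,,\n"
)
stats, rows_read = profile_columns(demo)
print(rows_read, stats)
```

For a real file, pass an open handle instead, for example `open(path, newline="", encoding="utf-8")`, where the path and encoding are whatever the download actually provides. Columns with a high average length are likely to hold the judgment text, and columns with low fill counts point to gaps worth checking by hand.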
Provenance
- Time Range: 1950–2023
- Geography: India