Description
RepoJus is a consolidated corpus of Brazilian legal data containing 3,141,319 full-text court judgments (acórdãos, monocratic decisions, and súmulas) from nine nationally relevant Brazilian tribunals. The records were collected from official public sources, standardized into a unified JSON format, and processed through a deduplication and cleaning pipeline. The dataset was authored by andrebadini and last updated on Hugging Face on June 8, 2026.
Use Cases
Train language models to classify legal texts by document type (acórdãos, decisões monocráticas, súmulas).
Analyze legal reasoning patterns across different Brazilian tribunals.
Develop information extraction systems for entities and concepts in Portuguese-language jurisprudence.
Benchmark legal document deduplication and standardization techniques.
Study the evolution of legal arguments and precedent within the Brazilian judiciary.
Strengths
Contains 3,141,319 full-text legal documents.
Sourced from nine nationally relevant Brazilian tribunals.
Data has been standardized into a unified JSON format and processed through a deduplication pipeline.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported in the dataset metadata; the 3,141,319 figure comes from the description and should be confirmed after download.
Freshness should be verified; the last recorded update is dated 2026-06-08.
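Since field semantics and the record count have to be checked after download, a first pass can simply tally which keys appear and what JSON types they carry. The sketch below is a minimal illustration, assuming the records are available as one JSON object per line; the field names and values in `sample` (`tribunal`, `tipo`, `texto`, `numero`) are hypothetical and are not taken from the dataset's actual schema.

```python
import json
from collections import Counter, defaultdict

def profile_records(lines):
    """Count records and tally which fields appear and with which Python types."""
    total = 0
    field_counts = Counter()
    field_types = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        for key, value in record.items():
            field_counts[key] += 1
            field_types[key].add(type(value).__name__)
    return total, field_counts, field_types

# Hypothetical sample records; the real field names must be inferred from the data.
sample = [
    '{"tribunal": "STF", "tipo": "acordao", "texto": "..."}',
    '{"tribunal": "STJ", "tipo": "sumula", "texto": "...", "numero": 7}',
]
total, counts, types = profile_records(sample)
for field in sorted(counts):
    print(field, counts[field], sorted(types[field]))
```

Fields that appear in only a fraction of records, or with more than one type, are the ones most worth inspecting by hand before building on the data.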
Provenance
Source
Official public sources from nine Brazilian tribunals.
Collection Method
Collected from public sources, standardized, deduplicated, and cleaned via advanced heuristics.
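The heuristics used in the actual pipeline are not documented here. As a rough illustration of what a basic deduplication step can look like, the sketch below normalizes each text and keeps the first occurrence of each normalized form. It is an assumption-laden example of exact-match deduplication by hash, not the method used to build RepoJus; the function names and sample texts are hypothetical.

```python
import hashlib
import re
import unicodedata

def normalize(text):
    """Normalize Unicode form, lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()

def deduplicate(texts):
    """Keep the first occurrence of each normalized text, preserving order."""
    seen = set()
    unique = []
    for text in texts:
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique

# Hypothetical sample texts; the first two differ only in case and spacing.
docs = ["Súmula 1:  texto  exemplo.", "súmula 1: texto exemplo.", "Outro acórdão."]
unique_docs = deduplicate(docs)
print(len(docs), "->", len(unique_docs))
```

Hash-based matching only catches duplicates that are identical after normalization; near-duplicates with small textual differences would need a similarity-based approach.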
Freshness
Last updated 2026-06-08 16:15:07; freshness should be verified.
Geography
Brazil
License
Unknown; terms of use should be verified before application.