This dataset collects Brazilian Portuguese legal documents from the Federal Court of Accounts (Tribunal de Contas da União, TCU) for Legal Information Retrieval (LIR). It contains 14,469 normative acts, 46 queries, and 812 relevance judgments derived from 3,048 human annotations. The dataset was created by LeandroRibeiro and last updated on Hugging Face in March 2026.
Use Cases
- Benchmarking legal information retrieval models against the 46 queries and their graded relevance judgments.
- Training Portuguese-language NLP models for legal text understanding on the 14,469 normative documents.
- Studying the structure and language of Brazilian administrative law through the collection of normative acts.
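For the benchmarking use case, graded relevance judgments are commonly scored with nDCG. The sketch below is a minimal illustration of that metric, not an evaluation protocol published with the dataset. The identifiers (`qrels`, `run`, `q1`, `d1`, ...) and the 0 to 3 relevance scale are assumptions made for the example; the dataset's actual field names and grading scale must be checked after download.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a ranked list of graded relevance values."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_doc_ids, judgments, k=10):
    """nDCG@k for a single query.

    judgments maps doc_id -> graded relevance; unjudged documents count as 0.
    """
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical toy data: one query, three judged documents (assumed 0-3 scale).
qrels = {"q1": {"d1": 3, "d2": 1, "d3": 0}}
# Hypothetical system output: ranked document ids per query.
run = {"q1": ["d2", "d1", "d4"]}

scores = {q: ndcg_at_k(run[q], qrels[q], k=3) for q in qrels}
mean_ndcg = sum(scores.values()) / len(scores)
```

Averaging the per-query scores, as `mean_ndcg` does, gives a single figure for comparing systems. With only 46 queries, small differences between systems should be read with caution.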
Strengths
- Contains 14,469 legal documents, providing a substantial text corpus.
- Includes 812 human-annotated relevance judgments derived from 3,048 individual annotations, offering a benchmark for evaluation.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The available metadata does not report a row count for the main document table, so the stated figure of 14,469 documents should be confirmed after download.
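Because the fields are undocumented, a first pass after download might simply profile each column. The sketch below assumes the records have already been loaded as a list of dictionaries; the field names `id` and `text` are hypothetical placeholders, not the dataset's real schema.

```python
from collections import Counter, defaultdict


def summarize_fields(records):
    """Profile each field: observed value types, missing count, one example."""
    summary = defaultdict(
        lambda: {"types": Counter(), "missing": 0, "example": None}
    )
    # Collect every field name seen in any record, so absent keys count as missing.
    all_fields = {name for row in records for name in row}
    for row in records:
        for name in all_fields:
            value = row.get(name)
            info = summary[name]
            if value is None or value == "":
                info["missing"] += 1
                continue
            info["types"][type(value).__name__] += 1
            if info["example"] is None:
                info["example"] = value
    return dict(summary)


# Hypothetical records standing in for rows of the downloaded document table.
records = [
    {"id": "doc-1", "text": "Example normative act"},
    {"id": "doc-2", "text": ""},
    {"id": "doc-3"},
]

profile = summarize_fields(records)
row_count = len(records)  # compare against the stated number of documents
```

The same pass yields the row count, which can then be compared with the document total stated in the description.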
Provenance
- Source: Normative documents from the Brazilian Federal Court of Accounts (Tribunal de Contas da União, TCU).
- Collection Method: Human-annotated relevance judgments for query-document pairs.
- Time Range: Not specified.
- Freshness: Last updated 2026-03-27 22:19:03; verify freshness before use.
- Geography: Brazil