Description
A corpus of Italian legal texts aggregated from four open-data sources (Normattiva, Corte Costituzionale, OpenGA, EUR-Lex) for NLP applications. It includes approximately 300,000 national legislative documents from 1861 to 2026, 18,000 Constitutional Court decisions, 100,000 administrative justice metadata records, and 50,000 EU legal texts in Italian. It was created by dossier-legal and last updated in March 2026.
Use Cases
Training legal language models on the Italian national legislation corpus.
Evaluating named entity recognition models on Constitutional Court decision texts.
Developing information retrieval systems for EU legislation in Italian.
Analyzing patterns in administrative justice metadata for legal research.
Strengths
Aggregates data from four distinct legal sources, giving breadth across national legislation, constitutional case law, administrative justice, and EU law.
Includes a large volume of national legislation (~300K documents).
Covers a long temporal range for national law, from 1861 to 2026.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The exact row count for the full corpus is not reported; only approximate per-source figures are given, which makes it hard to assess suitability before download.
Description metadata is limited; actual data quality requires manual inspection after download.
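Because column-level documentation is absent, a first pass after download is usually a per-field profile. The sketch below is one way to do that with the Python standard library only. It assumes the records have already been loaded as a list of dictionaries; the file format is not documented, and the function name `profile_columns` and the field names in the demo (`id`, `title`, `year`) are hypothetical, not the dataset's actual schema.

```python
from collections import Counter, defaultdict


def profile_columns(records, sample_size=3):
    """Summarize every field seen in an iterable of dict records.

    For each field, report the share of rows with a non-empty value,
    the Python types observed, and a few distinct sample values.
    """
    stats = defaultdict(lambda: {"non_null": 0, "types": Counter(), "samples": []})
    total = 0
    for row in records:
        total += 1
        for key, value in row.items():
            col = stats[key]  # register the field even if the value is empty
            if value is None or value == "":
                continue  # treat None and empty strings as missing
            col["non_null"] += 1
            col["types"][type(value).__name__] += 1
            if len(col["samples"]) < sample_size and value not in col["samples"]:
                col["samples"].append(value)
    return {
        name: {
            "fill_rate": col["non_null"] / total if total else 0.0,
            "types": dict(col["types"]),
            "samples": col["samples"],
        }
        for name, col in stats.items()
    }


if __name__ == "__main__":
    # Hypothetical rows for illustration; real field names must be
    # read from the downloaded files.
    demo_rows = [
        {"id": "1", "title": "example title", "year": 1861},
        {"id": "2", "title": "", "year": 1948},
        {"id": "3", "title": None, "year": "2026"},
    ]
    for name, summary in profile_columns(demo_rows).items():
        print(name, summary)
```

A low fill rate or mixed types in a field (for example, years stored as both integers and strings) is a signal to inspect that field manually before relying on it.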
Provenance
Source
Normattiva, Corte Costituzionale, OpenGA, EUR-Lex
Collection Method
Aggregation from four open-data sources.
Time Range
1861-2026
Freshness
Last updated 2026-03-01 03:13:17; freshness should be verified.
Geography
Italy, European Union
License
Unknown; restrictions should be verified before use.