Description
Vietnamese Legal Corpus (UTS_VLC) is a dataset of Vietnamese legal documents maintained by Underthesea NLP. It contains the Constitution, Codes, and Laws from 1945 to 2025, with splits from 2021, 2023, and 2026 containing 110, 208, and 318 documents respectively. The dataset was last updated on 2026-01-24.
Use Cases
Train Vietnamese legal language models on the corpus of laws and codes.
Build legal information retrieval systems over the collection of documents.
Analyze how Vietnamese legislation changes over the 1945-2025 date range.
Develop named entity recognition for legal entities and citations within Vietnamese text.
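The temporal-analysis use case could start with something like the sketch below, which counts documents per decade. It assumes each record is a dict with an integer `year` field; that field name is hypothetical, since the dataset's columns are not documented, and the sample records are made up for illustration.

```python
from collections import Counter


def documents_per_decade(records, year_field="year"):
    """Count records per decade, skipping records without an integer year.

    `year_field` is a hypothetical column name; adjust it after
    inspecting the real schema.
    """
    counts = Counter()
    for record in records:
        year = record.get(year_field)
        if isinstance(year, int):
            counts[(year // 10) * 10] += 1
    # Return a plain dict ordered by decade for readable output.
    return dict(sorted(counts.items()))


# Made-up records for illustration only; not taken from the dataset.
sample = [
    {"title": "doc A", "year": 1946},
    {"title": "doc B", "year": 1992},
    {"title": "doc C", "year": 1995},
    {"title": "doc D", "year": None},
]
decade_counts = documents_per_decade(sample)
print(decade_counts)
```

Records with a missing or non-integer year are skipped rather than guessed, so the counts only cover documents whose date is actually known.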
Strengths
Explicit temporal coverage from 1945 to 2025.
Document counts are provided for splits: 110 (2021), 208 (2023), 318 (2026).
Maintained by a known NLP organization (Underthesea NLP).
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which makes it harder to judge suitability before download.
Data may carry temporal bias from the period in which the sources were collected.
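Because column-level documentation is absent, a first step after download might be to profile the fields directly. The sketch below is one minimal way to do that, assuming the rows can be read as a list of dicts; the function name `summarize_fields` and the sample field names are hypothetical and do not describe the dataset's actual schema.

```python
def summarize_fields(records, max_example_len=40):
    """Collect observed types, non-null counts, and one example per field."""
    summary = {}
    for record in records:
        for name, value in record.items():
            info = summary.setdefault(
                name, {"types": set(), "non_null": 0, "example": None}
            )
            if value is None:
                continue
            info["types"].add(type(value).__name__)
            info["non_null"] += 1
            if info["example"] is None:
                # Truncate long values so the summary stays readable.
                info["example"] = str(value)[:max_example_len]
    return summary


# Made-up rows for illustration only; field names are assumptions.
rows = [
    {"id": "a1", "year": 2015, "content": "sample text one"},
    {"id": "a2", "year": None, "content": "sample text two"},
]
field_summary = summarize_fields(rows)
for name, info in field_summary.items():
    print(name, sorted(info["types"]), info["non_null"], info["example"])
```

Counting the rows in each split while profiling would also fill in the missing row count.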
Provenance
Source
Underthesea NLP
Collection Method
Likely collected and processed from official Vietnamese legal sources.
Time Range
1945 to 2025
Freshness
Last updated 2026-01-24 07:16:00.
Geography
Vietnam
License is unknown; terms of use must be verified before download.