Description
Vietnamese Legal Instruction Dataset is an instruction-following dataset built from 127,271 unique Vietnamese legal documents sourced from vbpl.vn, the Government Legal Document Portal of the Ministry of Justice. It contains 341,398 training pairs across 9 question-answering types, with each document paired with its full text for content recall. The dataset was created by author 'duyet' and was last updated on 2026-04-10.
Use Cases
Fine-tuning instruction-following models on the dataset's 9 question-answering types.
Training models for legal document content recall based on the full_text pairs.
Benchmarking NLP models on Vietnamese legal text understanding.
Studying the structure and hierarchy of Vietnamese legal documents.
Strengths
Contains 341,398 training pairs, providing substantial volume for model training.
Built from 127,271 unique source documents from an official government portal (vbpl.vn).
Includes full text pairs for each document, enabling content recall tasks.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported in the dataset metadata, so the 341,398-pair figure from the description should be confirmed after download.
License is unknown, which may restrict usage.
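Because column-level documentation is absent, a quick pass over a sample of rows can help infer what each field holds. The following is a minimal sketch, not part of the dataset's tooling: the function name `summarize_fields` and the sample column names (`instruction`, `response`, `qa_type`) are hypothetical placeholders, since the real schema is not documented here.

```python
from collections import defaultdict


def summarize_fields(records, sample_size=100):
    """Summarize observed types, fill counts, and max string length per field."""
    summary = defaultdict(lambda: {"types": set(), "non_empty": 0, "max_len": 0})
    for row in records[:sample_size]:
        for key, value in row.items():
            info = summary[key]
            info["types"].add(type(value).__name__)
            # Count a value as filled unless it is None or an empty container/string.
            if value not in (None, "", [], {}):
                info["non_empty"] += 1
            if isinstance(value, str):
                info["max_len"] = max(info["max_len"], len(value))
    return {key: dict(info) for key, info in summary.items()}


# Hypothetical rows: the real column names are not documented and may differ.
sample = [
    {"instruction": "Summarize this decree.", "response": "A short summary.", "qa_type": "summary"},
    {"instruction": "Who issued this document?", "response": "", "qa_type": "issuer"},
]

report = summarize_fields(sample)
for field, info in report.items():
    print(field, sorted(info["types"]), info["non_empty"], info["max_len"])
```

Running the same function over the first rows of the downloaded data would show which fields are always populated and which hold long document text, which is a reasonable starting point for guessing field semantics.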
Provenance
Source
vbpl.vn (Government Legal Document Portal, Ministry of Justice)
Collection Method
Built from the th1nhng0/vietnamese-legal-documents collection.
Time Range
Not specified.
Freshness
Last updated 2026-04-10 04:57:48; freshness should be verified.
Geography
Vietnam
License restrictions are unknown and must be verified before use.