YuITC's Vietnamese Legal Documents Dataset provides a benchmark corpus for legal information retrieval. The dataset includes a collection of legal documents and train/test splits with natural language queries paired with relevant documents. It was last updated on March 18, 2026.
Use Cases
- Benchmarking legal information retrieval models based on the provided corpus and query-document pairs.
- Training natural language processing models for Vietnamese legal text understanding.
- Evaluating the performance of search algorithms on domain-specific legal queries.
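
For the benchmarking and evaluation use cases above, a minimal sketch of two common retrieval metrics (recall@k and mean reciprocal rank) might look like the following. The query and document IDs are toy values invented for illustration, and `evaluate`, `run`, and `qrels` are hypothetical names; none of them come from the dataset itself, whose actual field names must be checked after download.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10) -> Dict[str, float]:
    """Average both metrics over every query that has at least one relevant document."""
    queries = [q for q, rel in qrels.items() if rel]
    if not queries:
        return {"recall@k": 0.0, "mrr": 0.0}
    recall = sum(recall_at_k(run.get(q, []), qrels[q], k) for q in queries) / len(queries)
    mrr = sum(reciprocal_rank(run.get(q, []), qrels[q]) for q in queries) / len(queries)
    return {"recall@k": recall, "mrr": mrr}


# Toy data (assumption): relevance judgments and a system's ranked output.
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
run = {"q1": ["d3", "d5", "d1"], "q2": ["d4", "d2", "d9"]}

scores = evaluate(run, qrels, k=2)
print(scores)
```

In practice, `qrels` would be built from the query-document pairs in the test split and `run` from the ranked output of the model under evaluation.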
Strengths
- Dataset is explicitly designed as a benchmark for legal information retrieval.
- Includes structured train/test splits with natural language queries and corresponding relevant documents.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is unknown, so the scale of the corpus and its suitability for a given task cannot be assessed before download.
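
Both gaps above can be closed with a quick inspection pass once the data is downloaded. The sketch below assumes the records have already been loaded as an iterable of dictionaries (how they are loaded is left open), and the field names in the sample are hypothetical placeholders, not the dataset's documented schema.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, Mapping


def summarize_records(records: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Count rows and tally the Python type of each field's values."""
    row_count = 0
    field_types: Dict[str, Counter] = defaultdict(Counter)
    for record in records:
        row_count += 1
        for field, value in record.items():
            field_types[field][type(value).__name__] += 1
    return {
        "rows": row_count,
        "fields": {field: dict(counts) for field, counts in field_types.items()},
    }


# Hypothetical sample rows (assumption): real field names must be read from the data.
sample = [
    {"question": "first query", "context_ids": [1, 2]},
    {"question": "second query", "context_ids": []},
    {"question": None, "context_ids": [3]},
]

summary = summarize_records(sample)
print(summary["rows"])
print(summary["fields"])
```

A field whose tally includes `NoneType` or mixed types is a good candidate for closer manual inspection before training or evaluation.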
Provenance
- Source: YuITC, based on raw data from tmnam20/BKAI-Legal-Retrieval.
- Freshness: Last updated 2026-03-18 12:29:31; freshness should be verified.