A collection of Vietnamese legal text documents compiled for the Massive Text Embedding Benchmark (MTEB). The dataset, authored by GreenNode, is hosted on Hugging Face and was last updated on 2026-01-08. It is intended for evaluating text embedding models on a legal text-to-text (t2t) retrieval task.
Use Cases
- Benchmarking the retrieval performance of text embedding models within the MTEB framework
- Training or fine-tuning models for Vietnamese legal document retrieval
- Evaluating how well multilingual or domain-specific embedding models handle Vietnamese legal text
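For retrieval tasks, MTEB reports nDCG@10 as the main score. The sketch below illustrates that idea under simplifying assumptions: documents are ranked by cosine similarity to a query embedding, and nDCG@10 is computed with binary relevance. The vectors and document ids are hypothetical toy values, not data from this dataset. A real benchmark run would use the `mteb` package with an actual embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted by descending cosine similarity to the query."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """nDCG@k with binary relevance (gain 1 if relevant, else 0)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Hypothetical toy embeddings; real vectors would come from the model under test.
docs = {
    "law_1": [0.9, 0.1, 0.0],
    "law_2": [0.1, 0.9, 0.0],
    "law_3": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

ranking = rank_documents(query, docs)
print(ranking)                                   # ['law_1', 'law_2', 'law_3']
print(round(ndcg_at_k(ranking, {"law_2"}), 3))   # 0.631
```

Binary relevance keeps the example short; graded relevance judgments would replace the constant gain of 1 with each document's relevance score.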
Strengths
- Designed for a standardized benchmark (MTEB), which suggests a structured evaluation setup
- Focuses on a specific, high-value domain (legal text) and language (Vietnamese)
Limitations
- Descriptive metadata is limited; data quality must be checked by manual inspection after download
- Column-level documentation is absent; field semantics must be inferred after download
- Row count and file formats are unknown, which makes it hard to judge suitability before downloading
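Because row count, file format, and field semantics are undocumented, a quick profiling pass after download can fill in the gaps. The sketch below assumes the files arrive as JSON Lines, which the card does not confirm. It counts rows and records which value types appear under each field. The field names in the demo (`_id`, `title`, `text`) are hypothetical placeholders modeled on common retrieval-corpus layouts, not confirmed columns of this dataset.

```python
import json
import os
import tempfile
from collections import Counter, defaultdict


def profile_jsonl(path):
    """Count rows and tally the value types seen under each field (assumes JSONL)."""
    row_count = 0
    field_types = defaultdict(Counter)
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            row_count += 1
            for key, value in record.items():
                field_types[key][type(value).__name__] += 1
    return row_count, {k: dict(v) for k, v in field_types.items()}


# Demo on a hypothetical two-row file; replace tmp_path with the downloaded file.
sample = [
    {"_id": "d1", "title": "", "text": "placeholder passage"},
    {"_id": "d2", "title": "", "text": "another placeholder"},
]
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    for rec in sample:
        tmp.write(json.dumps(rec, ensure_ascii=False) + "\n")
    tmp_path = tmp.name

rows, fields = profile_jsonl(tmp_path)
os.remove(tmp_path)
print(rows)    # 2
print(fields)  # {'_id': {'str': 2}, 'title': {'str': 2}, 'text': {'str': 2}}
```

Tallying types per field, rather than reading only the first record, surfaces fields that are missing or inconsistently typed across rows.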
Provenance
- Source
  - GreenNode via Hugging Face, referencing the Zalo AI Challenge.
- Collection Method
  - Likely compiled from legal sources for a benchmark challenge.
- Time Range
  - Not specified
- Freshness
  - Last updated 2026-01-08 08:05:48; freshness should be verified
- Geography
  - Vietnam (inferred from 'VN' in the title and the Vietnamese-language content)