Zalo AI Legal Text Retrieval VN: Vietnamese Legal Documents for Embedding Benchmark

Name: Zalo AI Legal Text Retrieval VN: Vietnamese Legal Documents for Embedding Benchmark
Creator: GreenNode
Published: 2024-07-24T09:34:54
Keywords: Text Retrieval, Vietnamese, Benchmark, Legal Text, Text, Large Scale, Embedding Evaluation

Description

Vietnamese legal text documents compiled for the Massive Text Embedding Benchmark (MTEB). The dataset, authored by GreenNode, is hosted on Hugging Face and was last updated on 2026-01-08. It is intended for evaluating text embedding models on a legal text-to-text (t2t) retrieval task.

Use Cases

Benchmarking the retrieval performance of text embedding models within the MTEB framework
Training or fine-tuning models for Vietnamese legal document retrieval
Evaluating domain-specific or cross-lingual embedding capabilities on Vietnamese legal text
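
As an illustration of what an embedding retrieval benchmark measures, the sketch below ranks documents by cosine similarity to a query and scores the ranking with recall@k and reciprocal rank. It is a minimal example under stated assumptions: the vectors, the document ids (`d1`, `d2`, `d3`), and the relevance labels are toy values invented for illustration and are not taken from this dataset. MTEB retrieval tasks commonly report nDCG@10; the simpler metrics here only stand in for the idea.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted by descending similarity to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


# Toy embeddings (hypothetical; a real run would use a model's output vectors).
doc_vecs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
query_vec = [0.9, 0.1]
relevant = {"d3"}  # assumed ground-truth label for this query

ranked = rank_documents(query_vec, doc_vecs)
print(ranked)
print(recall_at_k(ranked, relevant, 1), recall_at_k(ranked, relevant, 2))
print(reciprocal_rank(ranked, relevant))
```

Averaging these per-query scores over all queries gives a single number per model, which is what makes results comparable across embedding models.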

Strengths

Designed for a standardized benchmark (MTEB), which suggests a structured evaluation setup
Focuses on a specific, high-value domain (legal text) and language (Vietnamese)

Limitations

Description metadata is limited; actual data quality requires manual inspection after download
Column-level documentation is absent; field semantics must be inferred after download
Row count and file formats are unknown, which may limit suitability assessment
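
Because row counts and field semantics are undocumented, a first inspection pass after download might look like the sketch below. It assumes the data has already been loaded into a list of dictionaries by whatever loader fits the actual file format; the field names `_id` and `text` and the sample rows are hypothetical placeholders, not the dataset's real schema.

```python
from collections import Counter


def summarize_records(records):
    """Report the row count and, per field, the value types and empty-value count."""
    fields = {}
    for row in records:
        for name, value in row.items():
            info = fields.setdefault(name, {"types": Counter(), "empty": 0})
            info["types"][type(value).__name__] += 1
            if value is None or value == "":
                info["empty"] += 1
    return {"rows": len(records), "fields": fields}


# Hypothetical sample rows standing in for the downloaded data.
records = [
    {"_id": "q1", "text": "sample legal passage"},
    {"_id": "q2", "text": ""},
]

summary = summarize_records(records)
print(summary["rows"])
for name, info in summary["fields"].items():
    print(name, dict(info["types"]), "empty:", info["empty"])
```

A summary like this shows quickly whether a field is consistently typed and populated, which helps when inferring what each column means.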

Provenance

Source: GreenNode via Hugging Face, referencing Zalo AI Challenge.
Collection Method: Likely compiled from legal sources for a benchmark challenge.
Time Range: Not specified
Freshness: Last updated 2026-01-08 08:05:48; verify against the source before use
Geography: Vietnam (inferred from 'VN' in title and Vietnamese language focus)

Quality Score

Overall: D39
Description: 42
Source: 36
Reputation: 45
Access: 26

Community

141 downloads
3 likes
0 views

Dataset Info

Author: GreenNode
Created: Jul 24, 2024
Updated: Jan 8, 2026
Last synced: Jun 16, 2026
