Tencent's benchmark evaluates how well LLMs follow complex translation instructions. It covers 6 constraint types across multiple languages, in both single-constraint and multi-constraint scenarios. The dataset was last updated on 2026-05-20.
Use Cases
- Benchmarking how well LLM translations comply with glossary constraints
- Evaluating how well machine translation output follows style constraints
- Testing model performance on multi-constraint translation scenarios
Strengths
- Covers 6 distinct constraint types for evaluation
- Includes both single-constraint and multi-constraint scenarios
- Uses multiple evaluation methods including rule checks and LLM judges
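A rule check for glossary compliance can be sketched as below. This is a minimal illustration, not the benchmark's actual scoring code: the function names `glossary_violations` and `passes_all`, the case-insensitive substring matching, and the rule that a multi-constraint item passes only when every check passes are all assumptions made for the example.

```python
from typing import Dict, List


def glossary_violations(source: str, translation: str,
                        glossary: Dict[str, str]) -> List[str]:
    """Return source terms whose required target term is missing.

    Assumption: a glossary entry only applies when its source term
    appears in the source text, and matching is a case-insensitive
    substring test (so inflected target forms would count as misses).
    """
    src_lower = source.lower()
    out_lower = translation.lower()
    missing = []
    for src_term, tgt_term in glossary.items():
        if src_term.lower() in src_lower and tgt_term.lower() not in out_lower:
            missing.append(src_term)
    return missing


def passes_all(checks: Dict[str, bool]) -> bool:
    """Assumed multi-constraint rule: every individual check must pass."""
    return all(checks.values())


# Hypothetical example data, not taken from the dataset.
glossary = {"cloud": "Cloud", "server": "Server"}
missing = glossary_violations(
    "The cloud server is down.",
    "Der Cloud-Rechner ist ausgefallen.",
    glossary,
)
result = passes_all({"glossary": not missing, "style": True})
```

Substring matching is the simplest possible rule and will miss inflected or hyphenated forms, which is one reason a benchmark might pair rule checks with an LLM judge for constraints such as style.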
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- Row count is unknown, which makes it harder to assess whether the dataset is suitable for a given use
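Both gaps can be narrowed after download with a quick inspection pass. The sketch below assumes the data is available as JSON Lines with one JSON object per line, which the source does not confirm; `summarize_records` is a hypothetical helper name.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_records(lines: Iterable[str]) -> Tuple[int, Dict[str, Counter]]:
    """Count rows and tally the Python type of each field's values.

    Assumption: each non-blank line is a JSON object (JSON Lines).
    """
    row_count = 0
    field_types: Dict[str, Counter] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        row_count += 1
        for key, value in record.items():
            field_types.setdefault(key, Counter())[type(value).__name__] += 1
    return row_count, field_types
```

Any iterable of lines works, so an open file handle for the downloaded file can be passed directly. The result gives the row count plus, for each field, how often each value type occurs, which is a starting point for inferring field semantics.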
Provenance
- Source: Tencent
- Freshness: Last updated 2026-05-20 19:17:48; freshness should be verified