This dataset is likely related to the No Language Left Behind (NLLB) project and to COMET, a metric for evaluating machine translation quality. It is published on Kaggle, but the metadata does not state its size, contents, or how it was created. Data types, volume, and temporal coverage must be confirmed after download.
Use Cases
- Benchmarking machine translation systems using the COMET metric (inferred from domain, verify after download)
- Training or fine-tuning quality estimation models for translation output (inferred from domain, verify after download)
- Analyzing translation quality across different language pairs (inferred from domain, verify after download)
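As a rough illustration of the language-pair analysis above, the sketch below averages a quality score per source/target pair from a CSV file. The column names `src_lang`, `tgt_lang`, and `score` are hypothetical placeholders. Nothing in the metadata confirms that the dataset contains such columns, so replace them with the real headers once the files have been inspected.

```python
import csv
from collections import defaultdict


def mean_score_by_pair(csv_path, src_col="src_lang", tgt_col="tgt_lang",
                       score_col="score"):
    """Return {(source, target): mean score} for one CSV file.

    The default column names are assumptions for illustration only;
    the actual schema of the dataset is unknown.
    """
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [sum of scores, count]
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [c for c in (src_col, tgt_col, score_col) if c not in header]
        if missing:
            raise ValueError(f"columns not found in file: {missing}")
        for row in reader:
            try:
                score = float(row[score_col])
            except (TypeError, ValueError):
                continue  # skip rows whose score is empty or non-numeric
            bucket = totals[(row[src_col], row[tgt_col])]
            bucket[0] += score
            bucket[1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}
```

Rows with an empty or non-numeric score are skipped rather than treated as zero, so a pair with no valid scores does not appear in the result.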
Strengths
- Published on Kaggle, a major platform for data science.
Limitations
- Metadata is minimal; actual content requires verification after download.
- Row count, file formats, and column definitions are unknown.
- License, author, and last update information are unavailable.
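Because the row count, file formats, and column definitions are unknown, a first pass over the downloaded files can fill in those gaps. The following is a minimal sketch using only the Python standard library. It assumes the dataset has already been downloaded and unpacked into a local directory, and that any `.csv` or `.tsv` files are UTF-8 text with a header row. Neither assumption is confirmed by the metadata.

```python
import csv
from pathlib import Path


def summarize_directory(root):
    """Return one summary dict per file found under root."""
    summaries = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        info = {
            "name": path.relative_to(root).as_posix(),
            "suffix": path.suffix.lower(),
            "bytes": path.stat().st_size,
        }
        # Only delimited text files are opened; other formats are just listed.
        if info["suffix"] in (".csv", ".tsv"):
            delimiter = "\t" if info["suffix"] == ".tsv" else ","
            with path.open(newline="", encoding="utf-8", errors="replace") as handle:
                reader = csv.reader(handle, delimiter=delimiter)
                info["columns"] = next(reader, [])    # first line, assumed to be a header
                info["rows"] = sum(1 for _ in reader)  # remaining records
        summaries.append(info)
    return summaries
```

Calling `summarize_directory` on the unpacked download folder and printing each entry gives the file list, sizes, header names, and row counts that the metadata leaves out. Other formats, such as JSON or Parquet, would need their own readers.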
Provenance
- Source: Kaggle
- Collection Method: Likely derived from the NLLB project or related research, but the specific collection method is unknown.
- Time Range: Not specified
- Freshness: Last updated date is unknown; freshness unverified.
- Geography: Not specified