Four CSV files contain outputs and evaluation scores from experiments described in the paper 'MLLM-as-a-Judge for Financial Document Image Machine Translation'. The dataset likely includes translations generated by Gemma models, scores assigned by a judge model, and the judge's reasoning for each score. Yanco Amor and Torterolo Orta published this data via e-cienciaDatos Harvested Dataverse on June 14, 2026.
Use Cases
- Benchmarking translation quality based on scores from a multimodal LLM-as-a-judge.
- Comparing zero-shot versus fine-tuned model performance for financial document translation.
- Analyzing the impact of oracle-guided self-refinement on translation outputs.
- Studying the reasoning and justification process of an automated evaluation model.
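As a rough illustration of the zero-shot versus fine-tuned comparison, the sketch below averages a numeric score column from two CSV streams. The column name `score` and the sample values are assumptions made for the example, since the dataset's actual field names are not documented; substitute whatever the downloaded files use.

```python
import csv
import io
from statistics import mean


def mean_score(handle, score_column):
    """Average the numeric cells in `score_column`, skipping blank or non-numeric ones."""
    values = []
    for row in csv.DictReader(handle):
        # A missing column or short row yields None, which is treated as blank.
        cell = (row.get(score_column) or "").strip()
        try:
            values.append(float(cell))
        except ValueError:
            continue  # blank or non-numeric cell: ignore it
    return mean(values) if values else None


# In-memory stand-ins for two configuration files.
# "score" is a hypothetical column name, and these values are invented.
zero_shot = io.StringIO("score\n3\n4\n")
fine_tuned = io.StringIO("score\n4\n5\n")

zero_shot_mean = mean_score(zero_shot, "score")
fine_tuned_mean = mean_score(fine_tuned, "score")
print(zero_shot_mean, fine_tuned_mean)
```

For the real files, an open file handle can be passed in place of the `io.StringIO` objects. A plain mean is only one possible summary; whether it is appropriate depends on the scoring scale the judge model used.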
Strengths
- Contains outputs from four distinct model configurations, including fine-tuned and oracle-guided variants.
- Includes scores and reasoning from a dedicated judge model (google/gemma-4-26B-A4B-it).
- Data originates from a published research paper with a defined experimental methodology.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not documented, so the dataset's size and suitability for a given task cannot be judged before download.
- The dataset scope is limited to financial reports from IBEX 35 companies.
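Because neither column names nor row counts are documented, a quick inspection after download can fill both gaps. The following is a minimal sketch using only the Python standard library. It assumes that each file has a header row and is UTF-8 encoded, which should be checked against the actual files. The column names in the in-memory sample are invented for illustration.

```python
import csv
import io
from pathlib import Path


def summarize_csv(handle):
    """Return (column_names, row_count) for an open CSV text stream."""
    reader = csv.reader(handle)
    header = next(reader, [])            # assumption: the first row is a header
    row_count = sum(1 for _ in reader)   # every remaining record counts as data
    return header, row_count


def summarize_directory(directory):
    """Summarize every *.csv file found directly inside `directory`."""
    summaries = {}
    for path in sorted(Path(directory).glob("*.csv")):
        # newline="" is the mode the csv module recommends for file objects;
        # UTF-8 is an assumption about the files' encoding.
        with path.open(newline="", encoding="utf-8") as handle:
            summaries[path.name] = summarize_csv(handle)
    return summaries


# In-memory stand-in for one file; these column names are hypothetical.
sample = io.StringIO("doc_id,translation,score\n1,Hola,4\n2,Adios,5\n")
columns, rows = summarize_csv(sample)
print(columns, rows)
```

Calling `summarize_directory` on the folder that holds the four downloaded files would return a mapping from file name to its header and row count, which is a reasonable starting point for inferring field semantics.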
Provenance
- Source: e-cienciaDatos Harvested Dataverse
- Collection Method: Generated as part of the experiments for the paper 'MLLM-as-a-Judge for Financial Document Image Machine Translation'.
- Freshness: Last updated 2026-06-14 04:10:12; freshness should be verified.