Description

Four CSV files contain the outputs and evaluation scores from the experiments described in the paper 'MLLM-as-a-Judge for Financial Document Image Machine Translation'. The files appear to include translations generated by Gemma models, the scores assigned by a judge model, and the judge's reasoning. Yanco Amor and Torterolo Orta published this data via e-cienciaDatos Harvested Dataverse on June 14, 2026.

Use Cases

Benchmarking translation quality based on scores from a multimodal LLM-as-a-judge.
Comparing zero-shot versus fine-tuned model performance for financial document translation.
Analyzing the impact of oracle-guided self-refinement on translation outputs.
Studying the reasoning and justification process of an automated evaluation model.
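
As a rough illustration of the zero-shot versus fine-tuned comparison, the sketch below averages a numeric score column in one CSV file. The column name passed as score_column is hypothetical, since the dataset's field names are not documented here; check the real header before using it.

```python
import csv
from statistics import mean


def mean_score(path, score_column):
    """Average the numeric values in score_column of one CSV file.

    score_column is a hypothetical name supplied by the caller; the
    dataset's actual column names must be read from the file header.
    Returns None when no numeric value is found.
    """
    values = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            cell = (row.get(score_column) or "").strip()
            try:
                values.append(float(cell))
            except ValueError:
                continue  # skip blank or non-numeric cells
    return mean(values) if values else None
```

Calling mean_score once per configuration file and comparing the results gives a first, coarse view of the differences; it does not replace the analysis in the paper.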

Strengths

Contains outputs from four distinct model configurations, including fine-tuned and oracle-guided variants.
Includes scores and reasoning from a dedicated judge model (google/gemma-4-26B-A4B-it).
Data originates from a published research paper with a defined experimental methodology.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, so suitability for a given task is hard to assess before download.
The dataset scope is limited to financial reports from IBEX 35 companies.
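
Because neither the column names nor the row counts are documented, a first step after download is to read them from the files directly. The following is a minimal sketch using only the Python standard library; it assumes each file is UTF-8 encoded with a header in its first row, and the "data" directory is a hypothetical download location, not a path from the dataset.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for one CSV file.

    Assumes the first row is a header. Rows are counted as CSV records,
    so quoted fields that span several lines count once.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # empty list if the file is empty
        row_count = sum(1 for _ in reader)  # remaining records are data rows
    return header, row_count


if __name__ == "__main__":
    data_dir = Path("data")  # hypothetical download directory
    if data_dir.is_dir():
        for csv_path in sorted(data_dir.glob("*.csv")):
            columns, rows = summarize_csv(csv_path)
            print(f"{csv_path.name}: {rows} rows, columns={columns}")
```

Counting records with the csv module rather than counting lines matters here, since free-text fields such as judge reasoning may contain embedded newlines.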

Provenance

Source: e-cienciaDatos Harvested Dataverse
Collection Method: Generated as part of the experiments for the paper 'MLLM-as-a-Judge for Financial Document Image Machine Translation'.
Freshness: Last updated 2026-06-14 04:10:12; verify against the source repository before use.

Tags: Tabular, Machine Translation, Model Evaluation, Multimodal LLM, Benchmark, Computer Vision, Finance, Financial Documents, Synthetic

Gemma Model Outputs and Scores for Financial Document Translation
