Name: UMM: Unified Multimodal Model Results for the UEval Benchmark
Creator: wenwenw945
Published: 2026-04-06T01:56:29
Keywords: Model Evaluation, Multimodal Llm, Benchmark, Understanding Tasks, Benchmark Results, Computer Vision, Multimodal

Description

UMM (unified multimodal model) results for the UEval benchmark: a collection of outputs from various multimodal and large language models. The dataset was created by wenwenw945 and last updated on April 9, 2026. It includes results from models such as OmniGen2 and Emu3.5, with each run configured for specific understanding tasks.

Use Cases

Benchmarking model performance on multimodal understanding tasks based on the UEval framework.
Comparing the 'understanding' capabilities of different models like OmniGen2 and Emu3.5.
Analyzing the effect of different decoding strategies (e.g., greedy vs. sampling) on model outputs.
Studying model behavior on tasks requiring image generation alongside text understanding.
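For the decoding-strategy use case, one simple starting point is to measure how often greedy and sampled outputs differ for the same prompt. The sketch below assumes the results can be loaded as a list of dicts with hypothetical keys 'model', 'prompt_id', 'decoding' (either 'greedy' or 'sampling') and 'output'. The real column names are undocumented and must be checked after download.

```python
from collections import defaultdict


def decoding_divergence(records):
    """Per model, the fraction of prompts whose greedy and sampled outputs differ.

    Hypothetical schema: each record is a dict with keys 'model', 'prompt_id',
    'decoding' ('greedy' or 'sampling') and 'output'. Adjust to the real fields.
    """
    # (model, prompt_id) -> {decoding strategy: output text}
    paired = defaultdict(dict)
    for rec in records:
        paired[(rec["model"], rec["prompt_id"])][rec["decoding"]] = rec["output"]

    # model -> [number of differing pairs, number of compared pairs]
    counts = defaultdict(lambda: [0, 0])
    for (model, _prompt), outs in paired.items():
        # Only prompts that have both a greedy and a sampled output are compared.
        if "greedy" in outs and "sampling" in outs:
            counts[model][1] += 1
            if outs["greedy"].strip() != outs["sampling"].strip():
                counts[model][0] += 1
    return {model: diff / total for model, (diff, total) in counts.items()}
```

Exact string comparison is deliberately crude; for free-form answers, a task-specific scorer would likely be more informative than raw divergence.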

Strengths

Includes results from multiple state-of-the-art models, such as OmniGen2 and Emu3.5.
Provides specific configuration details for model runs, such as 'understanding max_new_tokens' and sampling methods.
A last-updated date of 2026-04-09 suggests recent benchmarking activity.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count and file formats are unknown, which may limit suitability assessment.
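Because column-level documentation, row count and file format are all unknown, a quick schema survey after download can help. The sketch below assumes, purely as an example, that the results arrive as JSON Lines (one JSON object per line). It reports each field's observed value types and the share of rows in which the field is present.

```python
import json
from collections import Counter, defaultdict


def infer_schema(lines):
    """Summarise field names, value types and presence rates in JSON Lines text.

    Assumes one JSON object per line; blank lines are skipped. A field set to
    null still counts as present (its type is reported as 'NoneType').
    """
    types = defaultdict(Counter)  # field name -> Counter of Python type names
    n_rows = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        n_rows += 1
        for key, value in row.items():
            types[key][type(value).__name__] += 1
    return {
        key: {"types": dict(counter), "presence_rate": sum(counter.values()) / n_rows}
        for key, counter in types.items()
    }
```

Usage: `with open(path) as f: print(infer_schema(f))`, where `path` is whatever results file the download actually contains.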

Provenance

Source: huggingface
Collection Method: Results generated by running models on the UEval benchmark.
Freshness: Last updated 2026-04-09 22:35:58; freshness should be verified.

The license is unknown; users should verify permissions before use. Some models, such as BLIP3O and TokenFlow, are excluded because they lack 'understanding' capability.
