Name: K-MetBench: A Multi-Dimensional Benchmark for Meteorology Models
Creator: soyeonbot
Published: 2026-04-14T03:10:17
Keywords: Meteorology, Benchmark, LLM Evaluation, Reasoning Benchmark, Fine-Grained Evaluation, Multimodal Benchmark, Multimodal

Description

K-MetBench is a multi-dimensional benchmark for evaluating meteorology models on accuracy, reasoning quality, geo-cultural alignment, and fine-grained domain coverage. The dataset was created by soyeonbot and was last updated on Hugging Face in April 2026. Its public evaluation protocol covers two explicit benchmark splits, an advanced benchmark and a reasoning benchmark, and scores them with LLM-as-a-judge evaluation.

Use Cases

Evaluate model accuracy on fine-grained meteorological concepts, using the benchmark's domain coverage dimension.
Assess the quality of expert-style reasoning in AI models, using the explicit reasoning benchmark.
Test the geo-cultural alignment of meteorological predictions, using the locality evaluation dimension.
Benchmark multimodal AI performance in meteorology, using the multimodal evaluation dimension.
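
The fine-grained accuracy use case can be illustrated with a minimal sketch that groups scored answers by domain. The record keys used here ("domain", "answer", "prediction") are assumptions for illustration only; the actual column names are not documented and must be checked after download.

```python
from collections import defaultdict


def accuracy_by_domain(records):
    """Return {domain: accuracy} for an iterable of scored records.

    Each record is a dict with the hypothetical keys 'domain',
    'answer', and 'prediction' (not confirmed K-MetBench fields).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        domain = record["domain"]
        total[domain] += 1
        # Exact-match scoring; a real protocol may normalize answers first.
        if record["prediction"] == record["answer"]:
            correct[domain] += 1
    return {domain: correct[domain] / total[domain] for domain in total}


# Made-up records, used only to show the call shape.
sample = [
    {"domain": "synoptic", "answer": "B", "prediction": "B"},
    {"domain": "synoptic", "answer": "C", "prediction": "A"},
    {"domain": "radar", "answer": "D", "prediction": "D"},
]
per_domain = accuracy_by_domain(sample)
```

Reporting one score per domain, rather than a single aggregate, is what makes weak areas visible when a benchmark advertises fine-grained coverage.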

Strengths

Designed for multi-dimensional evaluation across accuracy, reasoning, locality, and domain coverage.
Includes a public evaluation protocol using LLM-as-a-judge for the explicit benchmark splits.
Last updated on 2026-04-15, suggesting recent maintenance.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which makes it hard to judge whether the benchmark is large enough for a given evaluation.
The implicit split of the benchmark is not fully described in the provided metadata.
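
Because column-level documentation is absent, a first pass over the downloaded rows can help with inferring field semantics. The sketch below assumes the rows are available as an iterable of dicts (for example, records iterated from a split loaded with the Hugging Face datasets library); the function name and the sample rows are hypothetical.

```python
def summarize_columns(rows, max_examples=2):
    """Collect observed value types, missing counts, and a few examples per column."""
    summary = {}
    for row in rows:
        for key, value in row.items():
            info = summary.setdefault(
                key, {"types": set(), "missing": 0, "examples": []}
            )
            if value is None:
                info["missing"] += 1
                continue
            info["types"].add(type(value).__name__)
            # Keep a small number of distinct example values per column.
            if len(info["examples"]) < max_examples and value not in info["examples"]:
                info["examples"].append(value)
    return summary


# Made-up rows; real column names must be read from the dataset itself.
rows = [
    {"question": "a", "image": None},
    {"question": "b", "image": "x.png"},
]
schema = summarize_columns(rows)
```

The same pass also yields a row count if the caller tallies rows while iterating, which addresses the unknown-size limitation noted above.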

Provenance

Source: Hugging Face user soyeonbot.
Freshness: Last updated 2026-04-15 18:35:45; check the dataset page for later changes.
License: Unknown; terms of use must be verified on the dataset page.
