Name: Argus: Hallucination and Omission Scores for Video-Language Models
Creator: tomg-group-umd
Published: 2025-06-09T00:30:28
Keywords: Omission Evaluation, Hallucination Evaluation, Benchmark, Video Llm, Multimodal Evaluation, Multimodal

Description

ARGUS is a framework for calculating hallucination and omission costs in free-form video captions. The dataset, created by tomg-group-umd, provides metrics that quantify how much hallucinated and omitted content appears in video-language model outputs. It was last updated on June 10, 2025.

Use Cases

Benchmarking video-language model performance based on hallucination and omission cost metrics.
Training or fine-tuning models to reduce hallucinated content in video descriptions.
Training or fine-tuning models to reduce omitted content in video descriptions.
Analyzing the trade-offs between detail and accuracy in generated video captions.
Comparing different Video-LLM architectures based on their ArgusCost-H and ArgusCost-O scores.
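The trade-off and architecture-comparison use cases above involve ranking models on two costs at once. One common way to do that is a Pareto comparison: a model stays in the shortlist only if no other model is at least as good on both costs and strictly better on one. The sketch below assumes that lower ArgusCost-H and ArgusCost-O values are better. The `ModelScores` record, its field names and the numbers are hypothetical placeholders, not ARGUS tooling or reported results.

```python
from typing import NamedTuple


class ModelScores(NamedTuple):
    """Hypothetical per-model summary; field names are illustrative, not the dataset's schema."""
    name: str
    cost_h: float  # assumed ArgusCost-H (hallucination); lower is assumed better
    cost_o: float  # assumed ArgusCost-O (omission); lower is assumed better


def dominates(a: ModelScores, b: ModelScores) -> bool:
    """True if a is no worse than b on both costs and strictly better on at least one."""
    no_worse = a.cost_h <= b.cost_h and a.cost_o <= b.cost_o
    strictly_better = a.cost_h < b.cost_h or a.cost_o < b.cost_o
    return no_worse and strictly_better


def pareto_front(models: list[ModelScores]) -> list[ModelScores]:
    """Return the models no other model dominates, sorted by hallucination cost."""
    front = [m for m in models if not any(dominates(o, m) for o in models if o is not m)]
    return sorted(front, key=lambda m: m.cost_h)


if __name__ == "__main__":
    # Placeholder numbers for illustration only; these are NOT ARGUS results.
    models = [
        ModelScores("model_a", 0.30, 0.50),
        ModelScores("model_b", 0.45, 0.35),
        ModelScores("model_c", 0.50, 0.55),  # dominated by model_a on both costs
        ModelScores("model_d", 0.25, 0.60),
    ]
    for m in pareto_front(models):
        print(m.name, m.cost_h, m.cost_o)
```

With these placeholders, model_c drops out while model_d, model_a and model_b remain, each trading lower hallucination cost for higher omission cost or vice versa. A Pareto front avoids picking an arbitrary weighting between the two metrics; a weighted sum is a reasonable alternative once a preferred trade-off is known.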

Strengths

The framework defines two metrics: ArgusCost-H for hallucination and ArgusCost-O for omission.
Dataset is associated with a published paper and a dedicated website, suggesting academic rigor.
Last updated on 2025-06-10, one day after publication, so the release is recent.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count and dataset scale are unknown, which makes it hard to judge whether the dataset suits a given evaluation.
The specific video content and caption data used for evaluation are not described.

Provenance

Source: tomg-group-umd on Hugging Face.
Collection Method: Likely contains scores calculated by the ARGUS framework on video-caption pairs.
Time Range: Not specified
Freshness: Last updated 2025-06-10 02:30:08; freshness should be verified.
Geography: Not specified
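If, as the Collection Method line suggests, the data holds one score pair per video-caption pair, a natural first step is to average those scores per model. The sketch below is minimal and rests on stated assumptions: the keys `model`, `cost_h` and `cost_o` are hypothetical, because the dataset's real column names are undocumented, and a plain mean is only one of several reasonable aggregations.

```python
from collections import defaultdict
from statistics import mean


def per_model_means(rows: list[dict]) -> dict[str, tuple[float, float]]:
    """Average hallucination and omission costs per model.

    Assumes each row is one scored video-caption pair with the hypothetical
    keys "model", "cost_h" and "cost_o"; adjust them to the real schema.
    """
    grouped: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        grouped[row["model"]].append(row)
    # Return (mean hallucination cost, mean omission cost) for each model.
    return {
        model: (mean(r["cost_h"] for r in rs), mean(r["cost_o"] for r in rs))
        for model, rs in grouped.items()
    }


if __name__ == "__main__":
    # Placeholder rows for illustration only; not real dataset contents.
    rows = [
        {"model": "model_x", "cost_h": 0.25, "cost_o": 0.5},
        {"model": "model_x", "cost_h": 0.75, "cost_o": 0.5},
        {"model": "model_y", "cost_h": 0.5, "cost_o": 0.25},
    ]
    print(per_model_means(rows))
```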
