MSE-Bench consists of 100 test instances designed to evaluate multi-turn image editing systems under realistic workflows. It was created by leigangqu and is hosted on Hugging Face, with a last recorded update on 2026-03-19. Each instance provides a source image and a series of editing instructions for models to apply cumulatively.
Use Cases
- Benchmarking model performance on cumulative image edits based on sequential instructions.
- Evaluating how well editing systems hold up under the realistic multi-turn workflows the benchmark is designed around.
- Testing AI systems' ability to follow a series of text-based editing commands on an image.
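The cumulative protocol behind these use cases can be sketched as a simple loop in which each turn edits the output of the previous turn rather than the original source image. This is a minimal illustration under stated assumptions, not the benchmark's official harness: `run_multi_turn`, `edit_fn`, and `toy_edit` are hypothetical names, and the toy "image" is a tuple standing in for real pixel data.

```python
from typing import Callable, List, TypeVar

Image = TypeVar("Image")


def run_multi_turn(
    source: Image,
    instructions: List[str],
    edit_fn: Callable[[Image, str], Image],
) -> List[Image]:
    """Apply instructions cumulatively and return the result after each turn."""
    outputs: List[Image] = []
    current = source
    for instruction in instructions:
        # Each turn edits the previous turn's output, not the source image.
        current = edit_fn(current, instruction)
        outputs.append(current)
    return outputs


def toy_edit(image: tuple, instruction: str) -> tuple:
    """Hypothetical stand-in for a model: records the instruction it applied."""
    return image + (instruction,)


results = run_multi_turn(("source",), ["add a hat", "make the sky pink"], toy_edit)
```

Keeping every intermediate output, rather than only the final image, would allow per-turn scoring in addition to end-of-sequence scoring, if an evaluation calls for it.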
Strengths
- The benchmark is explicitly designed around realistic multi-turn editing workflows, which gives it a clearly scoped evaluation goal.
- It contains 100 test instances, providing a defined scale for evaluation.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not reported in the hosting metadata, so the stated 100 test instances should be confirmed after download before assessing suitability.
- Last updated 2026-03-19 03:05:11; freshness should be verified.
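Because the columns are undocumented, a first step after download is usually to list the fields that actually appear and the value types they hold. The sketch below assumes the records have already been loaded as dictionaries by whatever loader is used; `summarize_fields` is a hypothetical helper, and the field names in `sample_rows` are invented for illustration and are not the dataset's real columns.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping


def summarize_fields(records: Iterable[Mapping[str, Any]]) -> Dict[str, List[str]]:
    """Map each field name to the sorted list of Python type names seen for it."""
    seen = defaultdict(set)
    for record in records:
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


# Hypothetical rows; the real field names must be read from the downloaded data.
sample_rows = [
    {"image_path": "a.png", "instructions": ["add a hat", "remove the tree"]},
    {"image_path": "b.png", "instructions": ["brighten the sky"], "notes": None},
]

summary = summarize_fields(sample_rows)
row_count = len(sample_rows)
```

Comparing the observed row count against the stated 100 test instances is a quick way to check that the download is complete.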
Provenance
- Source: Hugging Face
- Freshness: 2026-03-19 03:05:11