Ko Vdr Hn: Korean Visual Document Retrieval Hard Negatives for Embedding Models

Name: Ko Vdr Hn: Korean Visual Document Retrieval Hard Negatives for Embedding Models
Creator: whybe-choi
Published: 2026-04-25T09:39:41
Keywords: Korean Language, Document Images, Embedding Training, Computer Vision, Multimodal Retrieval, Multimodal

Available on 1 platform

Description

Korean Visual Document Retrieval Hard Negatives is a multimodal training set for fine-tuning embedding models on Korean visual document retrieval. It was created by whybe-choi and last updated on 2026-04-25. Each row contains a text query, its positive page-image document, and seven mined hard negatives.

Use Cases

Fine-tune visual-document retrieval models based on Korean text queries and page images.
Improve model ranking performance using the provided hard negative examples.
Benchmark cross-modal retrieval systems for Korean document pages.
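
To make the fine-tuning use case concrete, here is a minimal sketch of a contrastive (InfoNCE-style) loss over one positive and seven hard negatives. The source does not state which training objective the dataset is intended for, so the loss choice, the `temperature` value, and the function name are assumptions, and the plain float lists stand in for real query and page-image embeddings.

```python
import math


def info_nce_loss(query_vec, positive_vec, negative_vecs, temperature=0.05):
    """Cross-entropy of the positive against [positive + negatives].

    Assumption: cosine similarity scaled by a temperature; real training
    code would compute this on batched tensors from an embedding model.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    # Index 0 holds the positive; the rest are the hard negatives.
    logits = [cosine(query_vec, positive_vec) / temperature]
    logits += [cosine(query_vec, neg) / temperature for neg in negative_vecs]

    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# Toy vectors: the positive matches the query, the seven negatives do not.
query = [1.0, 0.0]
easy_loss = info_nce_loss(query, [1.0, 0.0], [[0.0, 1.0]] * 7)
# Seven negatives that look exactly like the positive: the hardest case.
hard_loss = info_nce_loss(query, [1.0, 0.0], [[1.0, 0.0]] * 7)
```

The loss stays near zero when the negatives are easy to separate and grows when they resemble the positive, which is why hard negatives give a stronger training signal than random ones.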

Strengths

Includes seven hard negative examples per query, which are useful for training robust retrieval models.
Hard negatives were mined with the Qwen/Qwen3-VL-Embedding-8B model, so they are pages that an embedding model ranks close to the query rather than randomly sampled pages.
Positives sharing the same query within the same source dataset were excluded from the negative pool, which reduces the risk of false negatives.
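
The mining and filtering steps described above can be sketched as follows. This is an illustration under stated assumptions, not the author's pipeline: the source names the embedding model but not the similarity measure or ranking details, so cosine similarity, top-k selection, and the function name `mine_hard_negatives` are hypothetical, and the toy vectors stand in for embeddings that would come from Qwen/Qwen3-VL-Embedding-8B.

```python
import math


def mine_hard_negatives(query_vec, page_vecs, positive_ids, k=7):
    """Return ids of the k pages most similar to the query, skipping positives.

    page_vecs: dict mapping page id -> embedding (list of floats)
    positive_ids: set of page ids known to be positives for this query
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    # Score every candidate page, excluding known positives so they
    # cannot be selected as (false) negatives.
    scored = [
        (cosine(query_vec, vec), page_id)
        for page_id, vec in page_vecs.items()
        if page_id not in positive_ids
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [page_id for _, page_id in scored[:k]]


# Toy example with 2-d vectors; "p0" is the positive for this query.
pages = {
    "p0": [1.0, 0.0],
    "p1": [0.9, 0.1],
    "p2": [0.0, 1.0],
    "p3": [0.7, 0.7],
}
negatives = mine_hard_negatives([1.0, 0.0], pages, {"p0"}, k=2)
```

In this toy run the positive "p0" is never returned, and the two most query-like remaining pages are selected in order of similarity.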

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Row count is unknown, which may limit suitability assessment.
Column-level documentation is absent; field semantics must be inferred after download.
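
Because the columns are undocumented, a quick sanity check on downloaded rows can help confirm the structure described above. The sketch below is an assumption-laden example: the field names `query`, `positive`, and `negatives` are hypothetical placeholders (the real column names must be read from the downloaded data), and pages are represented by identifiers rather than images.

```python
def check_row(row, num_negatives=7):
    """Return a list of problems found in one row; an empty list means none.

    Assumption: the row is a dict with hypothetical keys "query",
    "positive", and "negatives"; adjust them to the actual columns.
    """
    problems = []
    for key in ("query", "positive", "negatives"):
        if key not in row:
            problems.append(f"missing field: {key}")
    if problems:
        return problems

    if not isinstance(row["query"], str) or not row["query"].strip():
        problems.append("query is empty or not a string")
    if len(row["negatives"]) != num_negatives:
        problems.append(
            f"expected {num_negatives} negatives, got {len(row['negatives'])}"
        )
    if row["positive"] in row["negatives"]:
        problems.append("positive also appears among the negatives")
    return problems


# A well-formed toy row and a malformed one with only six negatives.
good_row = {
    "query": "example query",
    "positive": "page_0",
    "negatives": [f"page_{i}" for i in range(1, 8)],
}
bad_row = dict(good_row, negatives=good_row["negatives"][:6])
good_problems = check_row(good_row)
bad_problems = check_row(bad_row)
```

Running such a check over a sample of rows gives a first signal on data quality before committing to a full training run.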

Provenance

Source: Hugging Face
Collection Method: Hard negatives were mined with Qwen/Qwen3-VL-Embedding-8B within each source dataset.
Freshness: Last updated 2026-04-25 13:31:56; freshness should be verified.

Quality Score

Overall: D (37)
Description: 42
Source: 36
Reputation: 41
Access: 22

Community

Downloads: 40
Likes: 1
Views: 0

Dataset Info

Author: whybe-choi
Created: Apr 25, 2026
Updated: Apr 25, 2026
Last synced: Jun 18, 2026
