Name: KORE-74K Image Recognition and Captioning Dataset
Creator: kailinjiang
Published: 2025-08-23T14:38:11
Keywords: Task Category: Image-Text-to-Text, Multimodal AI, arXiv:2510.19316, Computer Vision, Image Captioning, Region: US, Recognition Data, Visual Question Answering, Multimodal

Description

KORE-74K is a multimodal dataset containing over 74,000 training entries for image recognition, captioning, and visual question answering tasks. It was created by kailinjiang, published in August 2025 (last updated in February 2026), and builds upon the MMEVOKE dataset. The data includes separate archives for recognition/caption images and VQA images, paired with structured JSON annotations.

Use Cases

Train image captioning models using the `imgs_of_recognition_caption_description.zip` images and corresponding text annotations in the JSON files.
Develop visual question answering models using the images in the `imgs_of_vqa` directory and their associated question-answer pairs from the JSON data.
Fine-tune vision-language models for recognition tasks by leveraging the labeled image data and structured `KORE-74K-training_data.json` file.
Conduct research on multimodal AI by combining the image data from KORE-74K with the related MMEVOKE dataset as referenced by the author.
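
Since the use cases above all start from the JSON annotations, a first step is usually to inspect the file's shape. The following is a minimal sketch, not the author's tooling: the function name `summarize_annotations` is hypothetical, and because the annotation schema is not documented in this description, the code only reports the top-level container type and a sample of keys rather than assuming any field names.

```python
import json
from pathlib import Path


def summarize_annotations(json_path):
    """Report the top-level shape of an annotation file without assuming field names."""
    with Path(json_path).open(encoding="utf-8") as f:
        data = json.load(f)

    # The schema is undocumented here, so inspect the structure instead of guessing it.
    if isinstance(data, list):
        first = data[0] if data else None
        keys = sorted(first.keys()) if isinstance(first, dict) else []
        return {"container": "list", "entries": len(data), "sample_keys": keys}
    if isinstance(data, dict):
        # Cap the sample so a very large mapping does not flood the output.
        return {"container": "dict", "entries": len(data), "sample_keys": sorted(data.keys())[:10]}
    return {"container": type(data).__name__, "entries": 0, "sample_keys": []}
```

Calling `summarize_annotations("KORE-74K-training_data.json")` on a local copy would show whether the file is a list of records or a mapping, which determines how to pair each entry with its image.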

Strengths

Contains over 74,000 training data entries for model development.
Provides structured JSON annotations for multiple vision-language tasks.
Dataset was updated in February 2026, indicating recent maintenance.

Limitations

The exact number of images and the total dataset size are unspecified; only an approximate entry count (over 74,000) is given.
Geographic and demographic bias is unknown as the image sources are not detailed.
The license terms for use and redistribution are not provided.

Provenance

Source: Hugging Face dataset created by kailinjiang, building upon the MMEVOKE dataset.
Collection Method: Data compilation method is not specified in the provided description.
Freshness: Last updated on 2026-02-05.
Geography: Region 'us' is indicated in platform tags, but specific spatial coverage is unknown.

The image data is split into multiple zip parts that must be concatenated using the command `cat split_zip_part_* > combined.zip` before use. The dataset is intended to be combined with MMEVOKE's image data for complete training.
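
On systems without `cat`, the same concatenation can be done in Python. This is a minimal sketch under assumptions: the helper name `combine_zip_parts` is hypothetical, the part names are assumed to match the `split_zip_part_*` pattern from the command above, and lexicographic filename order is assumed to be the correct part order (as it is for the shell glob).

```python
import glob
import os
import shutil
import zipfile


def combine_zip_parts(part_dir, pattern="split_zip_part_*", out_name="combined.zip"):
    """Concatenate split archive parts in filename order and return the output path."""
    # sorted() mirrors the lexicographic ordering the shell applies to the glob.
    parts = sorted(glob.glob(os.path.join(part_dir, pattern)))
    if not parts:
        raise FileNotFoundError(f"no files matching {pattern!r} in {part_dir!r}")

    out_path = os.path.join(part_dir, out_name)
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each part so large archives are never loaded fully into memory.
                shutil.copyfileobj(src, out)

    # Sanity check: a missing or misordered part usually leaves an unreadable archive.
    if not zipfile.is_zipfile(out_path):
        raise ValueError(f"{out_path!r} is not a valid zip archive after concatenation")
    return out_path
```

The returned path can then be opened with `zipfile.ZipFile(path).extractall(target_dir)` to unpack the images.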
