Geometry3K In Context Synthesizing

Name: Geometry3K In Context Synthesizing
Creator: WaltonFuture
Published: 2025-04-28T06:18:06
Keywords: Size categories: 1K<n<10K, Task categories: image-text-to-text, Library: polars, arXiv: 2505.22453, Modality: text, Library: mlcroissant, Modality: image, Library: datasets, Library: pandas, Format: parquet, Region: US

Description

2,101 image-text pairs designed for unsupervised post-training of multi-modal large language models. Each entry includes a 'problem' field with a geometric reasoning question and an 'answer' field containing the corresponding solution.
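As a minimal sketch of how one entry might be consumed, the snippet below checks a record for the two text fields named above and returns them as a prompt/target pair. The sample record is invented for illustration and is not taken from the dataset; the helper names `validate_record` and `to_prompt_pair` are hypothetical, and the image column is left out because its name is not given here.

```python
from typing import Any

# Text fields named in the description; the image column name is not stated
# in this listing, so it is deliberately not handled here.
REQUIRED_FIELDS = ("problem", "answer")


def validate_record(record: dict[str, Any]) -> None:
    """Raise KeyError if a record lacks one of the documented text fields."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")


def to_prompt_pair(record: dict[str, Any]) -> tuple[str, str]:
    """Return (problem, answer) with surrounding whitespace stripped."""
    validate_record(record)
    return record["problem"].strip(), record["answer"].strip()


# Invented stand-in record, NOT real dataset content.
sample = {
    "problem": " The angles of a triangle are x, 60 and 70 degrees. Find x. ",
    "answer": "50",
}
prompt, target = to_prompt_pair(sample)
```

Keeping validation separate from formatting makes it easy to skip or log malformed rows before they reach a training loop.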

Use Cases

Train multi-modal LLMs using the 'problem' and 'answer' fields to improve geometric reasoning capabilities.
Implement Group Relative Policy Optimization (GRPO) by utilizing the synthesized reasoning paths.
Fine-tune vision-language models for unsupervised post-training scenarios using the image-text pairs.
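The GRPO use case above can be illustrated with a small, dependency-free sketch of the group-relative advantage step: each sampled response's reward is normalized against the mean and standard deviation of its own group. The majority-vote reward used here is an assumption about how an unsupervised signal could be produced; this listing does not specify the reward function, and the function names and sample answers are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def majority_vote_rewards(answers: list[str]) -> list[float]:
    """Assumed unsupervised reward: 1.0 if an answer matches the group's
    most common answer, else 0.0. Ties go to the answer seen first."""
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if answer == majority else 0.0 for answer in answers]


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group: (r - mean) / (std + eps).
    Population std is used here; implementations differ on this choice."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical final answers sampled for a single problem.
sampled = ["50", "50", "40", "50"]
rewards = majority_vote_rewards(sampled)
advantages = group_relative_advantages(rewards)
```

Responses that agree with the majority receive a positive advantage and the outlier a negative one; when every sample agrees, all advantages are zero and the group contributes no gradient signal.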

Strengths

2,101 examples of geometric reasoning problems paired with textual solutions.
Includes a 'problem' column containing multi-modal questions requiring spatial reasoning.
Features an 'answer' column providing step-by-step solutions for model supervision.
Derived from the Geometry3K benchmark to support the MM-UPT post-training framework.

Quality Score

Overall: D (36)
Description: 39
Source: 36
Reputation: 42
Access: 22

Community

Downloads: 35
Likes: 2
Views: 0

Dataset Info

Author: WaltonFuture
Created: Apr 28, 2025
Updated: Jun 2, 2025
