Brokenarxiv Training Outputs Disprove: Qwen3.6-35B Responses to Perturbed ArXiv Statements

Name: Brokenarxiv Training Outputs Disprove: Qwen3.6-35B Responses to Perturbed ArXiv Statements
Creator: MathArena
Published: 2026-06-13T10:55:04
Keywords: Statement Verification, Mathematical Text, Model Evaluation, Text, LLM Training, ArXiv Derived, Synthetic

Description

This dataset contains training data generated from past ArXiv articles, including outputs from the Qwen3.6-35B model. Each output is the model's answer on whether a perturbed mathematical statement is correct; the expected answer is always "disprove". The dataset was created by MathArena and last updated on June 16, 2026.

Use Cases

Evaluating LLM robustness on the verification task, that is, judging whether perturbed mathematical statements are correct.
Training models for mathematical fact-checking using the generated Qwen3.6-35B outputs.
Analyzing failure modes of language models on logical negation or perturbation of scientific text.
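
A minimal sketch of how the evaluation use case might be scored, assuming each model output is available as a plain string. The dataset's actual columns are undocumented, so the input format here is hypothetical, and the keyword-based verdict extraction (take the last "prove" or "disprove" in the response) is a simple heuristic chosen for illustration, not MathArena's grading method.

```python
import re
from typing import Iterable, Optional

# The dataset description states the expected answer is always "disprove".
EXPECTED = "disprove"

def extract_verdict(response: str) -> Optional[str]:
    """Return the last standalone 'prove' or 'disprove' in the response.

    Heuristic assumption: the final verdict keyword reflects the model's answer.
    Returns None when neither keyword appears.
    """
    matches = re.findall(r"\b(disprove|prove)\b", response.lower())
    return matches[-1] if matches else None

def disprove_rate(responses: Iterable[str]) -> float:
    """Fraction of responses whose extracted verdict matches EXPECTED."""
    verdicts = [extract_verdict(r) for r in responses]
    if not verdicts:
        return 0.0
    return sum(v == EXPECTED for v in verdicts) / len(verdicts)
```

Responses with no recognizable verdict count as misses, which makes them easy to isolate when analyzing failure modes.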

Strengths

Data is derived from the established scientific preprint repository ArXiv.
Model outputs are generated by a specific, named model (Qwen3.6-35B).
The verification task has a clear, binary expected outcome (disprove).

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
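
Because field semantics and row count must be inferred after download, a first inspection pass could look like the sketch below. It assumes the data has been exported to a local JSON Lines file; the file format is not stated in the listing, and the helper names `read_jsonl` and `summarize_fields` are hypothetical, introduced only for this example.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line of a JSON Lines file (assumed format)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

def summarize_fields(records: Iterable[dict]) -> Tuple[int, Dict[str, List[str]]]:
    """Count rows and collect the Python type names observed for each field."""
    seen = defaultdict(set)
    count = 0
    for record in records:
        count += 1
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return count, {key: sorted(names) for key, names in seen.items()}
```

Calling `summarize_fields(read_jsonl(path))` on the downloaded file gives the row count and a field-to-types map, which is a starting point for the manual inspection noted above.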

Provenance

Source: MathArena (https://matharena.ai/)
Collection Method: Generated from past ArXiv articles, with outputs produced by the Qwen3.6-35B model.
Freshness: Last updated 2026-06-16 14:17:34; freshness should be verified.

License is unknown; users must verify terms before use.

Quality Score: D37

Description: 42
Source: 36
Reputation: 39
Access: 26

Community

14 downloads

1 like

0 views

Dataset Info

Author: MathArena
Created: Jun 13, 2026
Updated: Jun 16, 2026
Last synced: Jun 23, 2026
