Name: Stage2 Qwen3 4B Sft Proofbench Summary Graded: AI Model Safety Evaluations
Creator: violetxi
Published: 2026-04-01T02:29:43
Keywords: Size Categories: 1K<n<10K, Library: polars, AI Safety, Modality: text, Library: mlcroissant, Library: datasets, Proof Bench, Library: pandas, LLM Evaluation, Text, Graded Responses, Region: us, JSON

Description

A dataset of graded summaries from the Proofbench evaluation of a Qwen3 4B model, published on Hugging Face by violetxi. The name ("Stage2 ... Sft") suggests outputs from a second-stage supervised fine-tuning (SFT) checkpoint, assessed for reasoning or safety; this is inferred from the name and has not been verified. The platform tags indicate the data is textual and stored as JSON.

Use Cases

Benchmarking the safety or reasoning performance of language models (inferred from domain, verify after download)
Training or fine-tuning models for improved alignment or factual consistency (inferred from domain, verify after download)
Analyzing failure modes in model-generated summaries or proofs (inferred from domain, verify after download)
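For the benchmarking use case, a first step is usually to aggregate the grades. The sketch below is a minimal example under stated assumptions: it assumes the data is JSON Lines and that each record stores a numeric score under a field named `grade`. Both the layout and the field name are hypothetical and must be checked against the actual columns after download.

```python
import json
from statistics import mean
from typing import Iterable


def summarize_grades(lines: Iterable[str], grade_field: str = "grade") -> dict:
    """Summarize numeric grades from JSON Lines records.

    `grade_field` is a HYPOTHETICAL column name; replace it with the real
    field once the dataset schema has been inspected.
    """
    grades = []
    missing = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        value = record.get(grade_field)
        # Count only real numbers; bool is a subclass of int, so exclude it.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            grades.append(float(value))
        else:
            missing += 1  # grade absent or non-numeric
    return {
        "n_graded": len(grades),
        "n_missing": missing,
        "mean": mean(grades) if grades else None,
        "min": min(grades) if grades else None,
        "max": max(grades) if grades else None,
    }


# Usage with made-up in-memory records (not real data from this dataset):
sample = [
    '{"id": 1, "grade": 7}',
    '{"id": 2, "grade": 3}',
    '{"id": 3}',
]
print(summarize_grades(sample))
```

Rows without a usable grade are counted rather than silently dropped, so gaps in grading stay visible in the summary.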

Strengths

Published on the Hugging Face platform, a major repository for machine learning datasets.
The dataset is associated with a specific model version (Qwen3 4B) and evaluation framework (Proofbench).
Last updated on 2026-04-01 02:29:50.

Limitations

Metadata is minimal; actual content requires verification after download.
Column-level documentation is absent; field semantics must be inferred after download.
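The exact row count and license are unknown. The tags suggest between 1K and 10K rows stored as JSON, but this must be confirmed after download, which limits how well suitability can be assessed in advance.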
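Because field semantics must be inferred after download, a quick profile of the records can show which fields exist, what JSON types they hold, and how often they are missing. It can also check the row count against the 1K<n<10K size tag. The sketch below assumes a JSON Lines layout; the field names in the usage sample (`summary`, `grade`) are hypothetical placeholders, not the dataset's documented columns.

```python
import json
from collections import Counter, defaultdict
from typing import Iterable


def profile_records(lines: Iterable[str]) -> dict:
    """Infer a rough schema from JSON Lines records.

    Returns the row count and, per field, the observed Python type names
    (after json.loads) and the number of rows that lack the field.
    """
    rows = 0
    types = defaultdict(Counter)  # field name -> Counter of type names
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        rows += 1
        for key, value in record.items():
            types[key][type(value).__name__] += 1
    return {
        "rows": rows,
        "fields": {
            key: {"types": dict(counter), "missing": rows - sum(counter.values())}
            for key, counter in sorted(types.items())
        },
    }


def matches_size_tag(rows: int, low: int = 1_000, high: int = 10_000) -> bool:
    """Check a row count against the 1K<n<10K size category (strict bounds)."""
    return low < rows < high


# Usage with made-up records (field names are hypothetical):
sample = [
    '{"id": 1, "summary": "ok", "grade": 7}',
    '{"id": 2, "summary": null}',
]
profile = profile_records(sample)
print(profile)
print(matches_size_tag(profile["rows"]))
```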

Provenance

Source: huggingface
Collection Method: Likely contains outputs from a fine-tuned Qwen3 4B model evaluated on the Proofbench framework.
Time Range: Not specified
Freshness: Last updated 2026-04-01 02:29:50.
Geography: Not specified

License is unknown; usage rights must be verified.
