This dataset covers the 1319 GSM8K test examples, each with 40 sampled Neural Language Arithmetic (NLA) descriptions. The descriptions were split into sentences, and the sentences were scored with the kitft/nla-qwen2.5-7b-L20-ar model. The dataset was created by Realmbird and last updated on 2026-05-31.
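The sentence-splitting step can be approximated as follows. This is a minimal sketch only: the segmentation rule the dataset actually used is not documented, so the regex, the function name `split_sentences`, and the example text are all assumptions made for illustration.

```python
import re

# Assumption: sentences end at ., !, or ? followed by whitespace.
# The dataset's real segmentation rule is not documented.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(description: str) -> list[str]:
    """Split a description on whitespace that follows ., !, or ?."""
    parts = _SENTENCE_END.split(description.strip())
    # Drop empty strings, which appear when the input itself is empty.
    return [part for part in parts if part]


# Hypothetical description text, not taken from the dataset.
example = "The model adds 3 and 4. It then doubles the result! Final answer follows."
sentences = split_sentences(example)
```

A rule this simple leaves decimals such as "3.5" intact because no whitespace follows the period, but it will split after abbreviations like "e.g. ", so compare its output against the stored sentences before relying on it.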
Use Cases
- Analyzing sentence-level contributions to model reasoning based on NLA description scores.
- Studying the relationship between residual stream activations and generated text descriptions.
- Evaluating the consistency of thought anchor descriptions across different problem instances.
- Training or benchmarking models for neural network activation analysis.
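For the sentence-level and consistency use cases above, one possible starting point is to average sentence scores per description and then measure how much those averages vary across the sampled descriptions of one example. The sketch below uses only the standard library. The row layout `(example_id, description_id, score)` and the score values are hypothetical, since the real column names are not documented.

```python
from statistics import mean, pstdev

# Hypothetical rows: (example_id, description_id, sentence_score).
rows = [
    (0, 0, 0.8), (0, 0, 0.6),
    (0, 1, 0.4),
    (1, 0, 0.9), (1, 1, 0.7),
]


def per_description_means(rows):
    """Average the sentence scores within each (example, description) pair."""
    grouped = {}
    for example_id, description_id, score in rows:
        grouped.setdefault((example_id, description_id), []).append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


def per_example_spread(rows):
    """Return {example_id: (mean, population std)} over description means."""
    by_example = {}
    for (example_id, _), value in per_description_means(rows).items():
        by_example.setdefault(example_id, []).append(value)
    return {ex: (mean(vals), pstdev(vals)) for ex, vals in by_example.items()}


summary = per_example_spread(rows)
```

A low standard deviation for an example would suggest that its sampled descriptions score similarly; whether that reflects real consistency depends on what the score measures, which the card does not specify.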
Strengths
- Contains 1319 distinct test examples from the GSM8K benchmark.
- Each example includes 40 sampled NLA descriptions, providing a substantial sample size for analysis.
- Activation extraction is focused on a specific token (the first digit after the answer prompt), offering a precise target.
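To make the extraction target concrete, the sketch below finds the first digit that follows an answer prompt in a piece of text. It works at the character level and is illustrative only: the marker string `"The answer is"` and the helper name are assumptions, because the actual prompt wording is not documented. Mapping the character index to a token position would additionally require the tokenizer's character offsets.

```python
import re

# Assumption: the answer prompt is a literal marker string.
# The prompt text actually used for the dataset is not documented.
ANSWER_PROMPT = "The answer is"


def first_digit_after_prompt(text, prompt=ANSWER_PROMPT):
    """Return the character index of the first digit after the last
    occurrence of `prompt`, or None if the prompt or a digit is missing."""
    start = text.rfind(prompt)
    if start == -1:
        return None
    offset = start + len(prompt)
    match = re.search(r"\d", text[offset:])
    if match is None:
        return None
    return offset + match.start()


# Hypothetical completion text, not taken from the dataset.
text = "She buys 3 apples and 4 pears. The answer is 7."
idx = first_digit_after_prompt(text)
```

Digits that appear before the prompt (the "3" and "4" here) are skipped, so `text[idx]` points at the answer digit rather than at a number from the reasoning.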
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
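- Row count is not stated. The 1319 examples with 40 descriptions each imply 52,760 descriptions, but it is not documented whether a row corresponds to an example, a description, or a sentence, which may limit suitability assessment.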
- Description metadata is limited; actual data quality requires manual inspection after download.
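Because the columns are undocumented, a first pass after download could tally which value types appear in each field and compare the row count against the figure implied by the summary. The sketch below assumes the rows have already been loaded as a list of dictionaries; the column names `example_id`, `sentence`, and `score` are hypothetical placeholders, not the dataset's real schema.

```python
from collections import defaultdict


def summarize_columns(records):
    """Map each column name to the sorted list of Python type names seen."""
    seen = defaultdict(set)
    for record in records:
        for column, value in record.items():
            seen[column].add(type(value).__name__)
    return {column: sorted(types) for column, types in seen.items()}


# Hypothetical rows; the real column names are not documented.
sample = [
    {"example_id": 0, "sentence": "Adds 3 and 4.", "score": 0.8},
    {"example_id": 0, "sentence": "Doubles it.", "score": None},
]
schema = summarize_columns(sample)

# Description count implied by the summary: 1319 examples x 40 samples.
expected_descriptions = 1319 * 40
```

A column that reports more than one type, such as `score` here, is a cue to check for missing values before computing statistics. If the loaded row count differs from `expected_descriptions`, rows are probably stored per example or per sentence rather than per description.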
Provenance
- Source: Realmbird on Hugging Face.
- Collection Method: Activations were extracted with the kitft/nla-qwen2.5-7b-L20-av model and scored with the kitft/nla-qwen2.5-7b-L20-ar model.
- Freshness: Last updated 2026-05-31 15:53:58 (time zone not stated); check the repository for newer revisions before use.