Description
This structured synthetic benchmark contains 5,160 instances and is designed to stress-test whether models learn latent rules from context. It was created by author 'longarmd' and last updated on Hugging Face in April 2026. The benchmark uses a full factorial design over task, format, application, and difficulty, with 10 independent draws per cell, which implies 516 cells (5,160 / 10).
Use Cases
Benchmarking latent rule learning across the cells of the factorial design.
Analyzing how task format affects in-context learning, using the format dimension.
Studying how task difficulty affects model generalization, using the difficulty dimension.
Evaluating model transfer across applications, using the application dimension.
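The use cases above all reduce to comparing scores across the levels of one design factor. A minimal sketch of that aggregation follows, assuming scored results are available as a list of dicts. The field names ("difficulty", "format", "correct") are hypothetical, since the dataset's columns are undocumented, and the rows are made up for illustration.

```python
from collections import defaultdict


def accuracy_by_factor(results, factor):
    """Return mean accuracy for each level of one design factor.

    `factor` and the "correct" key are hypothetical field names,
    not taken from the dataset itself.
    """
    totals = defaultdict(lambda: [0, 0])  # level -> [n_correct, n_total]
    for row in results:
        bucket = totals[row[factor]]
        bucket[0] += int(row["correct"])
        bucket[1] += 1
    return {level: correct / total for level, (correct, total) in totals.items()}


# Toy scored results, invented for illustration only.
results = [
    {"difficulty": "easy", "format": "json", "correct": True},
    {"difficulty": "easy", "format": "text", "correct": True},
    {"difficulty": "hard", "format": "json", "correct": False},
    {"difficulty": "hard", "format": "text", "correct": True},
]

by_difficulty = accuracy_by_factor(results, "difficulty")
by_format = accuracy_by_factor(results, "format")
```

Because the design is described as balanced, each level of a factor should contribute the same number of instances, so these marginal means are not skewed by unequal cell sizes.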
Strengths
5,160 total instances provide a substantial testbed for evaluation.
Full factorial design over four variables (task, format, application, difficulty) allows for controlled, systematic analysis.
10 i.i.d. draws per cell allow per-cell variance estimates and reduce the influence of any single sampled instance.
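The factorial claim can be checked after download. The sketch below counts rows per cell and tests whether every combination of levels appears, assuming the four factors are stored as columns named "task", "format", "application", and "difficulty". Those names are an assumption, not documented fields, and the toy rows are invented.

```python
import itertools
import math
from collections import Counter

# Hypothetical column names; the real ones must be read from the data.
FACTORS = ("task", "format", "application", "difficulty")


def check_balance(rows, factors=FACTORS, draws_per_cell=10):
    """Return (n_cells, unbalanced_cells, fully_crossed) for a list of dicts."""
    counts = Counter(tuple(row[f] for f in factors) for row in rows)
    # Cells whose row count differs from the expected number of draws.
    unbalanced = {cell: n for cell, n in counts.items() if n != draws_per_cell}
    # A full factorial has one cell per combination of observed levels.
    n_levels = [len({row[f] for row in rows}) for f in factors]
    fully_crossed = math.prod(n_levels) == len(counts)
    return len(counts), unbalanced, fully_crossed


# Cell count implied by the stated totals: 5,160 instances / 10 draws.
expected_cells = 5160 // 10

# Toy data: 2 tasks x 1 format x 1 application x 2 difficulties x 10 draws.
toy = [
    {"task": t, "format": "text", "application": "demo", "difficulty": d, "draw": i}
    for t, d in itertools.product(("A", "B"), ("easy", "hard"))
    for i in range(10)
]
n_cells, unbalanced, fully_crossed = check_balance(toy)
```

On the real data, the check would pass if `n_cells` equals `expected_cells` (516), `unbalanced` is empty, and `fully_crossed` is true. If the factors are fully crossed, their level counts must multiply to 516.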
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not confirmed by dataset metadata; the 5,160 figure comes from the description and should be verified after download.
Data is synthetic, which may limit direct applicability to real-world scenarios.
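Since field semantics must be inferred, a quick summary of each field's types and distinct values is a reasonable first step. The helper below is a sketch that works on any list of dicts. The function name and the sample rows are illustrative and do not reflect the dataset's actual schema.

```python
def summarize_fields(rows, max_examples=3):
    """Summarize types, distinct-value counts, and sample values per field."""
    summary = {}
    for row in rows:
        for key, value in row.items():
            info = summary.setdefault(key, {"types": set(), "values": set()})
            info["types"].add(type(value).__name__)
            # repr() keeps unhashable values (lists, dicts) countable.
            info["values"].add(repr(value))
    return {
        key: {
            "types": sorted(info["types"]),
            "n_distinct": len(info["values"]),
            "examples": sorted(info["values"])[:max_examples],
        }
        for key, info in summary.items()
    }


# Invented rows for illustration only.
sample_rows = [
    {"task": "A", "difficulty": 1},
    {"task": "B", "difficulty": 1},
]
field_summary = summarize_fields(sample_rows)
```

Fields with few distinct values are candidates for the four design factors, while high-cardinality fields are more likely prompts, targets, or identifiers.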
Provenance
Source
Hugging Face dataset uploaded by author 'longarmd'.
Collection Method
Synthetically generated benchmark instances.
Freshness
Last updated 2026-04-09 12:51:46; freshness should be verified.
License
Unknown; users should verify terms before use.