The LLM Physics Law-Breaker Benchmark Results dataset records how 21 large language models perform against 34 adversarial physics-based reasoning traps. Its benchmark scores assess each model's robustness to logical inconsistencies and physical fallacies. The original author and creation date are unknown.
The dataset likely contains only aggregated benchmark results, not the underlying model prompts, responses, or fine-tuning data.