A collection of 550 test cases for evaluating large language models in financial contexts, produced by BC Card and Yonsei University DSL as part of the S2026 LLMOps project. The dataset comprises 300 regression test cases, 200 financial edge cases for hallucination detection, and 50 hard-negative QA samples.
Licensed under Apache 2.0.