This dataset, published by marvintong in 2025, contains between 1,000 and 10,000 records that benchmark safety-utility trade-offs across 12 large language models in the legal domain. It includes legal questions, multi-phase evaluations, and contract text, which together measure model performance and over-refusal tendencies. The data is organized into distinct subsets for questions, evaluations, and legal documents.
Loading the data requires the Hugging Face datasets library. Each subset, such as 'questions' or 'phase1_evaluations', can be loaded independently with the load_dataset function. The dataset is released under the MIT license.
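
As a rough illustration of loading one subset, the sketch below wraps load_dataset in a small helper. Several things here are assumptions rather than facts from the listing: the repository ID is not given, so REPO_ID is a placeholder to be replaced; the helper name load_subset and its loader parameter are hypothetical; and the subset names are assumed to be exposed as dataset configurations (the name argument of load_dataset).

```python
from typing import Any, Callable, Optional

# Placeholder: the listing does not state the Hub repository ID.
# Replace with the real "<owner>/<dataset-name>" before use.
REPO_ID = "<owner>/<dataset-name>"


def load_subset(
    subset: str,
    repo_id: str = REPO_ID,
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Load a single subset, e.g. 'questions' or 'phase1_evaluations'.

    Assumes each subset is published as a separate configuration,
    so its name is passed as the `name` argument of load_dataset.
    The `loader` parameter is a hypothetical hook for swapping in a
    stand-in function, for example in offline tests.
    """
    if loader is None:
        # Imported lazily so the module can be imported without
        # the datasets library installed.
        from datasets import load_dataset

        loader = load_dataset
    return loader(repo_id, name=subset)


# Example (requires network access and a real repository ID):
# questions = load_subset("questions")
# phase1 = load_subset("phase1_evaluations")
```

If the subsets turn out to be stored as plain files rather than configurations, the call would likely need the data_files argument of load_dataset instead of name.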