Legal Eval: A Unified Benchmark for Cost-Efficient LLM Testing

Name: Legal Eval: A Unified Benchmark for Cost-Efficient LLM Testing
Creator: nguha
Published: 2026-04-03T21:10:21
Keywords: Benchmark, Llm Evaluation, Tabular, Zero Shot Prompting, Legal Reasoning, Benchmark Aggregation

Description

Approximately 9,769 zero-shot prompt samples, aggregated from 5 source benchmarks, for evaluating large language models on legal reasoning tasks. The dataset was created by nguha and last updated on 2026-04-17. It consolidates 202 distinct tasks from the source benchmarks legalbench, barexam, lexam, housingqa, and legal_hallucinations.

Use Cases

Benchmarking model performance on legal reasoning across tasks drawn from 5 source benchmarks.
Running cost-efficient LLM evaluations with the pre-formatted zero-shot prompts.
Comparing model results across legal task types using the unified flat schema.
Analyzing model performance in specific legal domains by grouping on the source benchmark and task_name fields.
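
The per-benchmark and per-task comparison described above can be sketched in Python as follows. This is a minimal illustration, not code shipped with the dataset: it assumes model outputs have already been scored into per-sample records with a boolean "correct" flag, and "source" is a hypothetical name for the source-benchmark column (only task_name is named on this page).

```python
from collections import defaultdict


def accuracy_by_group(results, keys=("source", "task_name")):
    """Return {group_tuple: accuracy} for a list of scored samples.

    Each result is a dict holding the grouping keys plus a boolean
    "correct" flag. The "source" key is a hypothetical column name;
    "task_name" is the field named on the dataset page.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        group = tuple(r[k] for k in keys)
        totals[group] += 1
        hits[group] += int(r["correct"])
    return {g: hits[g] / totals[g] for g in totals}


if __name__ == "__main__":
    # Hypothetical scored records; task names are placeholders.
    demo = [
        {"source": "legalbench", "task_name": "task_a", "correct": True},
        {"source": "legalbench", "task_name": "task_a", "correct": False},
        {"source": "barexam", "task_name": "task_b", "correct": True},
    ]
    for group, acc in sorted(accuracy_by_group(demo).items()):
        print(group, f"{acc:.2f}")
```

Passing keys=("source",) gives per-benchmark accuracy instead of per-task accuracy.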

Strengths

Aggregates approximately 9,769 samples across 202 tasks, providing a substantial testbed.
Consolidates data from 5 distinct source benchmarks into a single flat schema.
Samples are pre-formatted as zero-shot prompts, ready for direct model input.
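
Because the samples are already zero-shot prompts, an evaluation loop can pass each one to a model unchanged. The sketch below assumes hypothetical "prompt" and "answer" columns, a model exposed as a plain Python callable, and normalized exact-match scoring; the real field names, and whether exact match suits a given task, should be confirmed after download.

```python
from typing import Callable, Iterable


def normalize(text: str) -> str:
    """Lowercase and drop surrounding whitespace and trailing periods."""
    return text.strip().rstrip(".").strip().lower()


def run_zero_shot(samples: Iterable[dict], model: Callable[[str], str]) -> list[dict]:
    """Send each pre-formatted prompt to `model` and score by exact match.

    "prompt" and "answer" are hypothetical field names, not confirmed
    column names from the dataset.
    """
    results = []
    for sample in samples:
        # The prompt is used as-is: no instructions or examples are added.
        output = model(sample["prompt"])
        results.append({
            **sample,
            "prediction": output,
            "correct": normalize(output) == normalize(sample["answer"]),
        })
    return results


def always_yes(prompt: str) -> str:
    """Stand-in for a real model call; ignores the prompt."""
    return "Yes."


if __name__ == "__main__":
    demo = [
        {"task_name": "demo_task", "prompt": "<prompt 1>", "answer": "Yes"},
        {"task_name": "demo_task", "prompt": "<prompt 2>", "answer": "No"},
    ]
    for r in run_zero_shot(demo, always_yes):
        print(r["prompt"], r["prediction"], r["correct"])
```

Each result keeps the original sample fields, so its output can be grouped by task afterwards.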

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Exact row count is not confirmed (the description gives an approximate figure of 9,769 samples), which may complicate suitability assessment.
Last updated 2026-04-17 05:24:43; freshness should be verified.
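
Since field semantics and the row count have to be checked after download, a quick profile of the loaded rows is a reasonable first step. This sketch assumes the rows have already been loaded into a list of Python dicts (the hosting platform and file format are not stated on this page) and that the task column is called task_name. It reports the row count, the number of distinct tasks, and per-column types and distinct-value counts, which can be compared with the stated 9,769 samples and 202 tasks.

```python
from collections import Counter


def profile(rows: list[dict], task_field: str = "task_name") -> dict:
    """Summarize loaded rows to check the figures given on the dataset page."""
    columns = {}
    for row in rows:
        for col, value in row.items():
            info = columns.setdefault(col, {"types": Counter(), "values": set()})
            info["types"][type(value).__name__] += 1
            # repr() keeps unhashable values such as lists countable.
            info["values"].add(repr(value))
    return {
        "rows": len(rows),
        "distinct_tasks": len({row[task_field] for row in rows if task_field in row}),
        "columns": {
            col: {"types": dict(info["types"]), "distinct": len(info["values"])}
            for col, info in columns.items()
        },
    }


if __name__ == "__main__":
    # Hypothetical rows; real column names must be read from the download.
    demo = [
        {"task_name": "t1", "prompt": "<prompt 1>", "answer": "Yes"},
        {"task_name": "t2", "prompt": "<prompt 2>", "answer": "No"},
    ]
    print(profile(demo))
```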

Provenance

Source: Aggregated from 5 benchmarks: legalbench, barexam, lexam, housingqa, and legal_hallucinations.
Collection Method: Unified aggregation into a single flat schema.
Freshness: 2026-04-17

License is unknown and should be verified before use.

Quality Score

Overall: C (41)
Description: 51
Source: 36
Reputation: 41
Access: 26

Community

48 downloads, 1 like, 0 views

Dataset Info

Author: nguha
Created: Apr 3, 2026
Updated: Apr 17, 2026
Last synced: Jun 19, 2026
