PRBench comprises 1,100 expert-authored conversations across the Finance and Legal domains, paired with 19,356 expert-curated rubric criteria. It covers 114 countries, 47 U.S. jurisdictions, and 25 professional topics. ScaleAI released the dataset, which was last updated on January 15, 2026.
Use Cases
- Benchmarking AI reasoning models against the expert-authored conversations and rubric criteria.
- Training domain-specific language models for finance and legal tasks across the 25 professional topics.
- Evaluating model performance on the hard subsets, Finance-300 and Legal-250.
- Analyzing geographic and jurisdictional coverage in professional AI applications across the 114 countries and 47 U.S. jurisdictions.
Strengths
- 1,100 expert-authored conversations provide a substantial foundation for analysis.
- 19,356 expert-curated rubric criteria offer detailed evaluation targets.
- Coverage spans 114 countries, 47 U.S. jurisdictions, and 25 professional topics, suggesting broad applicability.
- Includes hard subsets (Finance-300, Legal-250) representing the most challenging tasks.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
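- Row count is not reported; whether rows map one-to-one to the 1,100 conversations must be confirmed after download, which makes suitability harder to assess in advance.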
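Because the column names and row count are undocumented, a first pass after download is to profile the records directly. The following is a minimal sketch, assuming the data has already been loaded as a list of dictionaries. The helper name summarize_fields and the sample field names (prompt, rubric, domain) are hypothetical and are not taken from the PRBench schema.

```python
from collections import defaultdict


def summarize_fields(rows):
    """Profile a list of dict records.

    Returns {field: {"types": sorted type names, "non_null": count}}.
    """
    summary = defaultdict(lambda: {"types": set(), "non_null": 0})
    for row in rows:
        for key, value in row.items():
            # Record every Python type observed for this field.
            summary[key]["types"].add(type(value).__name__)
            if value is not None:
                summary[key]["non_null"] += 1
    return {
        key: {"types": sorted(info["types"]), "non_null": info["non_null"]}
        for key, info in summary.items()
    }


# Hypothetical records: the real PRBench field names are not documented here.
sample = [
    {"prompt": "example text", "rubric": [{"criterion": "x"}], "domain": "Finance"},
    {"prompt": "example text", "rubric": None, "domain": "Legal"},
]

report = summarize_fields(sample)
print("rows:", len(sample))
for field, info in report.items():
    print(field, info)
```

Running the same function on the downloaded records gives the actual row count and a per-field view of types and missing values, which is a starting point for inferring field semantics.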
Provenance
- Source: ScaleAI
- Collection Method: Expert-authored conversations and expert-curated rubric criteria.
- Time Range: Not specified.
- Freshness: Last updated 2026-01-15 18:38:00; freshness should be verified.
- Geography: Covers 114 countries and 47 U.S. jurisdictions.