Description
Facebook introduces AdvancedIF, a benchmark featuring over 1,600 prompts designed to assess large language models. The dataset includes expert-curated rubrics to evaluate proficiency in complex instruction following, multi-turn interactions, and system prompt steerability. It was last updated on November 26, 2025.
Use Cases
Benchmarking LLM performance on complex instructions, using prompts with 6+ combined constraints
Evaluating multi-turn conversational consistency, using instruction-carrying tasks
Testing model steerability by measuring adherence to system prompts
Analyzing failure modes in instruction following, using the expert-curated rubrics
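As a rough illustration of the benchmarking and failure-mode use cases, the sketch below aggregates per-criterion rubric judgments into a pass rate and a tally of failed criteria. Everything here is an assumption for illustration: the criterion names, the `judged` data, and the all-or-nothing pass rule are hypothetical and are not taken from the AdvancedIF release, whose actual rubric format and scoring rule should be checked after download.

```python
from collections import Counter
from typing import Dict, List


def prompt_passes(criteria: Dict[str, bool]) -> bool:
    """Assumed rule: a prompt passes only if every rubric criterion is met."""
    return all(criteria.values())


def summarize(results: List[Dict[str, bool]]) -> Dict[str, object]:
    """Return the overall pass rate and how often each criterion failed."""
    failures = Counter(
        name
        for criteria in results
        for name, ok in criteria.items()
        if not ok
    )
    passed = sum(prompt_passes(criteria) for criteria in results)
    return {
        "pass_rate": passed / len(results) if results else 0.0,
        "failures": dict(failures),
    }


# Hypothetical judged outputs: one dict per prompt, criterion name -> satisfied?
judged = [
    {"word_limit": True, "tone": True, "format": True},
    {"word_limit": False, "tone": True, "format": True},
    {"word_limit": False, "tone": True, "format": False},
    {"word_limit": True, "tone": True, "format": True},
]

report = summarize(judged)
print(report)
```

Counting failures per criterion, rather than only per prompt, is what makes the failure-mode analysis possible: it shows which kind of constraint a model misses most often.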
Strengths
Over 1,600 prompts provide a substantial test set
Each prompt contains 6+ instructions combining multiple constraint types
Includes expert-curated evaluation rubrics
Limitations
Column-level documentation is absent; field semantics must be inferred after download
Exact row count is not reported (the description states only "over 1,600 prompts"), which may limit suitability assessment
Last updated 2025-11-26 04:12:14; freshness should be verified
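Because column documentation is absent, a first step after download is usually to profile the records. The sketch below is one way to do that, assuming the data has already been loaded as a list of dictionaries. The field names in `rows` (`prompt`, `rubric`, `system`) are hypothetical placeholders, not the dataset's actual columns.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def summarize_fields(rows: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Report observed types, coverage, and one example value per field."""
    summary = defaultdict(lambda: {"types": set(), "present": 0, "example": None})
    total = 0
    for row in rows:
        total += 1
        for key, value in row.items():
            info = summary[key]
            info["types"].add(type(value).__name__)
            info["present"] += 1
            # Keep the first non-null value as a representative example.
            if info["example"] is None and value is not None:
                info["example"] = value
    return {
        key: {
            "types": sorted(info["types"]),
            "coverage": info["present"] / total,  # share of rows with this key
            "example": info["example"],
        }
        for key, info in summary.items()
    }


# Hypothetical records standing in for the downloaded data.
rows = [
    {"prompt": "Write a haiku about rain.", "rubric": ["three lines"], "system": None},
    {"prompt": "Summarize the memo.", "rubric": ["under 50 words", "formal tone"]},
]

fields = summarize_fields(rows)
for name, info in fields.items():
    print(name, info)
print("row count:", len(rows))
```

The same pass yields the row count, which addresses the unreported size noted above.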
Provenance
Source
facebook
Collection Method
Expert-curated benchmark creation
Freshness
Last updated 2025-11-26 04:12:14
License is unknown and should be verified before use.