NICE is a theory-grounded diagnostic benchmark for evaluating the social intelligence of large language models (LLMs). Its framework organizes social intelligence into 4 categories, 11 dimensions, and 34 capability facets, so results can be diagnosed at a finer grain than a single aggregate score. It was created by Zoe0104 and last updated on Hugging Face in May 2026.
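The value of a facet → dimension → category hierarchy is that a single overall score can hide a weak area. The sketch below illustrates that idea only. It assumes you already have one score per facet. The facet, dimension, and category names in `TAXONOMY` are hypothetical placeholders, because this card does not list the real ones. It also assumes unweighted means at every level, and the helper `roll_up` is not part of any published NICE tooling.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL taxonomy: facet -> (dimension, category).
# The real NICE facet/dimension/category names are not given in this card;
# replace these placeholders with the actual mapping once known.
TAXONOMY = {
    "facet_a": ("dim_1", "cat_X"),
    "facet_b": ("dim_1", "cat_X"),
    "facet_c": ("dim_2", "cat_X"),
    "facet_d": ("dim_3", "cat_Y"),
}


def roll_up(facet_scores):
    """Aggregate per-facet scores into dimension, category, and overall scores.

    Assumption: every level is an unweighted mean of the facet scores beneath it.
    Returns (dimension_scores, category_scores, overall_score).
    """
    by_dim = defaultdict(list)
    by_cat = defaultdict(list)
    for facet, score in facet_scores.items():
        dim, cat = TAXONOMY[facet]  # KeyError if a facet is missing from the mapping
        by_dim[dim].append(score)
        by_cat[cat].append(score)
    dim_scores = {d: mean(s) for d, s in by_dim.items()}
    cat_scores = {c: mean(s) for c, s in by_cat.items()}
    return dim_scores, cat_scores, mean(facet_scores.values())


if __name__ == "__main__":
    dims, cats, overall = roll_up(
        {"facet_a": 0.9, "facet_b": 0.7, "facet_c": 0.4, "facet_d": 0.8}
    )
    weakest = min(dims, key=dims.get)  # dimension with the lowest mean score
    print(weakest, dims, cats, overall)
```

With the sample scores, the overall mean is about 0.7. The `dim_2` score of about 0.4 is much lower, and the single number hides it. That gap is the kind of weakness a per-dimension breakdown is meant to expose.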
Use Cases
- Benchmarking LLM social intelligence across the 34 capability facets
- Diagnosing specific weaknesses in LLM social cognition using the 11-dimension framework
- Comparing model architectures on a benchmark grounded in psychometric principles
- Validating LLM outputs against expert-validated social scenarios
Strengths
- Built on a theory-grounded framework derived from psychometric principles
- Organizes evaluation into 4 categories, 11 dimensions, and 34 capability facets for fine-grained diagnosis
- Includes expert validation as part of its construction methodology
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- Row count is not reported, which makes it harder to judge suitability before download
- Description metadata is sparse; data quality must be checked manually after download
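The card documents neither the columns nor the row count, so a quick profile after download is a practical first step. The sketch below assumes a split has been saved as JSON Lines (one JSON object per line). The card does not state the actual file format, and the function name `profile_jsonl_lines` is hypothetical. The function reports the number of rows, how often each top-level field appears, and which JSON value types it takes.

```python
import json
from collections import Counter, defaultdict


def profile_jsonl_lines(lines):
    """Profile an iterable of JSON Lines strings (e.g. an open file object).

    Assumption: each non-blank line is one JSON object; blank lines are skipped.
    Returns (row_count, {field: (rows_containing_field, sorted_type_names)}).
    """
    rows = 0
    presence = Counter()      # field -> number of rows that contain it
    types = defaultdict(set)  # field -> Python type names seen for its values
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        rows += 1
        for key, value in record.items():
            presence[key] += 1
            types[key].add(type(value).__name__)
    return rows, {k: (presence[k], sorted(types[k])) for k in presence}


if __name__ == "__main__":
    sample = ['{"q": "hi", "label": 1}\n', "\n", '{"q": "yo", "extra": null}\n']
    print(profile_jsonl_lines(sample))
```

For a real file, pass the open handle, for example `with open(path, encoding="utf-8") as fh: profile_jsonl_lines(fh)`. A field that appears in fewer rows than the total, or that takes more than one type, is a good candidate for manual inspection. If the data is loaded with the Hugging Face `datasets` library instead, `Dataset.column_names` and `Dataset.num_rows` give the schema and row count directly.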
Provenance
- Source: Zoe0104 on Hugging Face
- Collection method: Built on a theory-grounded framework derived from psychometric principles and expert validation
- Freshness: Last updated 2026-05-27 13:21:16; verify freshness before relying on the data