Description
PerCoR is a large-scale Persian benchmark for commonsense reasoning in a four-choice sentence-completion format. It contains approximately 106,000 examples sourced from over 40 Persian websites across domains such as news, culture, lifestyle, tech, religion, and travel. The dataset was created by mina8113 and was last updated on Hugging Face in May 2026.
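The four-choice sentence-completion format can be pictured as a prefix plus four candidate endings, one of which is correct. The sketch below is a minimal illustration under stated assumptions. The field names `prefix`, `choices`, and `label` are hypothetical, because PerCoR's actual column names are undocumented. Prefix and completion are joined with a single space, which is also an assumption. `predict` takes any caller-supplied `score` function (for example, a language model's log-likelihood) and picks the highest-scoring full sentence.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CompletionItem:
    # Hypothetical field names; PerCoR's real column names must be checked after download.
    prefix: str
    choices: list[str]  # exactly four candidate completions
    label: int          # index (0..3) of the correct completion

    def __post_init__(self) -> None:
        if len(self.choices) != 4:
            raise ValueError("expected exactly 4 choices")
        if not 0 <= self.label < 4:
            raise ValueError("label must be in 0..3")

    def candidates(self) -> list[str]:
        # Join the prefix with each completion (assumed single-space separator).
        return [f"{self.prefix} {c}".strip() for c in self.choices]


def predict(item: CompletionItem, score: Callable[[str], float]) -> int:
    # Score each full candidate sentence and return the index of the best one.
    scores = [score(s) for s in item.candidates()]
    return max(range(len(scores)), key=scores.__getitem__)
```

With a real model, `score` would typically be the summed token log-probability of each candidate. Ties resolve to the lowest index.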
Use Cases
Training language models for Persian commonsense reasoning using the four-choice sentence-completion format.
Benchmarking model performance on Persian natural language understanding using the dataset's prefix-and-completion structure.
Analyzing linguistic and cultural commonsense patterns in Persian text drawn from a diverse set of websites.
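For benchmarking, multiple-choice completion is usually reported as accuracy: the fraction of items where the predicted choice index matches the gold index. The sketch below assumes predictions and gold labels are already available as integer indices. It is a generic metric, not a documented PerCoR evaluation script. With four choices, uniform random guessing has an expected accuracy of 0.25, which serves as a baseline.

```python
def accuracy(predictions: list[int], labels: list[int]) -> float:
    # Fraction of items where the predicted choice index equals the gold index.
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Expected accuracy of uniform random guessing over four choices.
CHANCE_ACCURACY = 1 / 4
```

For example, `accuracy([0, 2, 1, 3], [0, 2, 3, 3])` returns `0.75`, since three of the four predictions match.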
Strengths
Contains approximately 106,000 examples, providing a substantial corpus for model training and evaluation.
Sourced from over 40 Persian websites covering diverse domains such as news, culture, lifestyle, tech, religion, and travel.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The exact row count is not documented (only an approximate figure of 106,000 examples is given), which may limit suitability assessment.
Last updated 2026-05-23 07:33:44; freshness should be verified.
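Because field semantics and the exact row count must be established after download, a quick schema pass over the raw records can help. The sketch below assumes the data has been exported as JSON Lines (one JSON object per line). That file format is an assumption, not something the card states. The function counts non-empty records and, for each field name, tallies which Python value types occur.

```python
import json
from collections import Counter, defaultdict
from typing import Iterable


def infer_schema(lines: Iterable[str]) -> tuple[int, dict[str, dict[str, int]]]:
    # Assumes JSON Lines input: one JSON object per non-empty line.
    types: dict[str, Counter] = defaultdict(Counter)
    n_records = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        n_records += 1
        for key, value in record.items():
            # Tally the Python type name observed for this field.
            types[key][type(value).__name__] += 1
    return n_records, {k: dict(v) for k, v in types.items()}
```

Passing an open file handle (for example, `infer_schema(open(path, encoding="utf-8"))` with a hypothetical `path`) yields the exact record count and a per-field type summary to inspect before use.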
Provenance
Source
Over 40 Persian websites across news, culture, lifestyle, tech, religion, travel, and more.
Freshness
Last updated 2026-05-23 07:33:44.
Geography
Persian-language content.
License
License is unknown; terms of use must be verified.