A Persian (Farsi) question–answer corpus for social engineering and cybersecurity derived from curated knowledge articles extracted from authoritative reference books. The dataset was created by author smd20 and last updated on June 13, 2026. It is designed for supervised fine-tuning, retrieval-augmented generation evaluation, and domain-specific language model benchmarking.
Use Cases
- Supervised fine-tuning of language models based on the described question-answer pairs.
- Evaluating retrieval-augmented generation systems for cybersecurity knowledge retrieval.
- Benchmarking domain-specific language models on Persian social engineering content.
Strengths
- Derived from curated knowledge articles extracted from authoritative reference books.
- Part of a bilingual research dataset engineering pipeline designed for specific NLP tasks.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is unknown, which may limit suitability assessment.
- Description metadata is limited; actual data quality requires manual inspection after download.
Provenance
- Source
- Authoritative reference books.
- Collection Method
- Curated knowledge articles extracted from books.
- Freshness
- Last updated 2026-06-13 07:59:01.