Name: Pakistani Legal Judgments Corpus for Multi-Task NLP
Creator: amjadali070
Published: 2026-01-05T09:02:21
Keywords: Size Categories: 10K<n<100K, Task Categories: text-generation, Library: polars, Library: dask, OPTIMIZED-PARQUET, Language: en, Task Categories: summarization, Modality: text, Library: mlcroissant, Language: ur, Library: datasets, Legal Text, Text, Parquet, Language: sd, Multilingual, Region: us, Natural Language Processing, Legal, Pakistan, Multi Task, Instruction Tuning, Task Categories: translation, License: mit

Description

A multi-task instruction-tuning dataset built from Pakistani legal documents, designed for OCR correction, legal translation, and summarization. The corpus is divided into six configurations, each targeting a specific task and language pair, such as repairing broken English or translating between English, Urdu, and Sindhi. It was created by amjadali070 and last updated on Hugging Face in January 2026.

Use Cases

Train OCR correction models using the 'repair' configuration to map broken English text to clean English.
Develop legal translation models using configurations like 'en_ur' to translate between English and Urdu.
Fine-tune text summarization models on legal judgments using the relevant summarization task subsets.
Build multi-task models that handle several configurations, such as translation between Sindhi and English ('sd_en'), simultaneously.
Evaluate model performance on specialized legal language across the defined source and target language pairs.
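
The configuration-based use cases above can be sketched as follows, using the Hugging Face `datasets` library listed in the keywords. This is a minimal sketch under stated assumptions: the card does not give the repository id, so `REPO_ID` is a placeholder to fill in; only the configuration names 'repair', 'en_ur', and 'sd_en' appear in the card; and reading a name like 'en_ur' as source-then-target is an assumption. Split and column names are not documented, so the sketch does not rely on them.

```python
from typing import Optional, Tuple

# Placeholder (assumption): the card does not state the repository id.
REPO_ID = "amjadali070/<dataset-name>"


def language_pair(config_name: str) -> Optional[Tuple[str, str]]:
    """Split a config name such as 'en_ur' into (source, target) codes.

    Assumes the first code is the source and the second the target.
    Returns None for names that are not two 2-letter codes, e.g. 'repair'.
    """
    parts = config_name.split("_")
    if len(parts) == 2 and all(len(p) == 2 and p.isalpha() for p in parts):
        return parts[0], parts[1]
    return None


def load_config(config_name: str, repo_id: str = REPO_ID):
    """Load one configuration; needs `pip install datasets` and network access."""
    # Imported lazily so language_pair() stays usable without the library.
    from datasets import load_dataset

    return load_dataset(repo_id, name=config_name)


if __name__ == "__main__":
    for name in ("repair", "en_ur", "sd_en"):
        print(name, language_pair(name))
```

After replacing the placeholder id, `load_config("en_ur")` would return the available splits for that configuration; inspecting the result's column names is the safest way to discover the actual schema.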

Strengths

Six distinct configurations allow focused training on specific NLP tasks.
Supports multiple languages relevant to Pakistan: English, Urdu, and Sindhi.
The dataset is recent: it was last updated on Hugging Face in January 2026.

Limitations

Exact row counts, file sizes, and sample data are unavailable; the tags indicate only a size category of 10K to 100K rows.
The tags list an MIT license, but no further license details or data collection methodology are provided.
Coverage is restricted to Pakistani legal documents, so models trained on it may not generalize to other jurisdictions.

Provenance

Source: amjadali070 on Hugging Face.
Collection Method: Not documented; how the Pakistani legal documents were gathered is unknown.
Time Range: Not specified.
Freshness: Last updated on 2026-01-05.
Geography: Pakistan

The dataset is split into six separate configurations (subsets) that must be loaded individually; the schema of each configuration and any keys linking records across configurations are not specified.
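
Because each configuration is loaded on its own, building a multi-task training mix means combining them after loading. The following is a rough sketch of one way to do that in plain Python: it tags every record with the configuration it came from and interleaves the configurations round-robin. The function names and the added `config` field are illustrative choices, not part of the dataset, and records are treated as generic dictionaries because the real column names are not documented.

```python
from itertools import zip_longest
from typing import Dict, Iterable, Iterator

_MISSING = object()  # marks exhausted streams during interleaving


def _tagged(name: str, records: Iterable[dict]) -> Iterator[dict]:
    """Yield copies of the records with a 'config' field added."""
    for record in records:
        yield {**record, "config": name}


def tag_and_interleave(configs: Dict[str, Iterable[dict]]) -> Iterator[dict]:
    """Round-robin over configurations, skipping ones that have run out."""
    streams = [_tagged(name, records) for name, records in configs.items()]
    for group in zip_longest(*streams, fillvalue=_MISSING):
        for record in group:
            if record is not _MISSING:
                yield record
```

Round-robin keeps small configurations from being buried at the end of the mix, but it does not rebalance them; if weighted sampling is needed, the `datasets` library's `interleave_datasets` function is an alternative worth checking, provided the configurations share a compatible schema.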
