CUAD: 509 Commercial Legal Contracts with PDFs and Extracted Text

Name: CUAD: 509 Commercial Legal Contracts with PDFs and Extracted Text
Creator: dvgodoy
Published: 2025-01-20T14:29:20
Keywords: Legal Documents, Contract Understanding, Pdf Text, Text, Natural Language Processing, Multimodal

Description

509 commercial legal contracts from the CUAD dataset, each provided as a PDF together with cleaned extracted text. The dataset was created by dvgodoy and last updated on 2025-01-29. One contract from the original CUAD collection was excluded because it was a scanned copy.

Use Cases

Train contract clause classification models based on the full text of legal documents.
Develop PDF-to-text extraction and cleaning pipelines using the provided base64-encoded PDFs and cleaned text.
Benchmark named entity recognition for legal entities and terms within commercial contracts.
Analyze the structure and language patterns of commercial legal agreements.
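
For the PDF-to-text use case, a minimal Python sketch of the first steps might look like the following. It assumes only what the description states: each record carries a base64-encoded PDF and a cleaned text string. The function names decode_pdf and extraction_similarity are hypothetical helpers, not part of the dataset or any library, and the similarity measure (difflib's ratio) is one illustrative choice rather than an established benchmark metric.

```python
import base64
import binascii
from difflib import SequenceMatcher


def decode_pdf(b64_pdf: str) -> bytes:
    """Decode a base64-encoded PDF and check that it starts with the PDF header."""
    try:
        pdf_bytes = base64.b64decode(b64_pdf)
    except binascii.Error as exc:
        raise ValueError("field is not valid base64") from exc
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("decoded bytes do not start with a PDF header")
    return pdf_bytes


def _normalize(text: str) -> str:
    """Collapse all runs of whitespace into single spaces."""
    return " ".join(text.split())


def extraction_similarity(candidate: str, reference: str) -> float:
    """Return a similarity ratio in [0, 1] between two whitespace-normalized texts."""
    return SequenceMatcher(None, _normalize(candidate), _normalize(reference)).ratio()
```

The decoded bytes can be written to disk or passed to a PDF extraction tool of your choice, and the tool's output compared against the provided cleaned text. SequenceMatcher can be slow on full-length contracts, so comparing page-sized or paragraph-sized chunks may be more practical.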

Strengths

Contains 509 commercial legal contracts, providing a substantial corpus for analysis.
Includes both the original PDFs (base64 encoded) and the cleaned extracted text for each contract.
Text was cleaned using the clean-text library, which likely improves consistency for NLP tasks.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not stated in the metadata; it presumably matches the 509 contracts, but this should be confirmed after download.
Description metadata is limited; actual data quality requires manual inspection after download.
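
Because the columns are undocumented, a quick first pass after download could summarize each field before any modeling. The sketch below is an illustration under assumptions: it expects the records to be loaded as an iterable of dictionaries, and summarize_fields is a hypothetical helper. It flags likely base64-encoded PDF fields by their prefix, since base64 text for data beginning with "%PDF-" starts with "JVBERi".

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def summarize_fields(records: Iterable[Mapping[str, Any]]) -> dict[str, dict[str, Any]]:
    """Collect per-field type counts, maximum string length, and PDF-like counts."""
    summary: dict[str, dict[str, Any]] = {}
    for record in records:
        for name, value in record.items():
            info = summary.setdefault(
                name, {"types": Counter(), "max_len": 0, "pdf_like": 0}
            )
            info["types"][type(value).__name__] += 1
            if isinstance(value, str):
                info["max_len"] = max(info["max_len"], len(value))
                # Base64 of bytes starting with "%PDF-" begins with "JVBERi".
                if value.startswith("JVBERi"):
                    info["pdf_like"] += 1
    return summary
```

A field whose values are all strings with a high pdf_like count is probably the encoded PDF, while a long string field without that prefix is a candidate for the cleaned text. Both guesses should still be checked by opening a few records by hand.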

Provenance

Source: Original CUAD (Contract Understanding Atticus Dataset).
Collection Method: PDFs were encoded in base64 and text was extracted and cleaned.
Freshness: Last updated 2025-01-29 18:32:20; freshness should be verified.

Quality Score

Overall: D36
Description: 42
Source: 36
Reputation: 30
Access: 26

Community

Downloads: 146
Likes: 1
Views: 0

Dataset Info

Author: dvgodoy
Created: Jan 20, 2025
Updated: Jan 29, 2025
Last synced: Jul 3, 2026