Legal-LLM-Cleaned-v2: Text Corpus for Legal Language Models

Available on 1 platform

Description

Legal-LLM-Cleaned-v2 is a dataset hosted on Kaggle, likely containing processed legal text intended for training or evaluating large language models. The title suggests the data has undergone a cleaning process, potentially to remove personally identifiable information or standardize formatting. Specific details on size, source, and creation date are not provided in the available metadata.

Use Cases

Fine-tune a language model for legal document summarization (inferred from domain, verify after download)
Benchmark model performance on legal question-answering tasks (inferred from domain, verify after download)
Train a classifier to identify legal concepts or document types (inferred from domain, verify after download)

Strengths

Published on Kaggle, a major platform for data science resources.
The title indicates a 'cleaned' version, which may imply efforts to improve data quality for machine learning.

Limitations

Metadata is minimal; actual content requires verification after download.
Row count, file format, and column structure are unknown, which limits suitability assessment.
License and authorship details are absent, complicating usage rights verification.
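
Because file format, row count, and column structure are unknown until the files are in hand, a first inspection pass after download might look like the sketch below. It uses only the Python standard library and assumes nothing about the dataset's actual layout; the function name `summarize_download` and the directory you pass to it are hypothetical, not part of the dataset or of any Kaggle tooling.

```python
from collections import Counter
from pathlib import Path


def summarize_download(root: str, preview_chars: int = 200) -> dict:
    """Summarize the files under `root` (a hypothetical local download folder).

    Reports the file count, total size in bytes, a count per file extension,
    and the first `preview_chars` characters of each file.
    """
    root_path = Path(root)
    files = sorted(p for p in root_path.rglob("*") if p.is_file())

    summary = {
        "file_count": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
        "extensions": dict(Counter(p.suffix.lower() or "(none)" for p in files)),
        "previews": {},
    }

    for p in files:
        # Read only the start of each file. errors="replace" keeps the scan
        # from failing on binary or mis-encoded content.
        with p.open("r", encoding="utf-8", errors="replace") as fh:
            key = p.relative_to(root_path).as_posix()
            summary["previews"][key] = fh.read(preview_chars)

    return summary
```

Calling `summarize_download("path/to/unzipped/dataset")` with the folder where the archive was extracted returns a dictionary whose `extensions` and `previews` entries give a first indication of format and structure. License and authorship still have to be confirmed on the hosting page.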

Related Topics

Text, Cleaned Data, Legal Text, Large Language Models

Quality Score

D16

Description: 8
Source: 17
Reputation: 18
Access: 31

Community

0 views

Dataset Info

Last synced: May 1, 2026
