Turkish Law Documents 700K Clustered

Name: Turkish Law Documents 700K Clustered
Creator: erdem-erdem
Published: 2025-11-10T18:09:27
Keywords: library:polars, library:dask, modality:text, size_categories:100K<n<1M, modality:tabular, library:mlcroissant, library:datasets, parquet, region:us

by erdem-erdem, updated 7mo ago

Available on 1 platform

Description

700,000 Turkish legal documents from the Yargıtay (Court of Cassation) and Danıştay (Council of State), clustered using multiple embedding models and clustering algorithms. Decisions of these courts are primary sources of legal precedent in Turkey: Yargıtay for civil and criminal cases, Danıştay for administrative cases.

Use Cases

Train Turkish legal NLP models using the 700,000 document texts
Benchmark clustering algorithms using the pre-computed cluster assignments
Perform semantic search on Turkish legal precedents using the embedding-based clusters
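
For the clustering benchmark use case, one common way to compare two sets of cluster assignments is the adjusted Rand index (ARI). The sketch below is a minimal pure-Python version, not the dataset author's method. The column names `cluster_model_a` and `cluster_model_b` and the sample rows are hypothetical, since the dataset's actual schema is not listed here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two clusterings of the same documents.

    1.0 means identical partitions (label names do not matter);
    values near 0.0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same documents")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two documents: trivially identical

    # Count document pairs that share a cluster in both, in A, and in B.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = in_a * in_b / total_pairs
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both
    return (both - expected) / (maximum - expected)


# Hypothetical rows: column names are assumptions, not the real schema.
rows = [
    {"cluster_model_a": 0, "cluster_model_b": 5},
    {"cluster_model_a": 0, "cluster_model_b": 5},
    {"cluster_model_a": 1, "cluster_model_b": 7},
    {"cluster_model_a": 1, "cluster_model_b": 7},
]
score = adjusted_rand_index(
    [r["cluster_model_a"] for r in rows],
    [r["cluster_model_b"] for r in rows],
)
print(f"ARI between the two assignments: {score:.3f}")
```

With real data, the two label lists would come from two of the pre-computed cluster assignment columns, read over the same document order.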

Strengths

700,000 individual legal documents from Turkey's highest courts
Sourced from Yargıtay, the supreme court of appeal for civil and criminal matters, and Danıştay, the supreme administrative court
Clustered using multiple embedding models and algorithms for comparative research

Quality Score

Overall: D (37)
Description: 39
Source: 36
Reputation: 45
Access: 22

Community

Downloads: 165
Likes: 3
Views: 0

Dataset Info

Author: erdem-erdem
Created: Nov 10, 2025
Updated: Nov 11, 2025
Last synced: Apr 11, 2026