This dataset contains Turkish case law from the General Assembly of Civil Chambers of the Court of Cassation (Yargıtay), structured for retrieval and summarization. It provides query-corpus pairs filtered by token limits, supporting MTEB tokenizer benchmark testing and hierarchical keyword generation.
Use Cases
- Benchmark embedding models using the query-corpus pairs for legal information retrieval tasks.
- Train models to automate the generation of hierarchical Catchwords from raw legal text.
- Perform domain-specific pre-training for Turkish legal language models using the Yargıtay case law corpus.
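As a rough illustration of the retrieval benchmarking use case, the sketch below ranks corpus documents against each query by cosine similarity and reports recall@k. Everything in it is an assumption made for illustration: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the `queries`, `corpus`, and `relevant` dictionaries are hypothetical toy data, not the dataset's actual schema.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption): word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_at_k(queries, corpus, relevant, k=1):
    """Fraction of queries whose relevant document ranks in the top k."""
    doc_vecs = {doc_id: embed(text) for doc_id, text in corpus.items()}
    hits = 0
    for query_id, query_text in queries.items():
        query_vec = embed(query_text)
        ranked = sorted(
            doc_vecs,
            key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
            reverse=True,
        )
        if relevant[query_id] in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Hypothetical toy data; the real field names and contents may differ.
corpus = {
    "d1": "kira sözleşmesi tahliye davası",
    "d2": "iş sözleşmesi kıdem tazminatı",
}
queries = {"q1": "kıdem tazminatı", "q2": "tahliye davası"}
relevant = {"q1": "d2", "q2": "d1"}

print(recall_at_k(queries, corpus, relevant, k=1))
```

In practice, `embed` would be replaced by the model under evaluation, and the same loop could report recall at several values of k.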
Strengths
- Features hierarchical 'Case Summary Keywords' (Catchwords) for structured legal indexing.
- Includes query-corpus pairs pre-filtered by maximum token limits for embedding evaluation.
- Contains specialized legal text from the General Assembly of Civil Chambers of the Turkish Court of Cassation.
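The token-limit pre-filtering and the hierarchical Catchwords can be pictured with a minimal sketch. It rests on several assumptions that are not confirmed by the dataset description: token counts come from a whitespace split rather than a real model tokenizer, pairs are dictionaries with hypothetical `query` and `corpus` keys, and Catchword levels are separated by a hypothetical ` > ` delimiter.

```python
def count_tokens(text):
    """Stand-in tokenizer (assumption): counts whitespace-separated words."""
    return len(text.split())


def filter_pairs(pairs, max_tokens):
    """Keep pairs whose query and corpus text both fit within max_tokens."""
    return [
        pair
        for pair in pairs
        if count_tokens(pair["query"]) <= max_tokens
        and count_tokens(pair["corpus"]) <= max_tokens
    ]


def build_catchword_tree(paths, sep=">"):
    """Nest delimiter-separated keyword paths into a dictionary tree."""
    tree = {}
    for path in paths:
        node = tree
        for level in path.split(sep):
            node = node.setdefault(level.strip(), {})
    return tree


# Hypothetical examples; field names and delimiter are illustrative only.
pairs = [
    {"query": "tahliye davası", "corpus": "kira sözleşmesi tahliye davası"},
    {
        "query": "kıdem tazminatı",
        "corpus": "iş sözleşmesi feshi halinde kıdem tazminatı hesabı",
    },
]
kept = filter_pairs(pairs, max_tokens=5)

tree = build_catchword_tree(
    ["Borçlar Hukuku > Kira > Tahliye", "Borçlar Hukuku > Kira > Kira Bedeli"]
)
print(len(kept), tree)
```

A real filter would count tokens with the tokenizer of the embedding model being benchmarked, since the same text can yield different token counts under different tokenizers.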