Test set for clustering documents from the Big Patent dataset, with documents drawn from nine distinct categories. It is part of the Massive Text Embedding Benchmark (MTEB) and is used to evaluate embedding models on text from the legal and written domains.
This is a benchmark test set only: it is intended for evaluation, not for training models. The column structure and data format are not documented here.
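Clustering test sets of this kind are usually scored by clustering the model's embeddings and comparing the cluster assignments against the gold categories. The sketch below computes V-measure (the harmonic mean of homogeneity and completeness) in plain Python. It assumes V-measure is the score of interest, which is not confirmed by this description, and the labels in the usage lines are made-up placeholders rather than values from the dataset.

```python
import math
from collections import Counter


def v_measure(true_labels, predicted_clusters):
    """Harmonic mean of homogeneity and completeness, each in [0, 1]."""
    n = len(true_labels)
    if n == 0 or n != len(predicted_clusters):
        raise ValueError("label lists must be non-empty and of equal length")

    class_counts = Counter(true_labels)
    cluster_counts = Counter(predicted_clusters)
    joint_counts = Counter(zip(true_labels, predicted_clusters))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_class = entropy(class_counts)
    h_cluster = entropy(cluster_counts)

    # Conditional entropies H(class | cluster) and H(cluster | class).
    h_class_given_cluster = -sum(
        (c / n) * math.log(c / cluster_counts[k])
        for (_, k), c in joint_counts.items()
    )
    h_cluster_given_class = -sum(
        (c / n) * math.log(c / class_counts[y])
        for (y, _), c in joint_counts.items()
    )

    # A single class (or a single cluster) has zero entropy; treat that
    # side as trivially satisfied instead of dividing by zero.
    homogeneity = 1.0 if h_class == 0 else 1.0 - h_class_given_cluster / h_class
    completeness = 1.0 if h_cluster == 0 else 1.0 - h_cluster_given_class / h_cluster

    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical gold categories and cluster ids, for illustration only.
gold = ["A", "A", "B", "B", "C", "C"]
clusters = [0, 0, 1, 1, 1, 2]
print(round(v_measure(gold, clusters), 3))
```

Cluster ids are arbitrary, so the score depends only on how documents are grouped, not on which id each group receives.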