A benchmark for evaluating patent novelty search systems, created by PatSnap and last updated in June 2026. Each sample contains a query patent publication number and the ground-truth novelty-destroying prior art references identified by examiners. The dataset is a public release of 50% of a larger internal evaluation set, and it combines two sample types: cross-jurisdiction and single-jurisdiction.
Use Cases
- Benchmarking patent novelty search algorithms against the provided query patents and ground-truth references.
- Evaluating prior art retrieval systems against the X-type references identified by examiners.
- Training or fine-tuning information retrieval models for the patent domain on the query-reference pairs.
- Studying how novelty-destroying references vary across jurisdictions, using the combined sample types.
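For the benchmarking and retrieval-evaluation use cases, one common way to score a system is recall@k over the examiner-identified references. The following is a minimal sketch, not a metric prescribed by the dataset: the function names and the publication numbers are made up for illustration, and it assumes each query maps to a set of ground-truth reference publication numbers.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of ground-truth references that appear in the top-k results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(
    runs: Dict[str, List[str]], truth: Dict[str, Set[str]], k: int
) -> float:
    """Average recall@k over all queries that have ground truth.

    A query with no entry in `runs` is scored as retrieving nothing.
    """
    scores = [recall_at_k(runs.get(q, []), refs, k) for q, refs in truth.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical data: these publication numbers are placeholders, not dataset rows.
truth = {
    "US-0000001-A1": {"EP-0000002-A1", "CN-0000003-A"},
    "EP-0000004-A1": {"US-0000005-B2"},
}
runs = {
    "US-0000001-A1": ["EP-0000002-A1", "JP-0000009-A", "CN-0000003-A"],
    "EP-0000004-A1": ["JP-0000009-A", "US-0000005-B2"],
}

print(mean_recall_at_k(runs, truth, k=2))
```

Recall is used here because a query may have several novelty-destroying references; rank-sensitive metrics such as MRR or nDCG could be swapped in if the position of the first hit matters more.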
Strengths
- Contains ground truth prior art references identified by patent examiners.
- Designed as a 50% public release of a larger internal evaluation set.
- Combines two complementary sample types: cross-jurisdiction family-expanded and single-jurisdiction samples.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not documented, which makes it hard to assess suitability before download.
- Description metadata is limited; actual data quality requires manual inspection after download.
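Because the row count and field semantics are undocumented, a first inspection pass after download might look like the sketch below. It assumes the samples have already been loaded as a list of dictionaries by whatever loader fits the actual file format; the field names `query` and `references` are hypothetical placeholders, not the dataset's real column names.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize_fields(records: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Count rows and, per field, the non-null values and observed value types."""
    rows = 0
    non_null: Counter = Counter()
    type_names: Dict[str, set] = {}
    for record in records:
        rows += 1
        for key, value in record.items():
            # Register the field even if every value turns out to be null.
            type_names.setdefault(key, set())
            if value is not None:
                non_null[key] += 1
                type_names[key].add(type(value).__name__)
    fields = {
        key: {"non_null": non_null[key], "types": sorted(names)}
        for key, names in type_names.items()
    }
    return {"rows": rows, "fields": fields}


# Hypothetical records: field names and values are placeholders.
records = [
    {"query": "US-0000001-A1", "references": ["EP-0000002-A1"]},
    {"query": "EP-0000004-A1", "references": None},
]

summary = summarize_fields(records)
print(summary["rows"])
for name, info in summary["fields"].items():
    print(name, info)
```

A summary like this only reveals structure (row count, null rates, value types); what each field means still has to be confirmed by reading sample rows.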
Provenance
- Source: PatSnap
- Collection Method: Likely compiled from patent office data and examiner annotations.
- Time Range: Not specified
- Freshness: Last updated 2026-06 05:36:40; freshness should be verified.
- Geography: Not specified