Description
IdioLink is a retrieval benchmark created by Intellexus and described in a 2025 arXiv paper. It evaluates whether embedding models can match queries to documents that share the same conceptual meaning, regardless of whether the expressions are used figuratively or literally. The benchmark contains 107 idioms, 10,700 documents, and 2,140 queries; the accompanying paper uses it to evaluate 24 models across 4 query configurations.
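The matching task can be pictured as ranking documents by embedding similarity and checking whether a top-ranked document shares the query's concept. The sketch below assumes cosine similarity and recall@k as the metric, and uses toy 2-D vectors with hypothetical idiom labels; the paper's actual metric, label scheme, and field names are not stated on this page.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_at_k(query_vecs, query_labels, doc_vecs, doc_labels, k=1):
    """Fraction of queries whose top-k documents include at least one
    document carrying the same (hypothetical) concept label."""
    hits = 0
    for q_vec, q_label in zip(query_vecs, query_labels):
        # Rank every document by similarity to the query, most similar first.
        ranked = sorted(
            range(len(doc_vecs)),
            key=lambda i: cosine(q_vec, doc_vecs[i]),
            reverse=True,
        )
        if any(doc_labels[i] == q_label for i in ranked[:k]):
            hits += 1
    return hits / len(query_vecs)


# Toy 2-D "embeddings" and hypothetical labels, for illustration only.
docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
doc_labels = ["spill the beans", "break the ice", "spill the beans"]
queries = [[0.8, 0.2], [0.1, 0.9], [0.6, 0.4]]
query_labels = ["spill the beans", "break the ice", "break the ice"]

# 2 of 3 queries rank a same-label document first.
print(recall_at_k(queries, query_labels, docs, doc_labels, k=1))
```

In a real run, the vectors would come from the embedding model under test, and the third query shows the failure mode the benchmark targets: a document that is close in embedding space but carries a different concept.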
Use Cases
Benchmarking semantic retrieval models on their ability to bridge idiomatic and literal expressions.
Comparing embedding model performance across the 4 query configurations described in the paper.
Training or fine-tuning models for idiom understanding using the 107 idioms and their associated documents.
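If the data is used for fine-tuning, splitting by idiom rather than by row keeps evaluation idioms unseen during training. The sketch below assumes records with hypothetical `idiom` and `text` fields (the real column names are undocumented) and builds same-idiom positive pairs, one common input format for contrastive fine-tuning; it is an illustration, not a procedure from the paper.

```python
import random
from itertools import combinations


def split_by_idiom(records, test_fraction=0.2, seed=0):
    """Split records so that no idiom appears in both train and test.

    `records` is a list of dicts with hypothetical keys "idiom" and "text";
    the real field names must be checked against the downloaded files.
    """
    idioms = sorted({r["idiom"] for r in records})
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(idioms)
    n_test = max(1, round(len(idioms) * test_fraction))
    test_idioms = set(idioms[:n_test])
    train = [r for r in records if r["idiom"] not in test_idioms]
    test = [r for r in records if r["idiom"] in test_idioms]
    return train, test


def positive_pairs(records):
    """Return (text_a, text_b) pairs of documents that share an idiom."""
    by_idiom = {}
    for r in records:
        by_idiom.setdefault(r["idiom"], []).append(r["text"])
    pairs = []
    for texts in by_idiom.values():
        pairs.extend(combinations(texts, 2))
    return pairs


# Hypothetical records: each idiom paired with a literal paraphrase.
records = [
    {"idiom": "spill the beans", "text": "She spilled the beans about the party."},
    {"idiom": "spill the beans", "text": "She revealed the secret about the party."},
    {"idiom": "break the ice", "text": "A joke helped break the ice."},
    {"idiom": "break the ice", "text": "A joke eased the initial awkwardness."},
    {"idiom": "hit the sack", "text": "I hit the sack early."},
    {"idiom": "hit the sack", "text": "I went to bed early."},
]

train, test = split_by_idiom(records, test_fraction=0.34)
train_pairs = positive_pairs(train)
print(len(train_pairs), "training pairs;", len(test), "held-out records")
```

With all 107 idioms and the default `test_fraction=0.2`, this sketch would hold out 21 idioms.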
Strengths
Benchmark includes 107 idioms, providing a focused test set.
Contains 10,700 documents and 2,140 queries for evaluation, an average of 100 documents and 20 queries per idiom.
The paper evaluates 24 models across 4 query configurations, allowing for comparative analysis.
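A comparison across models and query configurations can be summarized per model as a mean score plus the weakest configuration, which exposes sensitivity to how queries are phrased. The sketch below assumes scores keyed by `(model, configuration)`; the model names, configuration names, and numbers are placeholders, since this page does not name the four configurations or report any results.

```python
from statistics import mean


def summarize(scores):
    """Summarize per-model results across query configurations.

    `scores` maps (model, configuration) -> retrieval score. Returns
    (model, mean score, weakest configuration) tuples, best mean first.
    """
    models = sorted({model for model, _ in scores})
    rows = []
    for model in models:
        per_config = {c: s for (m, c), s in scores.items() if m == model}
        weakest = min(per_config, key=per_config.get)
        rows.append((model, mean(list(per_config.values())), weakest))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Placeholder model names, configuration names, and scores; not results from the paper.
scores = {
    ("model_a", "config_1"): 0.5,
    ("model_a", "config_2"): 0.25,
    ("model_a", "config_3"): 0.75,
    ("model_a", "config_4"): 0.5,
    ("model_b", "config_1"): 0.75,
    ("model_b", "config_2"): 0.75,
    ("model_b", "config_3"): 0.5,
    ("model_b", "config_4"): 0.5,
}

for model, avg, weakest in summarize(scores):
    print(f"{model}: mean={avg:.3f}, weakest={weakest}")
```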
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The row count of the distributed files is not listed, which makes it harder to judge suitability before download.
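Because field semantics and row counts must be inferred after download, a quick profiling pass is a reasonable first step. The sketch below assumes the files are JSON Lines, which this page does not confirm; the field names in the demo file are hypothetical. For CSV or Parquet, the same idea applies with a different reader.

```python
import json
import os
import tempfile
from collections import Counter


def profile_jsonl(path):
    """Count rows, field occurrences, and value types in a JSON Lines file."""
    row_count = 0
    field_counts = Counter()
    field_types = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            row_count += 1
            for key, value in record.items():
                field_counts[key] += 1
                field_types.setdefault(key, set()).add(type(value).__name__)
    return row_count, field_counts, field_types


# Self-contained demo: write a tiny hypothetical file, then profile it.
sample = [
    {"idiom": "spill the beans", "text": "She spilled the beans."},
    {"idiom": "spill the beans", "text": "She revealed the secret.", "figurative": False},
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as tmp:
    for rec in sample:
        tmp.write(json.dumps(rec) + "\n")
    tmp_path = tmp.name

rows, fields, types = profile_jsonl(tmp_path)
os.remove(tmp_path)
print(rows, dict(fields), types)
```

A field that appears in fewer rows than the total (here `figurative`) is a hint that it is optional or configuration-specific.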
Provenance
Source
Intellexus
Collection Method
Likely constructed as a research benchmark for evaluating embedding models.
Freshness
Last updated 2026-05-23 08:19:06; freshness should be verified.
License
License is unknown; terms of use must be verified.