100 Chinese natural language queries paired with semi-structured retrieval expressions for patent and academic paper search. This sample subset, created by KyrieSun, is intended for community validation and baseline testing of text-to-SQL models. The dataset also includes English translations of the queries.
Use Cases
- Training text-to-SQL models based on the Chinese query and nl2sql expression pairs
- Benchmarking semantic parsing systems for technical information retrieval based on the provided query-expression mappings
- Fine-tuning machine translation models for technical text based on the included Chinese-English query pairs
- Developing query understanding systems for patent and academic search engines based on the domain-specific natural language queries
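For the benchmarking use case, one simple baseline metric is exact-match accuracy between predicted and reference expressions. The following is a minimal sketch, not a metric defined by the dataset: the helper names and the sample expressions are invented for illustration, and the normalization (collapsing whitespace, ignoring case) assumes the expression syntax is case-insensitive, which should be checked against the actual data.

```python
def normalize(expr: str) -> str:
    """Collapse whitespace and lowercase so trivial formatting differences are ignored.

    Assumption: the expression syntax is case-insensitive.
    """
    return " ".join(expr.split()).lower()


def exact_match_accuracy(predictions, references):
    """Return the fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical expressions, invented for illustration; not taken from the dataset.
references = ["TI=(battery) AND PY>=2020", "AU=(Zhang) OR AB=(graphene)"]
predictions = ["ti=(battery)  and py>=2020", "AU=(Zhang) AND AB=(graphene)"]

print(exact_match_accuracy(predictions, references))
```

Exact match is strict: two expressions that retrieve the same results but order their clauses differently count as a miss, so it is best read as a lower bound on quality.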
Strengths
- Contains 100 precisely aligned query-expression pairs for a specific domain
- Includes both Chinese natural language queries and their English translations
- Explicitly designed for community validation and baseline testing
Limitations
- The subset is small (100 pairs), which may limit its suitability beyond validation and baseline testing
- Column-level documentation is absent; field semantics must be inferred after download
- Description metadata is limited; actual data quality requires manual inspection after download
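Because the columns are undocumented, a quick first pass after download is to tally, per field, which value types appear and how many rows are non-empty. The sketch below assumes the rows have already been loaded as a list of dictionaries. The field names `query_zh`, `query_en`, and `expression` and the sample values are hypothetical placeholders, not the dataset's real schema.

```python
from collections import Counter, defaultdict


def summarize_fields(rows):
    """Report, per field, the value types seen and the count of non-empty values."""
    types = defaultdict(Counter)
    filled = Counter()
    for row in rows:
        for key, value in row.items():
            types[key][type(value).__name__] += 1
            # Treat None and the empty string as missing.
            if value is not None and value != "":
                filled[key] += 1
    return {
        key: {"types": dict(types[key]), "non_empty": filled[key]}
        for key in types
    }


# Hypothetical rows; the real column names and values are undocumented.
rows = [
    {"query_zh": "查找电池专利", "query_en": "Find battery patents", "expression": "TI=(battery)"},
    {"query_zh": "石墨烯论文", "query_en": "", "expression": None},
]

for field, stats in summarize_fields(rows).items():
    print(field, stats)
```

Fields with mixed types or many empty values are the ones to inspect by hand before relying on them.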
Provenance
- Source
  - Hugging Face
- Collection Method
  - Likely manually or semi-automatically curated for the patent and paper retrieval domain.
- Time Range
  - Not specified
- Freshness
  - Last updated 2026-03-31 03:21:06; freshness should be verified
- Geography
  - Not specified