Chinese High-Tech Innovation Data for Four Cities, 2016-2025
by Hua Song · Updated 23d ago
5.5 KB · 1 file
Available on 1 platform
Description
Hua Song compiled over 39,000 publications from Web of Science and nearly 10,000 patent records from China's national patent database for the period 2016–2025. The data covers four Chinese cities (Wuhan, Chengdu, Hangzhou, and Tianjin) and four technological domains: artificial intelligence, fiber-optic communication, intelligent connected vehicles, and storage chips. The accompanying study combines bibliometric analysis, co-word network modeling, collaboration network mapping, and LLM-assisted semantic interpretation.
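Co-word network modeling treats keywords as nodes and links two keywords whenever they appear in the same record, weighting each link by how often that happens. The sketch below illustrates that general technique only; it is not the author's pipeline. It assumes each publication or patent can be reduced to a list of keywords, and the function `co_word_edges` and the sample records are hypothetical, since the listing does not document the dataset's fields.

```python
from collections import Counter
from itertools import combinations


def co_word_edges(keyword_lists):
    """Count how often each unordered keyword pair co-occurs in one record.

    keyword_lists: iterable of keyword lists, one list per publication or patent.
    Returns a Counter mapping (keyword_a, keyword_b) -> co-occurrence count,
    with each pair stored in alphabetical order.
    """
    edges = Counter()
    for keywords in keyword_lists:
        # Lowercase, trim, and deduplicate so a pair is counted once per record.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for pair in combinations(unique, 2):
            edges[pair] += 1
    return edges


# Hypothetical keyword lists for illustration only; not taken from the dataset.
records = [
    ["Deep learning", "Autonomous driving", "LiDAR"],
    ["deep learning", "Storage chips"],
    ["Autonomous driving", "Deep learning"],
]

edges = co_word_edges(records)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

The weighted pairs can then be loaded into a graph library such as networkx for centrality or community detection. Normalizing case before pairing matters because keyword spelling in bibliographic records is often inconsistent.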
Use Cases
Compare urban innovation trajectories based on publication and patent data.
Analyze inter-city heterogeneity in technological portfolios and research priorities.
Map collaboration structures among actors in China's high-tech industries.
Examine the role of LLMs in cleaning and standardizing textual metadata for large-scale analysis.
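The listing does not say how inter-city heterogeneity is measured. One common option, swapped in here as an assumption rather than the author's method, is a Balassa-style revealed technological advantage (RTA): a city's share of its own output in a domain divided by that domain's share of output across all cities. The minimal sketch below assumes record counts per city and domain can be derived from the file; `revealed_advantage`, the city labels, and the counts are all hypothetical placeholders.

```python
def revealed_advantage(counts):
    """Balassa-style revealed technological advantage (RTA) per city and domain.

    counts: dict mapping city -> {domain: record count}.
    RTA = (domain's share of the city's own output)
          / (domain's share of all output across cities).
    Values above 1 mark a relative specialization.
    """
    grand_total = sum(sum(by_domain.values()) for by_domain in counts.values())
    domain_totals = {}
    for by_domain in counts.values():
        for domain, n in by_domain.items():
            domain_totals[domain] = domain_totals.get(domain, 0) + n

    rta = {}
    for city, by_domain in counts.items():
        city_total = sum(by_domain.values())
        if city_total == 0:
            continue  # no output for this city, so shares are undefined
        for domain, n in by_domain.items():
            if domain_totals[domain] == 0:
                continue  # domain absent everywhere, so RTA is undefined
            city_share = n / city_total
            overall_share = domain_totals[domain] / grand_total
            rta[(city, domain)] = city_share / overall_share
    return rta


# Made-up counts for illustration; not values from the dataset.
counts = {
    "City A": {"AI": 60, "Optics": 40},
    "City B": {"AI": 20, "Optics": 80},
}

for (city, domain), value in sorted(revealed_advantage(counts).items()):
    print(f"{city} / {domain}: {value:.2f}")
```

Because RTA is a ratio of shares, it compares portfolios independently of city size, which suits a four-city comparison where total output may differ widely.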
Strengths
Derived from over 39,000 publications and nearly 10,000 patent records.
Covers a 10-year time range from 2016 to 2025.
Focuses on four distinct technological domains and four representative Chinese cities.
Employs an integrated analytical framework combining multiple quantitative and semantic methods.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes it hard to judge suitability before download.
At 5.5 KB, the dataset is very small, which suggests it contains summary or processed data rather than raw records.
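Both gaps above can be closed with a quick profile once the file is downloaded. The sketch below assumes the single file is a delimited text file with a header row; the listing does not state the format, and `profile_csv` is a hypothetical helper. It reports the column names, the number of data rows, and a few sample rows keyed by column name.

```python
import csv
import io


def profile_csv(stream, delimiter=",", sample_rows=3):
    """Summarize a delimited text file whose columns are undocumented.

    Returns the header row, the number of non-blank data rows, and up to
    `sample_rows` rows as dicts keyed by header name.
    """
    reader = csv.reader(stream, delimiter=delimiter)
    header = next(reader, [])  # first row assumed to be the header
    samples = []
    row_count = 0
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        row_count += 1
        if len(samples) < sample_rows:
            samples.append(dict(zip(header, row)))
    return {"columns": header, "row_count": row_count, "samples": samples}


# Placeholder content standing in for the downloaded file.
demo = io.StringIO("city,domain,count\nCity A,AI,12\nCity B,Optics,7\n")
profile = profile_csv(demo)
print(profile["columns"], profile["row_count"])
for row in profile["samples"]:
    print(row)
```

For the actual download, open it with `open(path, newline="", encoding="utf-8")` and pass the file object; `path` and the UTF-8 encoding are assumptions. If the file turns out to be tab-separated, pass `delimiter="\t"`.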
Provenance
Source
figshare, author Hua Song.
Collection Method
Compiled from Web of Science and China's national patent database, analyzed with bibliometrics, network modeling, and LLM-assisted interpretation.
Time Range
2016–2025
Freshness
Last updated 2026-05-14 17:26:43; freshness should be verified.
Geography
Four Chinese cities: Wuhan, Chengdu, Hangzhou, Tianjin.
License
CC-BY-4.0.