Name: Scholar-kg: 1 Million Papers as a Queryable Knowledge Graph
Creator: InternScience
Published: 2026-05-27T07:53:35
Keywords: Knowledge Extraction, Graph, Academic Papers, Arxiv, Scientific Knowledge Graph, Biorxiv

Description

Approximately 1 million academic papers from sources like arXiv and bioRxiv have been processed into a unified, multi-layered knowledge graph. The dataset, created by InternScience, decomposes each paper into five modules covering metadata, entities, abstracted knowledge, citation context, and fine-grained relations. It was last updated on June 12, 2026.

Use Cases

Building a semantic search engine for academic papers based on the decomposed knowledge modules.
Training or evaluating information extraction models on the structured entities and relations.
Analyzing citation networks and research trends using the provided citation context.
Developing question-answering agents for scientific domains based on the abstracted knowledge representations.

Strengths

Contains about 1 million papers, providing a substantial scale for analysis.
Papers are decomposed into a structured, five-module representation (A–E) designed for querying.
Sources include established preprint repositories like arXiv and bioRxiv.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The exact row count is not stated (only the approximate figure of 1 million papers is given), which makes suitability harder to assess.
The source description defers to a full page for details, so the core metadata summarized here may be incomplete.
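
Because field semantics have to be inferred after download, a first pass over a small sample of records can help. The sketch below is a minimal illustration, assuming the records can be loaded as Python dictionaries; the function name `summarize_fields` and the sample field names (`paper_id`, `entities`, `year`) are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def summarize_fields(records):
    """Report, per field name, the value types seen and how many records contain it."""
    types_seen = defaultdict(set)
    counts = defaultdict(int)
    for record in records:
        for key, value in record.items():
            types_seen[key].add(type(value).__name__)
            counts[key] += 1
    return {
        key: {"types": sorted(types_seen[key]), "present_in": counts[key]}
        for key in sorted(types_seen)
    }


# Hypothetical sample rows: the real field names are undocumented and will differ.
sample = [
    {"paper_id": "p1", "entities": ["graph"], "year": 2024},
    {"paper_id": "p2", "entities": None},
]

summary = summarize_fields(sample)
for name, info in summary.items():
    print(name, info)
```

Fields that appear in only some records, or that mix types such as lists and nulls, are the ones most worth checking by hand before relying on them.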

Provenance

Source: Processed from arXiv, bioRxiv, and other unspecified sources by Agents-K1.
Collection Method: Papers were decomposed into a unified, queryable representation organized into five modules.
Freshness: Last updated 2026-06-12 03:03:40; freshness should be verified.

License is unknown and must be verified before use.
