This dataset contains metadata for 1.7 million arXiv research articles spanning multiple scientific disciplines. Each record includes the title, abstract, authors, and category classifications of a paper published on the preprint server.
Use Cases
- Train a multi-label classifier to predict paper categories based on the 'title' and 'abstract' text
- Build a paper recommendation engine using the 'abstract' and 'categories' fields
- Construct a knowledge graph by extracting relationships from the 'authors' field and co-citation data
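
As a starting point for the multi-label classification use case, the sketch below turns the 'categories' field into binary target vectors and joins 'title' and 'abstract' into one input string. It assumes that 'categories' is a single space-separated string (for example "cs.LG stat.ML"); the description above does not state this. The sample records and the helper names `build_label_index`, `encode_labels`, and `to_text` are hypothetical and only serve the illustration.

```python
from typing import Dict, List

# Hypothetical sample records. The field names follow the dataset description;
# the values are invented for illustration only.
records = [
    {"title": "Paper A", "abstract": "About learning.", "categories": "cs.LG stat.ML"},
    {"title": "Paper B", "abstract": "About strings.", "categories": "hep-th"},
    {"title": "Paper C", "abstract": "About networks.", "categories": "cs.LG"},
]


def build_label_index(rows: List[Dict[str, str]]) -> Dict[str, int]:
    """Map every category seen in the data to a column position."""
    # Assumption: categories are separated by whitespace within one string.
    labels = sorted({cat for row in rows for cat in row["categories"].split()})
    return {label: position for position, label in enumerate(labels)}


def encode_labels(row: Dict[str, str], index: Dict[str, int]) -> List[int]:
    """Return a 0/1 vector with a 1 for each category assigned to the paper."""
    vector = [0] * len(index)
    for cat in row["categories"].split():
        vector[index[cat]] = 1
    return vector


def to_text(row: Dict[str, str]) -> str:
    """Join title and abstract into the text a classifier would consume."""
    return f"{row['title']} {row['abstract']}"


label_index = build_label_index(records)
targets = [encode_labels(row, label_index) for row in records]
texts = [to_text(row) for row in records]
```

The resulting `texts` and `targets` lists can be passed to any text vectorizer and multi-label model. Keeping the label index as a plain dictionary makes it easy to map predicted columns back to category names.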
Strengths
- 1.7 million research articles with associated metadata
- Includes specific fields for 'categories', 'abstract', and 'authors'
- Supports co-citation network analysis through paper metadata