Available on 1 platform
Open arXiv contains metadata for 2.99 million scientific papers from the arXiv preprint repository. The collection includes titles, abstracts, authors, categories, DOIs, and version history, covering publications from 1991 to 2026.
The dataset is packaged into 417 Parquet shards, so querying or streaming it efficiently calls for a Parquet-aware tool that can read the shards lazily instead of loading the whole collection at once.