Description
This dataset contains 19,000 open-access research papers on COVID-19, collected from multiple repositories between 2020 and 2021. Each entry includes metadata such as title, authors, abstract, publication date, and source repository.
Use Cases
Train a text classifier to categorize papers by research theme (e.g., epidemiology, virology, treatment)
Perform topic modeling to identify emerging trends in COVID-19 research
Extract key entities like drug names or study locations from the full-text papers
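As a starting point for the classification use case, a keyword baseline can assign provisional themes before any model is trained. The sketch below is only an illustration: the theme names follow the examples above, while the keyword lists and the `guess_theme` helper are hypothetical and would need tuning against the actual papers.

```python
import re
from collections import Counter

# Hypothetical keyword lists, one per theme; adjust them to your own taxonomy.
THEME_KEYWORDS = {
    "epidemiology": ["transmission", "incidence", "outbreak", "reproduction number"],
    "virology": ["spike protein", "genome", "viral load", "mutation"],
    "treatment": ["remdesivir", "dexamethasone", "clinical trial", "therapy"],
}


def guess_theme(text):
    """Return the theme with the most whole-word keyword hits, or None."""
    lowered = text.lower()
    counts = Counter()
    for theme, keywords in THEME_KEYWORDS.items():
        for kw in keywords:
            pattern = r"\b" + re.escape(kw) + r"\b"
            counts[theme] += len(re.findall(pattern, lowered))
    theme, hits = counts.most_common(1)[0]
    return theme if hits > 0 else None


abstract = "A randomized clinical trial of remdesivir therapy in hospitalized adults."
print(guess_theme(abstract))
```

Labels produced this way are noisy (for example, plural forms such as "mutations" are not matched), so they are better suited to bootstrapping or sanity-checking a trained classifier than to serving as ground truth.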
Strengths
Large scale with 19,000+ papers
Open access with full-text availability for many entries
Rich metadata including abstracts and publication dates
Limitations
Quality varies across sources; some papers may be preprints that have not been peer-reviewed
Few or no standardized annotations or labels
Temporal coverage ends in 2021; more recent research is not included
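One way to handle the mixed peer-review status is to flag records by their source repository. The sketch below assumes each record is a dict with a `source` field holding the repository name; that field name and the `is_likely_preprint` helper are assumptions, not part of the documented schema. The flag is a heuristic only, since a repository name does not prove whether a given paper was later peer-reviewed.

```python
# Repositories named in this card that primarily host preprints.
PREPRINT_SOURCES = {"arxiv", "biorxiv"}


def is_likely_preprint(record):
    """Heuristic: treat records from preprint servers as not peer-reviewed."""
    source = record.get("source", "")  # "source" is an assumed field name
    return source.strip().lower() in PREPRINT_SOURCES


records = [
    {"title": "Paper A", "source": "bioRxiv"},
    {"title": "Paper B", "source": "PubMed Central"},
]
flagged = [r["title"] for r in records if is_likely_preprint(r)]
print(flagged)
```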
Provenance
Source
Papers aggregated from platforms such as PubMed Central, arXiv, and bioRxiv.
Collection Method
Automated harvesting from the sources listed above.
Time Range
2020 - 2021
Geography
Global
Some papers may appear more than once across sources, so deduplication may be needed. License terms may vary by paper; check each paper's license before reuse.
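A minimal deduplication sketch, assuming each record is a dict with a `title` field (an assumed field name): titles are normalized for case, accents, and punctuation, and the first record seen for each normalized title is kept. This is one simple approach rather than the method used to build the dataset. Matching on a persistent identifier such as a DOI, where available, would be more reliable.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents, and collapse punctuation and whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def deduplicate(records):
    """Keep the first record for each normalized title; keep untitled records."""
    seen = set()
    unique = []
    for rec in records:
        key = normalize_title(rec.get("title", ""))
        if key and key in seen:
            continue  # same normalized title already kept
        seen.add(key)
        unique.append(rec)
    return unique


records = [
    {"title": "COVID-19: A Review", "source": "PubMed Central"},
    {"title": "Covid 19 \u2013 a review", "source": "bioRxiv"},
    {"title": "Viral Load Dynamics", "source": "arXiv"},
]
print(len(deduplicate(records)))
```

Exact matching on normalized titles misses near-duplicates, such as a preprint whose title changed on publication, so a fuzzy comparison may still be needed afterwards.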