Padiweb collection of automatically and manually verified articles on highly pathogenic av
by Valentin, Sarah / CIRAD Harvested Collection · Updated 1mo ago
Available on 1 platform
Description
656 news articles on highly pathogenic avian influenza outbreaks, manually labeled for event status. The dataset was collected by the Padiweb multilingual text-mining software from Google News in English, Russian, Norwegian, Finnish, and Swedish during 2020 and 2021. It includes a separate dataset of 3,100 location names extracted from relevant articles.
Use Cases
Train models for classifying disease outbreak status based on manually labeled news articles.
Analyze geographic patterns of avian influenza using the extracted location dataset.
Develop multilingual text-mining pipelines for animal health event surveillance.
Study media reporting trends on avian influenza across different European regions.
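For the classification use case, a held-out set that covers every language and label is usually worth building before any model is trained. The following is a minimal sketch of a stratified split using only the Python standard library. The record fields (`lang`, `label`, `text`) and the label values are hypothetical placeholders, since the dataset's actual column names are not documented here.

```python
import random
from collections import defaultdict


def stratified_split(records, key, test_fraction=0.2, seed=0):
    """Split records into (train, test), sampling test_fraction from each stratum."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)

    train, test = [], []
    for group in strata.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        n_test = int(round(len(shuffled) * test_fraction))
        # Keep at least one test item per stratum when it has two or more items.
        if len(shuffled) > 1:
            n_test = max(1, n_test)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Hypothetical records: field names and label values are illustrative only.
records = [
    {"lang": lang, "label": label, "text": f"{lang}-{label}-{i}"}
    for lang in ("en", "ru")
    for label in ("outbreak", "other")
    for i in range(5)
]

# Stratify on the (language, label) pair so each combination reaches the test set.
train, test = stratified_split(records, key=lambda r: (r["lang"], r["label"]))
print(len(train), len(test))
```

Stratifying on the language and label pair is one reasonable choice for a small multilingual corpus, where a plain random split could leave a low-resource language out of the test set entirely.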
Strengths
656 articles manually labeled for highly pathogenic avian influenza (HPAI) event status, providing a verified training corpus.
Multilingual coverage across five languages (English, Russian, Norwegian, Finnish, Swedish).
Includes a separate dataset of 3,100 extracted location names for spatial analysis.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Coverage is limited to two years (2020-2021) and to Russia, Northern Europe, and Eastern Europe.
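Because the columns are undocumented, a quick profile of each field after download can help with inferring its meaning. The sketch below assumes the files are delimited text readable by Python's `csv` module, which is not confirmed by this listing. The sample column names (`id`, `lang`, `label`) are hypothetical stand-ins.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize_columns(handle, max_examples=3):
    """Return per-column counts of non-empty and distinct values, plus examples."""
    reader = csv.DictReader(handle)
    counts = defaultdict(Counter)
    for row in reader:
        for name, value in row.items():
            if name is None:
                continue  # cells beyond the header row have no column name
            value = (value or "").strip()
            if value:
                counts[name][value] += 1

    summary = {}
    for name in reader.fieldnames or []:
        counter = counts[name]
        summary[name] = {
            "non_empty": sum(counter.values()),
            "distinct": len(counter),
            # Most frequent values first; these hint at what the field encodes.
            "examples": [v for v, _ in counter.most_common(max_examples)],
        }
    return summary


# Hypothetical sample: the real files' column names are not documented.
SAMPLE = """id,lang,label
1,en,relevant
2,ru,irrelevant
3,en,relevant
"""

sample_summary = summarize_columns(io.StringIO(SAMPLE))
for column, stats in sample_summary.items():
    print(column, stats)
```

For a downloaded file, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`. A column with very few distinct values is a likely candidate for the label or language field.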
Provenance
Source
CIRAD Harvested Collection, authored by Valentin, Sarah.
Collection Method
The Padiweb multilingual text-mining software collected articles from Google News and classified them; the articles were then normalized and manually labeled.
Time Range
2020-2021
Freshness
Last updated 2026-04-20 15:42:52; freshness should be verified.
Geography
Russia, Northern Europe, and Eastern Europe
License
License information is unknown; terms of use should be verified before download.