A collection of text corpora for agricultural risk analysis, including over 527,000 original sentences and 703,000 causal triplets. The dataset was created by WenJun Cui and last updated in May 2026. It focuses on 7 crop categories and 10 major agrometeorological disasters.
Use Cases
- Train event extraction models based on the annotated corpus of 4,743 entries.
- Build causal knowledge graphs for agricultural disasters using the library of over 703,000 causal triplets.
- Analyze text mentions of specific crop risks based on keyword sets covering wheat, corn, rice, and other crops.
- Study the co-occurrence of disaster types like drought, high temperature, and typhoons within agricultural texts.
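The co-occurrence and knowledge-graph use cases above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset author's method: the keyword list, the sample sentences, the sample triplets, and the (cause, relation, effect) triplet layout are all assumptions made for the example, since the actual field structure is not documented.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical keyword list; the dataset's real keyword sets may differ.
DISASTER_KEYWORDS = ["drought", "high temperature", "typhoon"]


def disaster_cooccurrence(sentences, keywords=DISASTER_KEYWORDS):
    """Count how often each pair of keywords appears in the same sentence."""
    pair_counts = Counter()
    for sentence in sentences:
        text = sentence.lower()
        # Sort so each pair is always stored in the same order.
        present = sorted(k for k in keywords if k in text)
        pair_counts.update(combinations(present, 2))
    return pair_counts


def build_causal_graph(triplets):
    """Map each cause to a list of (relation, effect) edges.

    Assumes each triplet is a (cause, relation, effect) tuple.
    """
    graph = defaultdict(list)
    for cause, relation, effect in triplets:
        graph[cause].append((relation, effect))
    return dict(graph)


# Invented sample data, used only to show the call pattern.
sample_sentences = [
    "Drought and high temperature reduced wheat yields.",
    "The typhoon flooded rice paddies.",
    "High temperature followed the drought in corn regions.",
]
sample_triplets = [
    ("drought", "causes", "yield loss"),
    ("typhoon", "causes", "lodging"),
    ("drought", "causes", "soil moisture deficit"),
]

cooc = disaster_cooccurrence(sample_sentences)
graph = build_causal_graph(sample_triplets)
```

Plain substring matching is used here for brevity; real corpus text would likely need tokenization or language-specific matching before the counts are meaningful.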
Strengths
- Includes a large original sentence corpus of 527,474 entries.
- Contains a manually annotated subset of 4,743 entries for validation.
- Offers a substantial library of 703,508 causal triplets for relationship modeling.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row counts for individual files are not documented beyond the totals listed above, which may limit suitability assessment.
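Because field semantics have to be inferred after download, a quick first step is to list each column with a few example values and count the rows. The sketch below assumes a downloaded component is a CSV file with a header row, which the listing does not confirm; the function name and the in-memory sample are hypothetical stand-ins.

```python
import csv
import io


def summarize_columns(handle, sample_rows=5):
    """Return example values per column and the total number of data rows."""
    reader = csv.DictReader(handle)
    samples = {name: [] for name in (reader.fieldnames or [])}
    row_count = 0
    for row in reader:
        # Keep only the first few values of each column as examples.
        if row_count < sample_rows:
            for name in samples:
                samples[name].append(row[name])
        row_count += 1
    return samples, row_count


# Invented stand-in for a downloaded file; real column names are unknown.
demo = io.StringIO(
    "cause,relation,effect\n"
    "drought,causes,yield loss\n"
    "typhoon,causes,lodging\n"
)
summary, total_rows = summarize_columns(demo)
```

For a real file, the same function could be called on a handle from `open(path, newline="", encoding="utf-8")`, with the encoding adjusted to whatever the download actually uses.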
Provenance
- Source: figshare
- Freshness: last updated 2026-05-30 03:21:44