This dataset covers the 2014-15 to 2019-20 seasons of six European football leagues, scraped from Understat.com. It contains 22 columns of match-level statistics, including expected goals (xG), expected points (xP), and defensive pressure metrics. It is intended for analyzing team performance beyond traditional results such as scorelines and league points.
Use Cases
- Evaluating whether a team's results were deserved by comparing actual goals and points with its expected goals (xG) and expected points (xP).
- Analyzing defensive pressure and build-up play using the deep passes and PPDA (passes allowed per defensive action) features.
- Predicting future match outcomes by modeling historical expected goals data.
- Comparing the attacking efficiency of teams across different leagues using the chance percentage feature.
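As a rough illustration of the first use case, the sketch below sums actual points minus expected points per team. The column names `team`, `pts`, and `xpts` are assumptions, since the dataset's column-level documentation is not available here, and the sample rows are made up for the example rather than taken from the data.

```python
from collections import defaultdict


def points_vs_expected(rows, team_key="team", pts_key="pts", xpts_key="xpts"):
    """Return, per team, the sum of (actual points - expected points).

    The default key names are hypothetical; adjust them to the real
    column names once the file has been inspected.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row[team_key]] += float(row[pts_key]) - float(row[xpts_key])
    return dict(totals)


# Made-up rows for illustration only; not values from the dataset.
sample = [
    {"team": "A", "pts": "3", "xpts": "1.5"},
    {"team": "A", "pts": "0", "xpts": "1.0"},
    {"team": "B", "pts": "1", "xpts": "2.0"},
]

diff = points_vs_expected(sample)
print(diff)
```

A positive total suggests a team collected more points than its chances implied, and a negative total suggests the opposite. Values are parsed with `float` because rows read through Python's `csv` module arrive as strings.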
Strengths
- Covers six major European football leagues over six consecutive seasons.
- Includes 22 columns of match statistics, among them advanced metrics such as expected goals and PPDA, which are explicitly defined.
- Released under CC0 1.0, a public domain dedication that permits broad reuse.
Limitations
- Column-level documentation is absent; field semantics for many of the 22 columns must be inferred after download.
- Row count is not stated, so suitability for large-scale modeling cannot be assessed before download.
- Last update date is unknown; freshness unverified.
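Since the column semantics and row count are not documented, a first step after download could be a quick inspection like the sketch below. It uses only the standard library; the inline CSV text is a stand-in with invented column names and values, not the dataset's actual header.

```python
import csv
import io


def summarize_csv(handle):
    """Return (header, row_count) for an open CSV file handle."""
    reader = csv.reader(handle)
    header = next(reader, [])          # first line is treated as the header
    row_count = sum(1 for _ in reader)  # remaining lines are data rows
    return header, row_count


# Stand-in content for illustration; the real header may differ.
demo = io.StringIO("league,team,xG\nX,A,1.2\nX,B,0.8\n")
columns, n_rows = summarize_csv(demo)
print(columns, n_rows)
```

For the downloaded file, the same function can be called inside `with open(path, newline="") as f:`, where `path` is wherever the CSV was saved.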
Provenance
- Source: Understat.com
- Collection Method: Scraped from the source website. The "neural network approximations" appear to describe how the source estimates metrics such as expected goals, not the scraping itself.
- Time Range: 2014-15 to 2019-20 seasons
- Geography: Europe (English Premier League, La Liga, Bundesliga, Serie A, Ligue 1, Russian Football Premier League)