Traffic Crash Types in Izmir Districts Analyzed with a Novel XGBoost Method
by Yağmur Özinal Avşar
956.9 KB · 6 files
Description
Yağmur Özinal Avşar proposes a novel machine learning method, Embedding-XGBoost (E-XGB), for classifying traffic crash types. The model was trained and tested on 10 years of crash data from central districts of İzmir, Türkiye, with a focus on Bornova, Karşıyaka, and Konak. Using only 10 selected features, the method achieved an accuracy of 85.42% and outperformed the four benchmark algorithms it was compared against (XGBoost, SVM, KNN, and MLP).
Use Cases
Predicting crash types for specific road sections based on geometric road elements.
Analyzing the relationship between road features and crash-type outcomes.
Benchmarking novel dimensionality reduction and classification methods against standard ML algorithms.
Conducting feature importance analysis to identify key factors in traffic crash classification.
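The feature-importance use case, and the note that E-XGB used only 10 selected features, both come down to ranking features by an importance score and keeping the strongest ones. The sketch below shows that generic top-k step in plain Python; it is not the author's selection procedure, which the listing does not describe. The feature names and scores are hypothetical placeholders, not values from this dataset. In practice the scores might come from a fitted model's `feature_importances_` attribute, which scikit-learn tree ensembles and xgboost's `XGBClassifier` both expose.

```python
from typing import Dict, List, Tuple


def top_k_features(importances: Dict[str, float], k: int = 10) -> List[Tuple[str, float]]:
    """Return the k highest-scoring features, with scores normalized to sum to 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")
    # Normalize so each score reads as a share of total importance.
    normalized = {name: score / total for name, score in importances.items()}
    # Sort by descending score; ties are broken alphabetically for a stable result.
    ranked = sorted(normalized.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical scores for illustration only; not taken from the İzmir dataset.
scores = {
    "lane_count": 0.18,
    "horizontal_curvature": 0.15,
    "intersection_type": 0.12,
    "lighting": 0.09,
    "road_width": 0.08,
    "speed_limit": 0.07,
    "median_present": 0.06,
    "time_of_day": 0.05,
    "weather": 0.05,
    "surface_condition": 0.04,
    "district": 0.03,
    "day_of_week": 0.02,
}

if __name__ == "__main__":
    for name, share in top_k_features(scores, k=10):
        print(f"{name}: {share:.3f}")
```

Normalizing first makes the retained shares comparable across runs or models whose raw importance scales differ.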
Strengths
The proposed E-XGB method demonstrated a classification accuracy of 85.42%.
Model performance was validated against four benchmark algorithms (XGBoost, SVM, KNN, MLP).
The dataset covers a 10-year period, providing a substantial temporal scope for analysis.
The model is described as robust against missing data and computationally efficient.
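The 85.42% figure is classification accuracy: the share of test crashes whose predicted type matches the recorded type. Comparing E-XGB with benchmark algorithms amounts to scoring every model on the same held-out labels. The minimal sketch below assumes each model's predictions are already available as lists of labels; the crash-type labels and the generic `model_a`, `model_b`, and `model_c` predictions are invented for illustration and do not reproduce the study's results.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty test set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_models(
    y_true: Sequence[str], predictions: Dict[str, Sequence[str]]
) -> List[Tuple[str, float]]:
    """Score every model on the same held-out labels and sort best-first."""
    scored = {name: accuracy(y_true, preds) for name, preds in predictions.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical held-out labels and predictions; not data from the study.
y_true = ["rear_end", "side", "side", "pedestrian",
          "rear_end", "single_vehicle", "side", "rear_end"]
predictions = {
    "model_a": ["rear_end", "side", "side", "pedestrian",
                "rear_end", "single_vehicle", "rear_end", "rear_end"],
    "model_b": ["rear_end", "side", "rear_end", "pedestrian",
                "rear_end", "side", "side", "rear_end"],
    "model_c": ["side", "side", "rear_end", "pedestrian",
                "rear_end", "side", "side", "rear_end"],
}

if __name__ == "__main__":
    for name, acc in rank_models(y_true, predictions):
        print(f"{name}: {acc:.2%}")
```

Because crash types are often unevenly distributed, accuracy alone can flatter a model that favors the most common class; per-class recall or a macro-averaged F1 score is a common complement when the class balance is unknown.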
Limitations
The dataset's row count and column-level details are not stated, which makes it difficult to assess its suitability for a given analysis.
Geographic coverage is limited to 10 central districts of İzmir, Türkiye.
The primary data file is a DOCX document, which may require extraction or conversion for direct analysis.
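Because the data ships inside a DOCX file, a likely first step is pulling its tables out into a tabular format. A .docx file is a ZIP archive whose main body lives in `word/document.xml`, with tables stored as `w:tbl`, `w:tr`, and `w:tc` elements. The sketch below reads those elements with the Python standard library and serializes a table as CSV. It assumes the data is stored as native Word tables; an embedded spreadsheet object or an image of a table would need a different approach, and merged cells are not expanded. The third-party `python-docx` package (`docx.Document(path).tables`) is a higher-level alternative.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import List

# Namespace of the WordprocessingML elements inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_tables(source) -> List[List[List[str]]]:
    """Return each table in a .docx (path or binary file object) as rows of cell strings."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    tables = []
    # iter() also visits nested tables, which are returned as separate tables.
    for tbl in root.iter(f"{W}tbl"):
        rows = []
        for tr in tbl.findall(f"{W}tr"):
            cells = []
            for tc in tr.findall(f"{W}tc"):
                # Text in one paragraph can be split across several <w:t> runs.
                paragraphs = [
                    "".join(t.text or "" for t in p.iter(f"{W}t"))
                    for p in tc.findall(f"{W}p")
                ]
                cells.append(" ".join(part for part in paragraphs if part).strip())
            rows.append(cells)
        tables.append(rows)
    return tables


def table_to_csv(rows: List[List[str]]) -> str:
    """Serialize one extracted table as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()
```

For example, `table_to_csv(extract_docx_tables("crash_data.docx")[0])` would return the first table as CSV text; the file name is a placeholder for whichever DOCX file the dataset actually contains.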
Provenance
Source
Yağmur Özinal Avşar via figshare.
Collection Method
Analysis of crash data using a novel two-stage dimensionality reduction approach integrating entity embeddings with XGBoost.
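The listing describes a two-stage approach in which entity embeddings are integrated with XGBoost. In entity embedding generally, each categorical variable (such as a district or an intersection type) is mapped to a short dense vector instead of a wide one-hot encoding, and the resulting vectors are passed to a downstream model. The sketch below shows only that lookup-and-concatenate step, under stated assumptions: the vectors are randomly initialized rather than learned (in practice they are typically trained as a layer of a small neural network), unseen or missing values share a reserved vector, and all feature names and category sets are hypothetical. It is not the author's E-XGB implementation.

```python
import random
from typing import Dict, List, Sequence


class EmbeddingTable:
    """Maps each category of one categorical feature to a dense vector."""

    def __init__(self, categories: Sequence[str], dim: int, seed: int = 0):
        rng = random.Random(seed)
        # Index 0 is reserved for unseen or missing categories.
        self.index = {cat: i + 1 for i, cat in enumerate(sorted(set(categories)))}
        self.dim = dim
        # Random initialization stands in for vectors that would normally be learned.
        self.vectors = [
            [rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(len(self.index) + 1)
        ]

    def lookup(self, category: str) -> List[float]:
        return self.vectors[self.index.get(category, 0)]


def embed_row(row: Dict[str, str], tables: Dict[str, EmbeddingTable]) -> List[float]:
    """Concatenate the embedding of every categorical field in a fixed (sorted) order."""
    features: List[float] = []
    for name in sorted(tables):
        features.extend(tables[name].lookup(row.get(name, "")))
    return features


# Hypothetical categorical features; names and category sets are illustrative only.
tables = {
    "district": EmbeddingTable(["Bornova", "Karşıyaka", "Konak"], dim=2, seed=1),
    "intersection_type": EmbeddingTable(["none", "T", "cross", "roundabout"], dim=3, seed=2),
    "road_segment": EmbeddingTable([f"seg_{i}" for i in range(50)], dim=4, seed=3),
}

one_hot_width = sum(len(t.index) for t in tables.values())  # 3 + 4 + 50 = 57
embedded_width = sum(t.dim for t in tables.values())  # 2 + 3 + 4 = 9
```

For example, `embed_row({"district": "Konak", "intersection_type": "T", "road_segment": "seg_7"}, tables)` yields a 9-value vector where a one-hot encoding of the same fields would need 57 columns; vectors like this could then be fed, alongside any numeric features, to a gradient-boosted tree model.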
Time Range
10-year period (specific years not stated).
Freshness
Last updated 2026-06-02 20:15:27; freshness should be verified.
Geography
10 central districts of İzmir, Türkiye, with a focus on Bornova, Karşıyaka, and Konak.
Format
The listing reports 956.9 KB across 6 files. The primary file is a DOCX document, so the structured data may be embedded within it and need extraction before analysis.