Name: Traffic Crash Types in Izmir Districts Analyzed with a Novel XGBoost Method
Creator: Yağmur Özinal Avşar
Published: 2026-06-02T20:15:27
License: CC-BY-4.0
Keywords: Machine Learning, Tabular, Road Safety, Time Series, Feature Analysis, Geospatial, Traffic Crashes

Description

Yağmur Özinal Avşar proposes a novel machine learning method, Embedding-XGBoost (E-XGB), for classifying traffic crash types. The model was trained and tested on 10 years of crash data from central districts of İzmir, Türkiye, including Bornova, Karşıyaka, and Konak. Using only 10 selected features, the method achieved an accuracy of 85.42% and outperformed the benchmark algorithms it was compared against (XGBoost, SVM, KNN, and MLP).

Use Cases

Predicting crash types for specific road sections based on geometric road elements.
Analyzing the relationship between road features and crash severity outcomes.
Benchmarking novel dimensionality reduction and classification methods against standard ML algorithms.
Conducting feature importance analysis to identify key factors in traffic crash classification.
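For the feature importance use case, a common approach is to rank features by an importance score and keep the top k. The source reports that E-XGB used only 10 selected features but does not describe its selection rule. The plain-Python sketch below assumes importance scores have already been computed, for example with XGBoost's gain-based importance. The function name top_k_features and all feature names and scores are hypothetical and are not columns from this dataset.

```python
# A minimal sketch: rank features by importance score and keep the top k.
# Feature names and scores are hypothetical placeholders, not values
# from the E-XGB study.


def top_k_features(importances: dict[str, float], k: int = 10) -> list[str]:
    """Return the k feature names with the highest importance scores.

    Ties are broken alphabetically so the result is deterministic.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort by descending score, then by name for stable tie-breaking.
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical scores, e.g. from a trained model's gain importance.
    scores = {
        "lane_count": 0.21,
        "curve_radius": 0.17,
        "speed_limit": 0.15,
        "intersection_type": 0.12,
        "lighting": 0.05,
    }
    print(top_k_features(scores, k=3))
    # prints ['lane_count', 'curve_radius', 'speed_limit']
```

Alphabetical tie-breaking is included so that two runs over identical scores always select the same feature set.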

Strengths

The proposed E-XGB method demonstrated a classification accuracy of 85.42%.
Model performance was validated against four benchmark algorithms (XGBoost, SVM, KNN, MLP).
The dataset covers a 10-year period, providing a substantial temporal scope for analysis.
The author describes the model as robust to missing data and computationally efficient.
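The reported 85.42% is a classification accuracy: the share of test records whose crash type is predicted correctly. The sketch below shows how such a figure is computed and how several models' predictions on the same held-out records can be ranked. The crash-type labels, the model names model_a and model_b, and the functions accuracy and compare_models are all hypothetical. The sketch does not reproduce the study's results.

```python
# A minimal sketch: compare classifiers by accuracy on the same held-out
# crash records. All labels and predictions are hypothetical.


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of records whose predicted crash type matches the true type."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def compare_models(
    y_true: list[str], predictions: dict[str, list[str]]
) -> list[tuple[str, float]]:
    """Return (model_name, accuracy) pairs sorted from best to worst."""
    results = [(name, accuracy(y_true, preds)) for name, preds in predictions.items()]
    # sorted() is stable, so tied models keep their input order.
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    y_true = ["rear-end", "side", "pedestrian", "rear-end"]
    predictions = {
        "model_a": ["rear-end", "side", "pedestrian", "side"],
        "model_b": ["rear-end", "rear-end", "side", "side"],
    }
    for name, acc in compare_models(y_true, predictions):
        print(f"{name}: {acc:.2%}")
    # prints "model_a: 75.00%" then "model_b: 25.00%"
```

Scoring every model on the same held-out records makes the comparison fair.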

Limitations

The dataset's row count and column-level details are not stated, which makes it difficult to judge whether the data suits a given analysis.
Geographic coverage is limited to 10 central districts of İzmir, Türkiye.
The primary data file is a DOCX document, which may require extraction or conversion for direct analysis.

Provenance

Source: Yağmur Özinal Avşar via figshare.
Collection Method: Analysis of crash data using a novel two-stage dimensionality reduction approach integrating entity embeddings with XGBoost.
Time Range: 10-year period (specific years not stated).
Freshness: Last updated 2026-06-02 20:15:27; confirm that this is the latest version before use.
Geography: 10 central districts of İzmir, Türkiye, with a focus on Bornova, Karşıyaka, and Konak.
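The collection method integrates entity embeddings with XGBoost in two stages, but the source does not describe the architecture. The sketch below therefore illustrates only the general entity-embedding idea, under stated assumptions. Each categorical value is replaced by a dense vector taken from a per-feature lookup table, and the resulting numeric vector could then be passed to a gradient-boosted classifier such as XGBoost (not shown). The district names come from the source. The function embed_record, the columns road_type and lane_count, and every embedding value are hypothetical. In practice the tables would be learned by a neural network trained on the task, a step omitted here.

```python
# A minimal sketch of applying entity embeddings: each categorical value is
# replaced by a dense vector looked up from a per-feature embedding table.
# The tables are hypothetical; real ones would be learned by a neural network.

EmbeddingTable = dict[str, list[float]]


def embed_record(
    record: dict[str, object],
    tables: dict[str, EmbeddingTable],
    numeric_cols: list[str],
) -> list[float]:
    """Turn one crash record into a flat numeric feature vector.

    Categorical columns (the keys of `tables`) are expanded into their
    embedding vectors, and unseen categories fall back to a zero vector.
    Each table is assumed to be non-empty. Numeric columns are appended
    unchanged, in the order given.
    """
    features: list[float] = []
    for col, table in tables.items():
        dim = len(next(iter(table.values())))
        features.extend(table.get(record[col], [0.0] * dim))
    for col in numeric_cols:
        features.append(float(record[col]))
    return features


if __name__ == "__main__":
    tables = {
        "district": {"Bornova": [0.2, -0.1], "Karşıyaka": [0.0, 0.4], "Konak": [-0.3, 0.1]},
        "road_type": {"arterial": [0.5], "local": [-0.5]},
    }
    record = {"district": "Konak", "road_type": "local", "lane_count": 2}
    print(embed_record(record, tables, ["lane_count"]))
    # prints [-0.3, 0.1, -0.5, 2.0]
```

Compared with one-hot encoding, a short learned vector per category keeps the feature count low, which is one way an embedding stage can contribute to dimensionality reduction.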

Data is shared as a 956.9 KB DOCX file; the actual structured dataset may be embedded within the document.
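Because the data ships as a DOCX file, any embedded tables must be extracted before analysis. A DOCX file is a ZIP archive whose main body is the XML part word/document.xml, and Word tables are stored there as w:tbl, w:tr, and w:tc elements. The stdlib-only sketch below assumes the dataset is laid out as ordinary Word tables, which the source does not confirm. The function name docx_tables is hypothetical. The sketch ignores merged cells and other layout details.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_tables(path):
    """Return every table in a DOCX file as a list of rows of cell strings.

    `path` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    tables = []
    for tbl in root.iter(f"{W_NS}tbl"):
        rows = []
        for tr in tbl.findall(f"{W_NS}tr"):
            cells = []
            for tc in tr.findall(f"{W_NS}tc"):
                # Join the text runs of each paragraph, then the paragraphs.
                paragraphs = [
                    "".join(t.text or "" for t in p.iter(f"{W_NS}t"))
                    for p in tc.findall(f"{W_NS}p")
                ]
                cells.append(" ".join(paragraphs).strip())
            rows.append(cells)
        tables.append(rows)
    return tables


if __name__ == "__main__":
    import csv
    import sys

    # Usage: python docx_tables.py <file.docx>
    # Writes each table to table_<n>.csv in the current directory.
    if len(sys.argv) == 2:
        for i, table in enumerate(docx_tables(sys.argv[1])):
            with open(f"table_{i}.csv", "w", newline="", encoding="utf-8") as f:
                csv.writer(f).writerows(table)
```

A library such as python-docx offers a higher-level interface for the same task. This version avoids the extra dependency.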
