New York Taxi Trip Data Enriched with Calendar, Weather, and Travel Features
Format: ARFF
Description
This dataset extends the Kaggle New York City Taxi Trip Duration competition data with new features generated using the Wolfram Mathematica computational system. It combines the original trip records with calendar, weather, and estimated travel features. Columns cover vendor, passenger count, time, season, temperature, precipitation, GPS coordinates, and estimated driving distance and time.
Use Cases
Predicting taxi trip duration based on time, weather, and estimated driving distance features.
Analyzing seasonal and time-of-day traffic patterns using the calendar and day-period columns.
Comparing geographic distance (geoDistance) with estimated road network distance (drivingDistance).
Modeling the impact of weather conditions like temperature, rain, and snow on taxi demand and travel times.
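The geoDistance versus drivingDistance comparison above can be sketched as a per-trip "detour ratio". This is a minimal illustration, not the dataset author's method: it assumes rows have already been loaded as dictionaries, that both columns use the same distance unit (the card does not state units), and the sample values are made up.

```python
from statistics import median


def detour_ratios(rows):
    """Yield drivingDistance / geoDistance for rows where both values are usable."""
    for row in rows:
        geo = row.get("geoDistance")
        drive = row.get("drivingDistance")
        # Skip rows with missing values or a zero/negative straight-line distance,
        # where the ratio is undefined.
        if geo is None or drive is None or geo <= 0:
            continue
        yield drive / geo


# Hypothetical rows for illustration only; real values come from the dataset.
sample_rows = [
    {"geoDistance": 1.0, "drivingDistance": 1.5},
    {"geoDistance": 2.0, "drivingDistance": 2.4},
    {"geoDistance": 0.0, "drivingDistance": 0.3},  # skipped: ratio undefined
]

ratios = list(detour_ratios(sample_rows))
typical_detour = median(ratios)
```

A ratio close to 1 suggests the road route is nearly straight. Much larger values may point to detours such as bridges or one-way streets, or to noisy coordinates.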
Strengths
Data is enriched with multiple feature groups: calendar, weather, and estimated travel metrics.
Includes a unique trip identifier (id) and a flag for store-and-forward trips.
License is CC0-1.0, permitting broad public use.
Limitations
Row count and file size are not stated, which makes it hard to assess suitability before download.
Last update date is unknown; freshness unverified.
Description metadata is limited; actual data quality requires manual inspection after download.
Provenance
Source
Derived from the Kaggle New York City Taxi Trip Duration competition data.
Collection Method
Original features were extracted and new features were generated using Wolfram Mathematica.
Geography
New York City, based on start and end latitude/longitude columns.
Note: the tripDuration column uses the value -1 to mark test rows. Filter these rows out before training a model or computing duration statistics.
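One way to handle the -1 marker is to split the rows into a labeled set and a test set up front. The sketch below assumes rows are plain dictionaries keyed by column name. The function name and sample values are hypothetical, and only the tripDuration and id column names come from the description above.

```python
def split_by_marker(rows, target="tripDuration", test_marker=-1):
    """Separate rows with a real target value from rows flagged as test rows."""
    labeled, test = [], []
    for row in rows:
        if row[target] == test_marker:
            test.append(row)
        else:
            labeled.append(row)
    return labeled, test


# Hypothetical rows for illustration only.
sample_rows = [
    {"id": "a", "tripDuration": 455},
    {"id": "b", "tripDuration": -1},  # test row: no usable duration
    {"id": "c", "tripDuration": 1210},
]

labeled_rows, test_rows = split_by_marker(sample_rows)
```

Keeping the test rows in a separate list, rather than discarding them, preserves them for generating predictions later.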