Uber & Rapido Driver Dataset — India (Dirty) is a synthetic dataset of 50,000 rows representing Indian ride-hailing drivers. Its description says the data has real-world data qualities, but the author, organization, and creation details are not stated. The dataset is hosted on Kaggle.
Use Cases
- Analyze driver demographics and behavior patterns based on the synthetic driver profiles.
- Test data cleaning and preprocessing pipelines on a 'dirty' dataset with real-world qualities.
- Benchmark anomaly detection algorithms on synthetic ride-hailing driver data.
- Study simulated market dynamics within the Indian ride-hailing sector.
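As a minimal sketch of the pipeline-testing use case, the function below applies two generic cleaning steps (trimming whitespace and dropping exact duplicate rows) and reports how much each step changed. The column names `driver_id` and `city` and the sample rows are hypothetical, since the real schema is undocumented; this illustrates one way to measure a cleaning step, not the dataset's actual defects.

```python
def clean_rows(rows):
    """Trim whitespace in string cells, then drop exact duplicate rows.

    `rows` is a list of dicts, for example from csv.DictReader.
    Returns (cleaned_rows, stats) so the effect of each step is measurable.
    """
    cleaned = []
    seen = set()
    trimmed_cells = 0
    duplicates_dropped = 0
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if isinstance(value, str):
                stripped = value.strip()
                if stripped != value:
                    trimmed_cells += 1
                value = stripped
            new_row[key] = value
        # Compare rows after trimming, so " Pune" and "Pune" count as equal.
        fingerprint = tuple(sorted(new_row.items()))
        if fingerprint in seen:
            duplicates_dropped += 1
            continue
        seen.add(fingerprint)
        cleaned.append(new_row)
    stats = {
        "rows_in": len(rows),
        "rows_out": len(cleaned),
        "trimmed_cells": trimmed_cells,
        "duplicates_dropped": duplicates_dropped,
    }
    return cleaned, stats


# Hypothetical sample rows; the real column names must be read from the file.
sample = [
    {"driver_id": "D001", "city": " Pune"},
    {"driver_id": "D001", "city": "Pune"},
    {"driver_id": "D002", "city": "Delhi "},
]
cleaned, stats = clean_rows(sample)
print(stats)
```

Returning the counts alongside the cleaned rows makes it possible to compare pipelines by what they changed, rather than only by their output.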
Strengths
- Contains 50,000 rows, providing a substantial volume for analysis.
- Described as having real-world data qualities and labeled 'Dirty' in its title, so it may contain the kinds of defects that cleaning pipelines have to handle; this is unverified.
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Because the data is synthetic, it may carry geographic or source bias from its generation method and may not match the real population of Indian ride-hailing drivers.
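Because the columns are undocumented, a first pass after download is usually a per-column profile. The sketch below uses only the Python standard library and assumes the file is a CSV with a header row. The missing-value tokens and the sample columns (`driver_id`, `city`, `rating`) are assumptions for illustration, not documented properties of the dataset.

```python
import csv
import io
from collections import defaultdict

# Assumed spellings of "missing"; adjust after looking at the real file.
MISSING_TOKENS = {"", "na", "n/a", "null", "none", "nan"}


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_columns(rows):
    """Count missing, numeric, and distinct values for each column."""
    profile = defaultdict(lambda: {"missing": 0, "numeric": 0, "distinct": set()})
    for row in rows:
        for column, value in row.items():
            text = (value or "").strip()
            stats = profile[column]
            if text.lower() in MISSING_TOKENS:
                stats["missing"] += 1
                continue
            stats["distinct"].add(text)
            if is_number(text):
                stats["numeric"] += 1
    return {
        column: {
            "missing": s["missing"],
            "numeric": s["numeric"],
            "distinct": len(s["distinct"]),
        }
        for column, s in profile.items()
    }


# Hypothetical in-memory sample standing in for the downloaded CSV.
sample_csv = "driver_id,city,rating\nD001,Pune,4.5\nD002,,N/A\nD003,pune,4.8\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
report = profile_columns(rows)
for column, stats in report.items():
    print(column, stats)
```

A column whose non-missing values are mostly numeric is probably a measurement or an identifier. A column where the distinct count differs only by letter case (here `Pune` and `pune`) points to inconsistent labels that need normalizing.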
Provenance
- Source: Kaggle
- Collection Method: Synthetic generation, described as having real-world data qualities.
- Time Range: Not specified
- Freshness: Last update date is unknown; freshness unverified.
- Geography: India