This synthetic dataset contains 68,000 sign-prefixed transaction descriptions across 17 spending categories, modeled on real US bank statement formats. It was created by DoDataThings as a successor to version 1 and was last updated in April 2026.
Use Cases
- Classify transaction descriptions into one of 17 spending categories using the sign-prefixed text feature.
- Train NLP models to recognize transaction patterns from formats used by Chase, Apple Card, PayPal, Capital One, and Mercury.
- Benchmark classifier robustness against noisy, real-world transaction descriptions instead of clean merchant names.
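As a rough illustration of the classification use case, the sketch below trains a tiny multinomial Naive Bayes model on sign-prefixed strings using only the Python standard library. Several things here are assumptions rather than facts about the dataset: that the sign prefix is a literal leading `+` or `-` character, the three category names, and the sample descriptions, all of which are invented for the example. Naive Bayes is used only as a simple stand-in baseline, not as the dataset author's method.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split a description into tokens, keeping a leading sign as its own token.

    Assumption: the sign prefix is a literal '+' or '-' at the start of the string.
    """
    text = text.strip().lower()
    tokens = []
    if text and text[0] in "+-":
        tokens.append("SIGN_POS" if text[0] == "+" else "SIGN_NEG")
        text = text[1:]
    tokens.extend(re.findall(r"[a-z0-9]+", text))
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n / total)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical rows: descriptions and category names are invented for illustration.
train = [
    ("-SQ *BLUE BOTTLE COFFEE 0423", "Dining"),
    ("-TST* CORNER BISTRO NY", "Dining"),
    ("-SHELL OIL 5744 AUSTIN TX", "Transportation"),
    ("-UBER *TRIP HELP.UBER.COM", "Transportation"),
    ("+PAYROLL ACME CORP DIRECT DEP", "Income"),
    ("+DIRECT DEP PAYROLL 0915", "Income"),
]

model = NaiveBayes().fit([t for t, _ in train], [c for _, c in train])
print(model.predict("+ACME CORP PAYROLL"))
print(model.predict("-SHELL OIL 1234 DALLAS TX"))
```

Treating the sign as a separate token lets the model use the credit or debit direction as a feature without letting it merge into the first merchant word. For the real data, the toy rows would be replaced by the dataset's own text and label columns, whose names are not documented here.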
Strengths
- 68,000 transaction description records
- Covers 17 distinct spending categories
Limitations
- The data is synthetic rather than drawn from real transaction records, which may limit how well trained models generalize to real statements
- Specific column names, data distributions, and potential class imbalances are unknown
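Because the class distribution is not documented, it may be worth measuring it before training. The following is a minimal sketch that assumes the category labels have already been loaded into a plain Python list; the function names and the example labels are hypothetical and not part of the dataset.

```python
from collections import Counter


def class_balance(labels):
    """Return each label's share of the total, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}


def imbalance_ratio(labels):
    """Ratio of the largest class size to the smallest (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical labels for illustration only; not taken from the dataset.
example_labels = ["Dining", "Dining", "Dining", "Income"]
print(class_balance(example_labels))
print(imbalance_ratio(example_labels))
```

A ratio well above 1.0 would suggest considering stratified splits or per-class metrics when evaluating a classifier.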
Provenance
- Source: Hugging Face dataset by DoDataThings.
- Collection Method: Synthetically generated to model real US bank transaction description formats.
- Time Range: Not specified.
- Freshness: Last updated April 2026.
- Geography: Modeled after US bank formats.