Description
650 relational databases spanning academic, e-commerce, finance, sports, biomedical, and government domains, ported to the RelBench manifest format. The collection is intended for pretraining relational and tabular foundation models, and each database is self-describing. It was created by stanford-rdl and last updated on June 12, 2026.
Use Cases
Pretrain relational foundation models on the collection's 650 multi-domain databases.
Benchmark how well tabular models generalize across diverse domains such as e-commerce and biomedical data.
Develop text-to-SQL models using the included text2sql domain databases.
Study cross-domain relational patterns across a collection spanning finance, sports, and government.
Strengths
Contains 650 distinct relational databases, providing substantial scale for pretraining.
Spans many domains including academic, e-commerce, finance, sports, biomedical, and government, offering diversity.
Databases are self-describing and follow the RelBench manifest format, which may aid standardization.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row counts are not reported, which makes it harder to judge whether the data suits a specific task.
Last updated 2026-06-12 19:54:50; freshness should be verified.
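Because the card documents neither column semantics nor row counts, both have to be measured after download. The following is a minimal sketch, assuming a single table has already been exported to CSV text; the card does not describe the on-disk layout of the RelBench manifest, and the function name `profile_csv_table` is hypothetical. The type guess is a coarse heuristic and does not recover what a field actually means.

```python
import csv
import io


def profile_csv_table(text: str) -> dict:
    """Count rows and guess a coarse type for each column of a CSV table.

    Hypothetical helper: assumes the table is available as CSV text.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    rows = list(reader)  # DictReader skips fully blank lines

    def guess_type(values):
        # Ignore empty cells and cells missing from short rows (None).
        non_empty = [v for v in values if v not in ("", None)]
        if not non_empty:
            return "empty"
        # Try the narrowest type first; fall back to string.
        for caster, type_name in ((int, "int"), (float, "float")):
            try:
                for v in non_empty:
                    caster(v)
                return type_name
            except ValueError:
                continue
        return "str"

    return {
        "row_count": len(rows),
        "column_types": {c: guess_type([r[c] for r in rows]) for c in columns},
    }


sample = "id,name,price\n1,widget,9.99\n2,gadget,\n3,gizmo,4.50\n"
print(profile_csv_table(sample))
# {'row_count': 3, 'column_types': {'id': 'int', 'name': 'str', 'price': 'float'}}
```

Guessed types only narrow the search. Deciding whether an `int` column is a key, a count, or a coded category still requires reading the data or the original source.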
Provenance
Source
Hugging Face; author: stanford-rdl
Collection Method
Ported from original sources to the RelBench manifest format.
Freshness
Last updated June 12, 2026.
License
The license is unknown; verify the terms of use before using the dataset.