TRL-DLTE: 47,772-Table Data Lake for Tabular Encoder Evaluation

Name: TRL-DLTE: 47,772-Table Data Lake for Tabular Encoder Evaluation
Creator: logo-lab
Published: 2026-05-01T18:00:04
Keywords: Data Lake, Machine Learning Benchmark, Benchmark, Tabular, Tabular Encoders, Table Enrichment

Description

This data lake contains 47,772 tables derived from 1,379 parent tables in TabFact and WikiTableQuestions, fragmented at four cumulative noise tiers. It is part of the TRL-Bench suite for evaluating tabular encoders, was created by logo-lab, and was last updated on June 11, 2026.
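As a rough sense of scale, the two counts above imply an average of about 35 derived tables per parent table. The arithmetic is sketched below; the variable names are illustrative, and the actual per-parent distribution is not stated on this page.

```python
# Counts taken from the description; variable names are illustrative.
derived_tables = 47_772
parent_tables = 1_379

# Average number of derived tables per parent table.
avg_per_parent = derived_tables / parent_tables
print(f"{avg_per_parent:.1f}")  # about 34.6
```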

Use Cases

Benchmarking table retrieval models on the fragmented tables in the data lake.
Evaluating tabular encoder robustness across the four cumulative noise tiers (clean, schema, cell, hard).
Training models for table union operations using the union target data.
Training models for table join operations using the join target data.
Studying representation-level performance on cross-paradigm tasks, as described in the associated paper.
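The retrieval and robustness use cases above could be evaluated along the following lines. This is a minimal sketch, not the TRL-Bench protocol: the `recall_at_k` helper, the toy embeddings, and the assumption that each query has a single relevant table ID are all hypothetical, since this page does not document the file layout or the ground-truth format.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recall_at_k(queries, lake, k):
    """Per-tier recall@k.

    queries: list of (tier, query_embedding, relevant_table_id)  # assumed format
    lake:    dict mapping table_id -> table_embedding            # assumed format
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for tier, q_vec, relevant_id in queries:
        # Rank every table in the lake by similarity to the query.
        ranked = sorted(lake, key=lambda tid: cosine(q_vec, lake[tid]), reverse=True)
        totals[tier] += 1
        if relevant_id in ranked[:k]:
            hits[tier] += 1
    return {tier: hits[tier] / totals[tier] for tier in totals}

# Toy data standing in for encoder outputs; not taken from the dataset.
lake = {"t1": [1.0, 0.0], "t2": [0.0, 1.0], "t3": [0.7, 0.7]}
queries = [
    ("clean", [0.9, 0.1], "t1"),
    ("hard", [0.6, 0.8], "t2"),
]
scores = recall_at_k(queries, lake, k=1)
print(scores)
```

Grouping the metric by tier makes it easy to see how much retrieval quality drops as noise accumulates, which is the comparison the tiered design appears intended to support.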

Strengths

Large scale with 47,772 derived tables.
Structured noise injection across four defined tiers (clean, schema, cell, hard).
Derived from established source datasets (TabFact and WikiTableQuestions).
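To illustrate what "cumulative" tiers could mean in practice, the sketch below applies each tier's noise on top of the noise from all earlier tiers. The tier names come from this page, but the specific noise operations (generic header names, blanked cells, shuffled columns) and the table representation are assumptions made purely for illustration; the dataset's actual noise procedure is not documented here.

```python
import copy
import random

TIERS = ["clean", "schema", "cell", "hard"]  # tier names from the dataset page

def schema_noise(table, rng):
    # Hypothetical schema noise: replace headers with generic names.
    table["header"] = [f"col_{i}" for i in range(len(table["header"]))]

def cell_noise(table, rng):
    # Hypothetical cell noise: blank one randomly chosen cell per row.
    for row in table["rows"]:
        row[rng.randrange(len(row))] = ""

def hard_noise(table, rng):
    # Hypothetical hard noise: shuffle the column order.
    order = list(range(len(table["header"])))
    rng.shuffle(order)
    table["header"] = [table["header"][i] for i in order]
    table["rows"] = [[row[i] for i in order] for row in table["rows"]]

NOISE_STEPS = {"schema": schema_noise, "cell": cell_noise, "hard": hard_noise}

def apply_tier(table, tier, seed=0):
    """Return a copy of `table` with every noise step up to `tier` applied."""
    rng = random.Random(seed)
    noisy = copy.deepcopy(table)
    for name in TIERS[1:TIERS.index(tier) + 1]:
        NOISE_STEPS[name](noisy, rng)
    return noisy

# Toy table; not taken from the dataset.
table = {"header": ["city", "population"],
         "rows": [["Oslo", "700000"], ["Bergen", "285000"]]}
for tier in TIERS:
    print(tier, apply_tier(table, tier))
```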

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count and file formats are unknown, which may limit suitability assessment.
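Because column semantics and row counts are undocumented, a first step after download is likely to be profiling each table. The sketch below assumes the tables can be read as CSV text with a header row, which this page does not confirm; `infer_type` and `profile_csv` are hypothetical helpers, and the sample text is made up.

```python
import csv
import io

def infer_type(values):
    """Guess a coarse type for a column from its non-empty string values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"

def profile_csv(text):
    """Return the row count and a guessed type per column for CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    # Transpose rows into columns; zip truncates ragged rows to the shortest.
    columns = list(zip(*rows)) if rows else [() for _ in header]
    return {
        "n_rows": len(rows),
        "columns": {h: infer_type(col) for h, col in zip(header, columns)},
    }

# Made-up sample standing in for one downloaded table.
sample = "name,year,score\nA,2001,3.5\nB,2002,\n"
profile = profile_csv(sample)
print(profile)
```

If the files turn out to be in another format, only the reading step in `profile_csv` would need to change; the type-guessing logic works on plain string values.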

Provenance

Source: Derived from TabFact and WikiTableQuestions parent tables.
Collection Method: Fragmented at four cumulative noise tiers to create a compositional data lake.
Time Range: Not specified
Freshness: Last updated 2026-06-11 03:52:46; freshness should be verified before use.
Geography: Not specified

License is unknown; restrictions should be verified before use.

Related Topics

Tabular, Data Lake, Machine Learning Benchmark, Benchmark, Tabular Encoders, Table Enrichment

Quality Score

Overall: D38
Description: 42
Source: 36
Reputation: 42
Access: 26

Community

Downloads: 91
Likes: 1
Views: 0

Dataset Info

Author: logo-lab
Created: May 1, 2026
Updated: Jun 11, 2026
Last synced: Jun 18, 2026
