DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


MachineLearningLM: Synthetic ML Tasks from Structural Causal Models | DataSalon

NLP & Text

Name: MachineLearningLM: Synthetic ML Tasks from Structural Causal Models
Creator: eshmoideas
Published: 2026-06-15T11:36:38
Keywords: Machine Learning, In Context Learning, Text, Natural Language Processing, Synthetic Data, Causal Models

Available on 1 platform

Description

A pretraining corpus for MachineLearningLM, a framework to equip large language models with in-context machine learning capabilities. The dataset consists of ML tasks synthesized from millions of structural causal models, spanning various shot counts up to 1,024. It was created by author 'eshmoideas' and last updated on June 15, 2026.

Use Cases

Pretrain LLMs for in-context learning on the corpus of synthesized ML tasks.
Benchmark LLM performance on standard ML tasks using examples that span up to 1,024 shots.
Study how the number of in-context examples affects learning, using the range of shot counts in the dataset.
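The shot-count study in the last use case can be sketched as a simple aggregation. This is a minimal sketch only: the `(shots, correct)` result pairs, the function name `accuracy_by_shots`, and the toy values are hypothetical illustrations, not fields or results from the dataset.

```python
from collections import defaultdict


def accuracy_by_shots(results):
    """Group (shots, correct) pairs by shot count and return accuracy per group."""
    totals = defaultdict(lambda: [0, 0])  # shots -> [hits, attempts]
    for shots, correct in results:
        totals[shots][0] += int(correct)
        totals[shots][1] += 1
    # Sort by shot count so the result reads as a scaling curve.
    return {shots: hits / n for shots, (hits, n) in sorted(totals.items())}


# Toy evaluation outcomes (assumed, for illustration only).
results = [
    (8, False), (8, True),
    (64, True), (64, True),
    (1024, True), (1024, False), (1024, True), (1024, True),
]
curve = accuracy_by_shots(results)
print(curve)
```

Each key in the returned dictionary is a shot count and each value is the fraction of correct predictions at that count, which can then be plotted or compared across models.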

Strengths

Synthesized from millions of structural causal models, suggesting a large generation base.
Designed to span various shot counts up to 1,024, enabling study of in-context learning scaling.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
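Because column documentation and row count are both missing, a first step after download could be to profile the records directly. The sketch below assumes the data can be read as JSON Lines (one JSON object per line), which the listing does not confirm; the field names `prompt` and `label` in the sample are hypothetical placeholders.

```python
import json
from collections import Counter, defaultdict


def profile_records(lines):
    """Return (row_count, {field: {type_name: count}}) for JSONL-style lines."""
    type_counts = defaultdict(Counter)
    rows = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        rows += 1
        for field, value in record.items():
            type_counts[field][type(value).__name__] += 1
    return rows, {field: dict(counts) for field, counts in type_counts.items()}


# Hypothetical sample; the real field names are not documented.
sample = [
    '{"prompt": "x1=0.2, x2=1.5 -> ?", "label": 1}',
    '{"prompt": "x1=0.9, x2=0.1 -> ?", "label": 0}',
    '{"prompt": "x1=0.4, x2=0.7 -> ?", "label": null}',
]
rows, profile = profile_records(sample)
print(rows, profile)
```

In practice the `lines` argument could be an open file handle, so the profile is built in one streaming pass without loading the whole corpus into memory.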

Provenance

Source: Hugging Face user 'eshmoideas'.
Collection Method: Synthesized from structural causal models.
Freshness: Last updated 2026-06-15 11:36:39; freshness should be verified.

License is unknown; terms of use must be verified before application.

Quality Score

D37

Community

1 like

0 views

Dataset Info

Author: eshmoideas
Created: Jun 15, 2026
Updated: Jun 15, 2026
Last synced: Jun 22, 2026
