Experiment logs from the MLEvolve framework across five Kaggle competitions, comparing three agent tiers: baseline, scientist, and socrates. The dataset was created by Pran-Ker and last updated on Hugging Face in May 2026. The named competitions are NFL Player Contact Detection, Smartphone Decimeter Challenge, Stanford COVID Vaccine, and Statoil; the fifth is not identified.
Use Cases
- Benchmarking agent performance across the baseline, scientist, and socrates tiers
- Analyzing automated machine learning workflows from multi-run experiment logs
- Comparing outcomes across Kaggle competitions run under the same MLEvolve framework
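The card documents neither the log schema nor the scoring fields, so the following is only a minimal Python sketch of a tier comparison. It assumes each run has already been parsed into a record with hypothetical `competition`, `tier`, and `score` fields. It groups runs by competition and tier and reports the run count, mean score, and best score. `summarize_by_tier` and its arguments are illustrative names, not part of MLEvolve.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Iterable, Mapping


def summarize_by_tier(
    runs: Iterable[Mapping[str, Any]],
    higher_is_better: Mapping[str, bool],
) -> dict[tuple[str, str], dict[str, float]]:
    """Summarize scores per (competition, tier) pair.

    Hypothetical schema: each run record carries 'competition', 'tier',
    and 'score' fields. The real MLEvolve log fields are undocumented.
    """
    grouped: dict[tuple[str, str], list[float]] = defaultdict(list)
    for run in runs:
        key = (str(run["competition"]), str(run["tier"]))
        grouped[key].append(float(run["score"]))

    summary: dict[tuple[str, str], dict[str, float]] = {}
    for (competition, tier), values in grouped.items():
        # Metric direction differs by competition, so "best" is max or min.
        pick = max if higher_is_better[competition] else min
        summary[(competition, tier)] = {
            "runs": len(values),
            "mean": mean(values),
            "best": pick(values),
        }
    return summary
```

The `higher_is_better` mapping exists because Kaggle metrics differ in direction. Some are scores to maximize and others are errors to minimize, so "best" has to be decided per competition. Check each competition's evaluation page before filling it in.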
Strengths
- Logs span five distinct Kaggle competitions, allowing comparison across tasks rather than within a single benchmark.
- Explicitly compares three distinct agent tiers: baseline, scientist, and socrates.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The row count is not reported, which makes it harder to judge suitability before downloading.
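Field semantics and the row count must be established after download. A reasonable first pass is to count records and tally the value types seen for each field. The sketch below assumes, hypothetically, that the logs are JSON Lines with one JSON object per line. The actual file format is not documented, and `profile_jsonl` is an illustrative name.

```python
import json
from collections import Counter, defaultdict
from typing import Iterable


def profile_jsonl(lines: Iterable[str]) -> tuple[int, dict[str, Counter]]:
    """Count records and tally the value types observed for each top-level field.

    Assumes (hypothetically) one JSON object per non-blank line.
    """
    row_count = 0
    field_types: dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        row_count += 1
        for key, value in record.items():
            # e.g. 'int', 'float', 'str', 'NoneType', 'dict', 'list'
            field_types[key][type(value).__name__] += 1
    return row_count, dict(field_types)
```

You can call it inside a `with open(path) as f:` block, where `path` points at a downloaded log file. It returns the row count together with a per-field type tally. Fields that appear in only some rows, or that mix types, are good candidates for a closer look when inferring what each field means.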
Provenance
- Source: Hugging Face dataset uploaded by Pran-Ker.
- Collection method: experiment logs generated by the MLEvolve framework.
- Freshness: last updated 2026-05-25 01:33:55; verify freshness before use.