ProcessFlow: A Multi-Format Code Dataset for LLM Agent Training

Name: ProcessFlow: A Multi-Format Code Dataset for LLM Agent Training
Creator: caiovicentino1
Published: 2026-04-09T22:08:40
Keywords: Agent Training, Text, LLM Training, Process-Centric

Description

A multi-format, process-centric code dataset for training LLM agents. According to the listing, the dataset was empirically validated on 2026-04-10: fine-tuning a model on version 1.7 with 108,000 training samples for 3 epochs produced a +0.681 delta on the ProcessFlow-Eval benchmark. It was authored by caiovicentino1 and last updated on 2026-04-11.

Use Cases

Fine-tuning LLMs for process-centric code generation using the dataset's multi-format structure.
Benchmarking LLM agent performance on code-related tasks with the ProcessFlow-Eval benchmark.
Training models to understand and generate code sequences for workflow automation, as the 'process-centric' description suggests.

Strengths

Empirically validated on 2026-04-10, showing a +0.681 ProcessFlow-Eval delta.
Training involved 108,000 samples over 3 epochs, indicating a substantial training corpus.
Validation passed three gates with no HumanEval regression and a perplexity improvement of -4.62 nats.
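
The perplexity figure above is given in nats, which is the unit of cross-entropy loss rather than of perplexity itself. As a hedged illustration only, and assuming the -4.62 figure is a change in mean per-token cross-entropy (the listing does not say so), the corresponding multiplicative change in perplexity follows from perplexity = exp(loss). The function name below is hypothetical and is not part of the dataset or its tooling.

```python
import math


def perplexity_ratio(delta_nats: float) -> float:
    """Factor by which perplexity changes for a given change in mean
    per-token cross-entropy, measured in nats.

    Because perplexity = exp(cross_entropy), shifting the loss by
    `delta_nats` multiplies perplexity by exp(delta_nats).
    """
    return math.exp(delta_nats)


# Assumption: the listing's -4.62 is a cross-entropy delta in nats.
REPORTED_DELTA_NATS = -4.62
ratio = perplexity_ratio(REPORTED_DELTA_NATS)
print(f"perplexity multiplied by {ratio:.4f}")
```

A ratio below 1 means lower perplexity after fine-tuning. Whether this reading matches how the author measured the improvement can only be confirmed on the dataset page.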

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
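
Because the column documentation and row count are missing, a first inspection pass after download might look like the sketch below. It uses only the Python standard library and assumes the records have already been loaded as a list of dictionaries. The helper name and the sample field names (`format`, `content`, `steps`) are hypothetical placeholders, not the dataset's actual schema.

```python
from typing import Any, Iterable, Mapping


def summarize_columns(rows: Iterable[Mapping[str, Any]]) -> dict[str, dict[str, Any]]:
    """Report, for each field, the observed value types, how many rows
    lack a value (absent key or None), and one example value."""
    seen: dict[str, dict[str, Any]] = {}
    total = 0
    for row in rows:
        total += 1
        for name, value in row.items():
            col = seen.setdefault(
                name, {"types": set(), "non_null": 0, "example": None}
            )
            if value is None:
                continue  # counted as missing below
            col["types"].add(type(value).__name__)
            col["non_null"] += 1
            if col["example"] is None:
                col["example"] = value
    return {
        name: {
            "types": sorted(col["types"]),
            "missing": total - col["non_null"],
            "example": col["example"],
        }
        for name, col in seen.items()
    }


# Hypothetical records standing in for a few downloaded rows.
sample_rows = [
    {"format": "diff", "content": "--- a/x.py", "steps": 3},
    {"format": "trace", "content": "step 1", "steps": None},
    {"format": "diff", "content": "+++ b/x.py"},
]
report = summarize_columns(sample_rows)
for field, info in report.items():
    print(field, info)
```

Fields with mixed types or many missing values are the ones whose semantics most need manual checking before training.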

Provenance

Source: huggingface
Freshness: Last updated 2026-04-11 00:45:04.

License is unknown; terms of use must be verified on the dataset page.

Quality Score

Overall: D37
Description: 39
Source: 39
Reputation: 38
Access: 26

Community

Downloads: 7
Likes: 1
Views: 0

Dataset Info

Author: caiovicentino1
Created: Apr 9, 2026
Updated: Apr 11, 2026
Last synced: Apr 17, 2026
