DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

ProductSearch Datasets Browse Topics Rankings Community API / MCP

ResourcesDocumentation Blog Changelog Status

LegalPrivacy Policy Terms of Service Cookie Policy

SODA: Safety Over Depth for Agents Benchmark | DataSalon

Home Machine LearningSODA: Safety Over Depth for Agents Benchmark

Machine Learning

SODA: Safety Over Depth for Agents Benchmark

Name: SODA: Safety Over Depth for Agents Benchmark
Creator: cesun
Published: 2026-06-06T00:22:49
Keywords: Harmful Requests, Tool Use, Agent Benchmark, Benchmark, Text, Llm Safety, Conversation Depth

by cesun·Updated 23d ago

Available on 1 platform

Description

SODA evaluates how conversation depth affects agent safety across 16 tool-use environments with 80 scenarios. Each task places a harmful request at a controlled depth from D=0 to D=20, preceded by regular agentic tasks. The dataset was created by author 'cesun' and was last updated on 2026-06-12.

Use Cases

Benchmarking agent safety degradation based on conversation depth mentioned in the description
Evaluating the effectiveness of safety guardrails in multi-turn tool-use environments described in the overview
Studying the 'cold-start safety gap' for LLM agents as referenced in the dataset's purpose

Strengths

Benchmark spans 16 distinct tool-use environments
Contains 80 different scenarios for evaluation
Systematically controls conversation depth from 0 to 20 steps

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Row count is unknown, which may limit suitability assessment
Description metadata is limited; actual data quality requires manual inspection after download

Provenance

Source: huggingface
Collection Method: Likely constructed for benchmarking purposes, as described.
Freshness: Last updated 2026-06-12 02:08:01; freshness should be verified

License is unknown; terms of use must be verified before application.

Text Harmful Requests Tool Use Agent Benchmark Benchmark Llm Safety Conversation Depth

Related Datasets

Quality Score

D37

Description

Source

Reputation

Quality Score

D37

Description

Source

Reputation

Access

Community

55 downloads

1 likes

0 views

Dataset Info

Author: cesun
Created: Jun 6, 2026
Updated: Jun 12, 2026
Last synced: Jul 1, 2026

Access

Community

55 downloads

1 likes

0 views

Dataset Info

Author: cesun
Created: Jun 6, 2026
Updated: Jun 12, 2026
Last synced: Jul 1, 2026

SODA: Safety Over Depth for Agents Benchmark

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info