SkillTrustBench Results: AI Model and Tool Safety Evaluation Leaderboard | DataSalon

Name: SkillTrustBench Results: AI Model and Tool Safety Evaluation Leaderboard
Creator: cuhk-zhuque
Published: 2026-06-08T09:49:10
Keywords: Leaderboard, Model Evaluation, Safety Classification, AI Benchmark, Benchmark, Tabular

Available on 1 platform

Description

SkillTrustBench Results stores public leaderboard records for an AI safety benchmark. The dataset tracks two comparison groups: one fixing a model and comparing tools, and another fixing an analysis tool and comparing models. Raw system outputs are normalized into safety categories of normal (safe), suspicious, or malicious.
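The normalization step described above can be sketched as a simple lookup. This is only an illustration: the listing does not document the raw label vocabulary, so the keys in `RAW_TO_CATEGORY` and the helper name `normalize_label` are hypothetical, and only the three target categories come from the description.

```python
from typing import Optional

# Hypothetical raw-label vocabulary; the dataset does not document the real one.
# Only the three target categories are taken from the description.
RAW_TO_CATEGORY = {
    "safe": "normal",
    "benign": "normal",
    "normal": "normal",
    "warning": "suspicious",
    "suspicious": "suspicious",
    "unsafe": "malicious",
    "malicious": "malicious",
}


def normalize_label(raw: str) -> Optional[str]:
    """Map a raw system verdict to normal, suspicious, or malicious.

    Returns None for labels outside the assumed vocabulary so that
    unknown outputs are surfaced rather than silently bucketed.
    """
    return RAW_TO_CATEGORY.get(raw.strip().lower())
```

Returning `None` for unrecognized labels is a design choice for this sketch; how the benchmark itself handles unmapped outputs is not stated.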

Use Cases

Compare the safety performance of different AI models based on normalized output classifications.
Evaluate the effectiveness of various analysis tools for detecting suspicious or malicious AI outputs.
Track leaderboard standings for AI agents and models on a specific safety benchmark track.
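As a rough sketch of the comparisons listed above, the share of each safety category can be computed per tool (with the model held fixed) or per model (with the tool held fixed). The field names `model`, `tool`, and `category`, and the sample rows, are assumptions made for illustration; the real column names must be checked after download.

```python
from collections import Counter, defaultdict

CATEGORIES = ("normal", "suspicious", "malicious")


def category_shares(records, group_by):
    """Return {group value: {category: share}} for the given records.

    `records` is an iterable of dicts; `group_by` names the field that
    varies (for example "tool" when the model is held fixed).
    """
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[group_by]][rec["category"]] += 1
    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        shares[group] = {cat: counter[cat] / total for cat in CATEGORIES}
    return shares


# Made-up rows purely for illustration; not taken from the dataset.
sample = [
    {"model": "model-a", "tool": "tool-x", "category": "normal"},
    {"model": "model-a", "tool": "tool-x", "category": "malicious"},
    {"model": "model-a", "tool": "tool-y", "category": "suspicious"},
    {"model": "model-a", "tool": "tool-y", "category": "suspicious"},
]
by_tool = category_shares(sample, group_by="tool")
```

Passing `group_by="model"` to the same function would cover the second comparison group, where the analysis tool is fixed and models vary.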

Strengths

Public leaderboard records provide transparent benchmarking data.
Outputs are normalized into three distinct safety categories for consistent evaluation.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
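Both limitations can be checked quickly once a file has been downloaded. The sketch below assumes the data is available as a CSV file, which the listing does not confirm; the helper name `summarize_table` and the stand-in content are hypothetical.

```python
import csv
import io


def summarize_table(handle):
    """Return (column names, row count) for a CSV file-like object."""
    reader = csv.DictReader(handle)
    row_count = sum(1 for _ in reader)
    # fieldnames is populated once the header has been read; None if empty.
    return list(reader.fieldnames or []), row_count


# Stand-in content; in practice pass a handle opened on the downloaded file.
demo = io.StringIO("system,category\ntool-x,normal\ntool-y,malicious\n")
columns, n_rows = summarize_table(demo)
```

The column names returned this way are only a starting point for inferring field semantics, since the values still need to be inspected by hand.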

Provenance

Source: Hugging Face
Collection Method: Likely contains aggregated results from benchmark evaluations.
Freshness: Last updated 2026-06-15 06:20:11; freshness should be verified.

License is unknown; terms of use must be verified before application.

Quality Score

D38

Community

115 downloads

1 like

0 views

Dataset Info

Author: cuhk-zhuque
Created: Jun 8, 2026
Updated: Jun 15, 2026
Last synced: Jun 19, 2026
