Mechanistic Interpretability Skills for Transformer Model Analysis | DataSalon

Name: Mechanistic Interpretability Skills for Transformer Model Analysis
Category: Reinforcement Learning
Creator: bedderautomation
Published: 2026-03-11T02:12:37
Keywords: Regionus

Available on 1 platform

Description

A collection of skills for mechanistic interpretability analysis of large language models, including refusal geometry extraction and boundary surface mapping. The dataset is authored by bedderautomation and was last updated on March 11, 2026. It is designed for use with Claude Code, OpenAI Codex, and Gemini CLI, supporting the agentskills.io standard.

Use Cases

Apply the refusal-geometry skill to extract and analyze refusal cone geometry from open-weight transformer models.
Use boundary surface mapping skills to investigate model decision boundaries for mechanistic analysis.
Leverage self-referential mechanistic analysis skills to probe internal representations of transformer models.
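
The listing does not show how the refusal-geometry skill computes its geometry, so the following is only an illustrative sketch of one widely used starting point for this kind of analysis: a difference-of-means direction between two groups of activation vectors. Everything here is an assumption made for illustration. The function names are hypothetical, the activations are synthetic stand-ins rather than real model outputs, and nothing below is taken from the dataset's own 6-stage pipeline.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def difference_of_means_direction(group_a, group_b):
    """Unit vector pointing from the mean of group_b to the mean of group_a."""
    mu_a = mean_vector(group_a)
    mu_b = mean_vector(group_b)
    diff = [a - b for a, b in zip(mu_a, mu_b)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("group means are identical; no direction to extract")
    return [d / norm for d in diff]


def project(vector, direction):
    """Scalar projection of a vector onto a unit direction."""
    return sum(v * d for v, d in zip(vector, direction))


# Synthetic stand-in for per-prompt activations (assumption: 8 dimensions,
# with the two groups separated along the first axis only).
rng = random.Random(0)
dim = 8
planted = [1.0] + [0.0] * (dim - 1)


def sample(shift):
    """One noisy vector, shifted along the planted axis by `shift`."""
    return [rng.gauss(0.0, 0.1) + shift * p for p in planted]


refused = [sample(2.0) for _ in range(50)]   # hypothetical "refusal" group
complied = [sample(0.0) for _ in range(50)]  # hypothetical "compliance" group

direction = difference_of_means_direction(refused, complied)
```

On real models the two groups would come from hidden states captured at a chosen layer and token position. Projecting new activations onto `direction` with `project` then gives a scalar score for how far each one lies along the extracted axis.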

Strengths

Last updated on March 11, 2026, indicating recent maintenance.
Includes a 6-stage extraction pipeline for refusal geometry analysis.
Compatible with multiple major language model tools (Claude Code, OpenAI Codex, Gemini CLI).

Limitations

No sample data, column definitions, or size information is provided, limiting initial assessment.
The dataset's structure, row count, and specific data format are unknown.

Provenance

Source: huggingface
Freshness: Last updated on March 11, 2026.

The full description is truncated; users must visit the dataset page on Hugging Face for complete details. License information is unknown.

Quality Score

D36

Community

29 downloads

1 like

0 views

Dataset Info

Author: bedderautomation
Created: Mar 11, 2026
Updated: Mar 11, 2026
