This dataset contains analysis data from the paper 'Hidden Dynamics of Massive Activations in Transformer Training'. It characterizes how massive activations, the large scalar values that appear in transformer hidden states, emerge over the course of training, and it provides detailed measurements and mathematical characterizations across the Pythia model family. It was created by Aimpoint-Digital and last updated on August 14, 2025.
The full description is hosted externally on the dataset's Hugging Face page, which should be consulted for complete details. The dataset's tags list the license as 'mit', but the license is not otherwise confirmed in this listing.