
The Geometry of Concepts: Sparse Autoencoder Feature Structure
Oct 10, 2024 · Abstract page for arXiv paper 2410.19750: The Geometry of Concepts: Sparse Autoencoder Feature Structure. Sparse autoencoders have recently produced dictionaries of …
The Geometry of Concepts: Sparse Autoencoder Feature Structure …
Mar 27, 2025 · SAE feature structure: Sparse autoencoders (SAEs) are a recent approach for discovering interpretable language model features without supervision, although relatively few …
Sparse Autoencoder Features for Classifications and Transferability
Feb 17, 2025 · Abstract: Sparse Autoencoders (SAEs) provide potentials for uncovering structured, human-interpretable representations in Large Language Models (LLMs), making …
Oct 29, 2024 · Thus, the present paper examines sparse autoencoder feature structure at three separate spatial scales, which we refer to informally as the “atom”-scale, “brain”-scale, and …
The Geometry of Sparse Autoencoder Concept Structure
Nov 2, 2024 · Feature Orthogonality: Sparse features tend to become approximately orthogonal to each other, forming a basis-like structure in the representation space. Manifold Structure: The …
SAE feature structure: Sparse autoencoders have relatively recently garnered attention as an approach for discovering interpretable language model features without supervision, with …
We develop a state-of-the-art methodology to reliably train extremely wide and sparse autoencoders with very few dead latents on the activations of any language model. We …
We will first describe feedforward neural networks and the backpropagation algorithm for supervised learning. Then, we show how this is used to construct an autoencoder, which is an …
The Geometry of Concepts: Sparse Autoencoder Feature Structure …
Oct 10, 2024 · We find that this concept universe has interesting structure at three levels: 1) The “atomic” small-scale structure contains “crystals” whose faces are parallelograms or …
Sparse AutoEncoder: from Superposition to interpretable features
Feb 1, 2025 · In this blog post, we take one step further: let’s try to disentangle some superposed features. I will introduce a methodology called Sparse Autoencoder to decompose complex …