Algebraic Interpretability and Context Integrity
A substrate-first framework for understanding why some interpretability methods work locally,
why they fail globally, and why semantic manifolds are not required.
Motivation
The preceding document,
Twelve Structural Falsifications of the Manifold Hypothesis,
establishes that strong semantic manifold assumptions are structurally incompatible
with transformer inference.
A natural question follows:
If smooth semantic manifolds are false, why do linear probes, sparse autoencoders,
steering vectors, and related methods work at all?
This document provides a direct answer.
Context Is a Substrate, Not a Verb
Context is not something a model “adds,” “infers,” or “moves through.”
Context is the total boundary condition under which computation is evaluated.
Context includes:
- Discrete tokenization
- Positional indexing
- Finite precision arithmetic
- Normalization and saturation
- Aliasing, collapse, and discontinuity
Semantic manifold narratives implicitly exclude these features.
Transformer computation does not.
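The finite-precision, normalization, and saturation features listed above take only a few lines of standard-library Python to reproduce. This is a minimal sketch under stated assumptions: half precision (`struct` format `"e"`) stands in for reduced-precision inference, and `rms_normalize` is a hypothetical RMSNorm-style function without a learned gain. None of these names come from a specific framework.

```python
import math
import struct

def to_float16(x: float) -> float:
    """Round a float to IEEE 754 half precision and back (struct format 'e')."""
    return struct.unpack("e", struct.pack("e", x))[0]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rms_normalize(x):
    """Hypothetical RMS normalization without a learned gain: divides out the vector's scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [v / rms for v in x]

# Finite precision / aliasing: two distinct inputs collapse to one half-precision value.
print(to_float16(1.0) == to_float16(1.0001))                # True

# Saturation: different logit gaps yield the same winning probability (exactly 1.0 in float64).
print(softmax([0.0, 50.0])[1] == softmax([0.0, 60.0])[1])   # True

# Normalization: inputs that differ only in scale map to the same output.
print(rms_normalize([1.0, 2.0, 3.0]))
print(rms_normalize([2.0, 4.0, 6.0]))
```

Each comparison maps distinct inputs to one output, which is the opposite of the injective, smoothly varying coordinates a semantic manifold would need.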
Why Semantic Manifolds Fail
Strong semantic manifolds require:
- Continuity
- Stable coordinates
- Smooth transport under perturbation
As shown in
twelve.html, transformer inference violates these requirements
structurally and operationally.
Context can be fractured, aliased, saturated, or undefined.
Semantic manifolds cannot.
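Discrete tokenization is one place where continuity and smooth transport under perturbation break down. The toy below assumes a greedy longest-match-first tokenizer (WordPiece-style, without continuation markers) and a small hypothetical vocabulary; BPE and other production schemes segment differently. Even so, a one-character edit to the input yields a token sequence that shares no tokens with the original.

```python
from typing import List, Set

# Hypothetical toy vocabulary; real tokenizer vocabularies are learned and far larger.
VOCAB: Set[str] = {"f", "fl", "flow", "g", "l", "o", "w", "e", "r", "er", "low", "lower"}

def greedy_tokenize(text: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match-first segmentation of text into vocabulary entries."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry matches at position {i}")
    return tokens

print(greedy_tokenize("flower", VOCAB))  # ['flow', 'er']
print(greedy_tokenize("glower", VOCAB))  # ['g', 'lower']
```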
Algebraic Interpretability
Transformer computation is best understood as algebra operating over a context substrate.
More precisely:
- Computation lives in algebra (linear maps, composition, invariants)
- Context supplies the boundary conditions
- Geometry is optional and emergent
- Meaning is what is rendered from the context by the observer
Algebra does not require smooth space.
It requires only that certain operations commute, compose, or remain invariant
within a restricted regime.
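An invariant that holds only within a restricted regime can be checked directly. The sketch below uses ReLU's positive homogeneity as an illustrative stand-in; the document does not name a specific invariant, so this choice is an assumption. `relu(c * x) == c * relu(x)` holds for every `c >= 0` and fails once `c` is negative, so the operation commutes with scaling only inside that regime.

```python
from typing import List

def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]

def scale(c: float, x: List[float]) -> List[float]:
    return [c * v for v in x]

def commutes_with_scaling(c: float, x: List[float]) -> bool:
    """Check whether relu(c * x) == c * relu(x) for one scalar c and one input x."""
    return relu(scale(c, x)) == scale(c, relu(x))

x = [-1.0, 0.5, 2.0]
print([commutes_with_scaling(c, x) for c in (0.5, 2.0, 3.0)])  # [True, True, True]
print(commutes_with_scaling(-1.0, x))                          # False
```

No smooth space is involved. The check needs only the operations themselves and a stated regime (`c >= 0`) in which they commute.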
Why Probes, SAEs, and Steering Sometimes Work
Linear probes, sparse autoencoders, and steering methods succeed when:
- A stable algebraic invariant exists under the sampled context regime
- Computation respects certain symmetries (token classes, routing invariants, normalized subspaces)
- The induced geometry is local and fragile, not global
A probe does not discover a “concept.”
It discovers a hyperplane that remains approximately invariant under a restricted family of contexts.
That is algebra, not semantics.
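A toy version of this: fit a difference-of-class-means linear probe on synthetic 2-D "activations" from one context regime. Then evaluate it twice, once on a variant of the same regime with rescaled nuisance variation and once on a substituted regime where the label is carried along a different direction. All data, regimes, and function names here are hypothetical; nothing is measured from a real model.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def mean(points: List[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def fit_mean_difference_probe(pos: List[Point], neg: List[Point]) -> Tuple[Point, float]:
    """Linear probe: normal vector = difference of class means, threshold at their midpoint."""
    mp, mn = mean(pos), mean(neg)
    w = (mp[0] - mn[0], mp[1] - mn[1])
    mid = ((mp[0] + mn[0]) / 2, (mp[1] + mn[1]) / 2)
    return w, -(w[0] * mid[0] + w[1] * mid[1])

def accuracy(w: Point, b: float, pos: List[Point], neg: List[Point]) -> float:
    """Fraction classified correctly: predict positive iff w . x + b > 0."""
    def score(p: Point) -> float:
        return w[0] * p[0] + w[1] * p[1] + b
    correct = sum(score(p) > 0 for p in pos) + sum(score(p) <= 0 for p in neg)
    return correct / (len(pos) + len(neg))

GRID = [-1.0, -0.5, 0.5, 1.0]

def regime_a(nuisance_scale: float = 1.0):
    """Label carried by coordinate 0; coordinate 1 is nuisance variation."""
    return ([(1.0, nuisance_scale * t) for t in GRID],
            [(-1.0, nuisance_scale * t) for t in GRID])

def regime_b():
    """Substituted context: label carried by coordinate 1; coordinate 0 is unrelated."""
    return ([(s, 1.0) for s in GRID], [(s, -1.0) for s in GRID])

w, b = fit_mean_difference_probe(*regime_a())
print(accuracy(w, b, *regime_a()))                    # 1.0
print(accuracy(w, b, *regime_a(nuisance_scale=5.0)))  # 1.0
print(accuracy(w, b, *regime_b()))                    # 0.5
```

The probe's hyperplane survives variation within its family of contexts but drops to chance after the substitution, even though the probe itself never changed.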
Context Integrity
Context Integrity is the study of which discontinuities are:
- Desirable
- Benign
- Destabilizing
- Catastrophic
Rather than denying fractures, Context Integrity treats them as first-class objects to be mapped, tested, and monitored.
Stress-prompt effects, for example, are not semantic jumps.
They are context substitutions that move computation into a different algebraic regime.
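Piecewise-linear networks offer one concrete way to make "algebraic regime" precise: the ReLU activation pattern. This is an illustrative assumption, not the document's definition, and transformers also use softmax attention and smooth activations such as GELU. Within one pattern the network below is exactly affine. A substituted input that flips the pattern switches computation to a different linear map. The weights are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical weights: 2 inputs -> 2 ReLU units -> 1 scalar output.
W1 = [[1.0, -1.0], [0.5, 1.0]]
B1 = [0.0, -1.0]
W2 = [2.0, -3.0]

def preactivations(x: List[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W1, B1)]

def forward(x: List[float]) -> float:
    return sum(w2 * max(0.0, h) for w2, h in zip(W2, preactivations(x)))

def regime(x: List[float]) -> Tuple[bool, ...]:
    """Activation pattern: which ReLU units are on for input x."""
    return tuple(h > 0 for h in preactivations(x))

def local_linear_map(x: List[float]) -> List[float]:
    """Gradient of forward() at x; identical for all inputs that share a regime."""
    active = regime(x)
    return [sum(W2[j] * W1[j][i] for j in range(len(W1)) if active[j])
            for i in range(len(x))]

base, nearby, substituted = [2.0, 0.5], [2.0, 0.6], [2.0, -2.0]
print(regime(base), regime(nearby), regime(substituted))  # (True, True) (True, True) (True, False)
print(local_linear_map(base), local_linear_map(substituted))  # [0.5, -5.0] [2.0, -2.0]

# Within a regime, the base point's linear map predicts the nearby output exactly (up to rounding).
delta = [xn - xb for xn, xb in zip(nearby, base)]
predicted = forward(base) + sum(g * d for g, d in zip(local_linear_map(base), delta))
print(math.isclose(forward(nearby), predicted))  # True
```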
What This Framework Does and Does Not Claim
Algebraic Interpretability does not claim:
- That meaning is encoded algebraically
- That transformers “understand”
- That geometry is always useless
It claims only that:
- Interpretability succeeds by exploiting algebraic regularities
- Those regularities are context-dependent
- Global semantic geometry is not required and not supported
Positioning
This framework is not a replacement for existing interpretability tools.
It is a correction to their explanatory stories.
- Methods that work can be retained.
- Methods that fail can be diagnosed.
- No one needs to be “fired.”
Only rebadged.
The falsification groundwork is laid in
twelve.html.
This document explains what survives.