A substrate-level operational falsification of strong semantic manifold assumptions used in interpretability, steering, alignment, and safety narratives.
This document targets a strong operational form of the Manifold Hypothesis, as invoked in claims about semantic space, concept vectors, smooth steering, and semantic distance.
Local linear operability is not disputed. Narrow operational regimes may admit approximate linearity, enabling probes, features, and steering to function transiently.
What is denied is the existence of a globally coherent manifold supporting stable coordinates, smooth transition maps, or predictable transport across the reachable state space of transformer inference. Disconnected linear islands do not constitute a manifold.
This work does not propose an alternative semantic ontology. Its purpose is operational and negative: to show that widely invoked semantic-geometric assumptions are structurally incompatible with transformer computation.
Once these assumptions are falsified, the explanatory burden shifts to proponents of those frameworks, who must either weaken their claims until manifold structure is no longer required or demonstrate validity under explicit discontinuity, aliasing, and non-smoothness.
This work does not claim that transformers are uninterpretable, that probes never work, or that internal structure cannot be studied.
It claims only that such successes do not license global semantic-geometric interpretation.
A fracture is a structural or operational mechanism that violates MH-A, MH-B, or MH-C during real transformer inference.
| Idx | Fracture | Mechanism | Break Type | Manifold Property Violated | Status | Notes |
|---|---|---|---|---|---|---|
| 1 | Tokenization Quotient Break | Many-to-one non-invertible mapping | Topological | Global topology | Structural | Quotient singularities preclude manifold structure |
| 2 | Embedding Table Folding | Intersecting embeddings under training pressure | Geometric | Local injectivity | Structural | Self-intersections destroy coordinate uniqueness |
| 3 | Positional Phase Wrap | Periodic/rotary coordinate identification | Topological | Global charts | Structural | Phase seams enforce coordinate singularities |
| 4 | Attention Softmax Saturation | Exponentiation + normalization cliffs | Differential | Smooth transport | Structural | Degenerate response regimes |
| 5 | Residual Dominance Shift | Abrupt pathway switching | Differential | Tangent stability | Structural | Nearby states follow different compute paths |
| 6 | KV-Cache Aliasing | Distinct histories collapse to identical states | Topological | Trajectory injectivity | Structural | History cannot embed as a path |
| 7 | MLP Activation Saturation | Flat/clipped nonlinear regions | Differential | Local diffeomorphism | Structural | Neighborhood collapse |
| 8 | Finite Precision Quantization | FP arithmetic discretization | Numeric | Continuity | Structural | Lattice replaces geometry |
| 9 | Normalization Geometry Rewriting | LayerNorm/RMSNorm erase scale | Numeric/Geometric | Metric persistence | Structural | Distances recomputed, not preserved |
| 10 | Undefined Numeric States | NaN/Inf from overflow or instability | Topological | Totality | Operational | Hard representational holes |
| 11 | Logit Rank Collapse | Anisotropic vocabulary projection | Geometric | Dimensional regularity | Structural | Effective dimension varies |
| 12 | Stress-Prompt Discontinuities | Tiny prompt changes → large jumps | Empirical | Predictable response | Operational | Reachable in ordinary use |
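
Row 4 can be made concrete with a minimal sketch. It assumes a plain softmax over three hypothetical attention scores; the score values and the `sensitivity` helper are illustrative and not taken from any real model. It estimates how strongly one attention weight responds to a small change in another score, first in a balanced regime and then in a saturated one.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sensitivity(logits, i=0, j=1, eps=1e-3):
    """Finite-difference estimate of d softmax[i] / d logits[j]."""
    bumped = list(logits)
    bumped[j] += eps
    return (softmax(bumped)[i] - softmax(logits)[i]) / eps

# Hypothetical attention scores (assumed values, for illustration only).
balanced = [0.0, 0.0, 0.0]     # all keys compete equally
saturated = [40.0, 0.0, 0.0]   # one key dominates after exponentiation

responsive = abs(sensitivity(balanced))   # weight reacts to the perturbation
flat = abs(sensitivity(saturated))        # weight is pinned at 1.0

print("balanced regime :", responsive)
print("saturated regime:", flat)
```

In the balanced regime the response is close to the analytic value of 1/9 in magnitude. In the saturated regime the float64 attention weight does not move at all. The same perturbation is transported differently depending on where the state sits, which is the sense of "degenerate response regimes" in the table.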
The original list of eleven fractures grouped normalization effects and undefined numeric states together. They are distinct mechanisms and are listed separately here (rows 9 and 10).
Normalization rewrites geometry deterministically. NaN/Inf break totality entirely. They violate different manifold properties and have different implications.
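
The distinction can be sketched in a few lines. This is a minimal illustration, assuming an RMSNorm without learned gain and ordinary float64 arithmetic; the vectors and the helper names `rms_norm` and `distance` are hypothetical.

```python
import math

def rms_norm(x):
    """RMSNorm without learned gain: x / sqrt(mean(x^2))."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [v / rms for v in x]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Fracture 9: deterministic geometry rewriting.
x = [1.0, 2.0, 2.0]
y = [4.0, 8.0, 8.0]            # same direction as x, four times the scale

before = distance(x, y)                      # well-defined, nonzero
after = distance(rms_norm(x), rms_norm(y))   # scale erased, points coincide

# Fracture 10: the state leaves the number system entirely.
overflow = 1e308 * 10.0          # exceeds the float64 range, becomes inf
undefined = overflow - overflow  # inf - inf has no value, becomes nan

print("distance before norm:", before)
print("distance after norm :", after)
print("overflow, undefined :", overflow, undefined)
print("nan equals itself   :", undefined == undefined)
```

In the first case every input still maps to a definite output; only the metric relationship is recomputed. In the second case there is no output to compare: `nan` is not even equal to itself, so no distance, neighborhood, or coordinate can be assigned to it.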
The semantic dogma runs deep and broad. Once the manifold assumption is reified, it cascades into the methods and approaches built on top of it.
| Method | Domain | Assumes MH | Fracture Index | Conflict | Risk if MH False | Notes |
|---|---|---|---|---|---|---|
| Sparse Autoencoders | McInt | Yes | 2,7,8,9,11 | Assumes smooth separable feature space | Feature drift | Locally useful only |
| Steering Vectors | McInt | Yes | 4,5,7,12 | Assumes linear semantic control | Brittle behavior | Context sensitive |
| Representation Similarity | McInt | Yes | 2,8,11 | Metric continuity assumed | False similarity | Correlational |
| Belief Probes | Align | Yes | 4,5,6,7 | Stable semantic coordinates assumed | False confidence | Non-persistent axes |
| RLHF | Align | Implicit | 4,5,9,12 | Assumes smooth reward landscape | Reward hacking | Surface control |
| Constitutional AI | Safety | Implicit | 4,5,7,12 | Assumes continuous steerability | Sudden failure | Governance veneer |
| Logit Lens | McInt | No | – | Syntactic readout | None | Pre-semantic |
| Causal Tracing | McInt | No | – | Perturbational testing | None | Model-agnostic |
| Red Teaming | Safety | No | 12 | Direct fracture probing | None | Empirical ground truth |
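
The context sensitivity noted in the Steering Vectors row can be illustrated with a toy sketch. It assumes a hypothetical two-unit layer with a ReLU nonlinearity and a summed readout; `toy_layer`, `steering_effect`, and all numeric values are invented for illustration and make no claim about any real model. The same steering vector is added to two different base states.

```python
def relu(v):
    """Elementwise ReLU over a list of floats."""
    return [max(0.0, t) for t in v]

def toy_layer(h):
    """Hypothetical readout: sum of ReLU activations."""
    return sum(relu(h))

def steering_effect(h, delta):
    """Change in readout when the same vector delta is added to state h."""
    steered = [a + b for a, b in zip(h, delta)]
    return toy_layer(steered) - toy_layer(h)

delta = [1.0, 0.0]       # hypothetical steering vector

active = [0.5, 0.2]      # first unit in its linear regime
clipped = [-3.0, 0.2]    # first unit deep in the flat regime (fracture 7)

effect_active = steering_effect(active, delta)
effect_clipped = steering_effect(clipped, delta)

print("effect from active state :", effect_active)
print("effect from clipped state:", effect_clipped)
```

From the active state the vector shifts the readout by roughly its full magnitude. From the clipped state it has no effect. A steering vector calibrated in one regime therefore carries no guarantee in another, even in a model this small.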
The following table evaluates commonly used transformer and interpretability terms by their scope of validity under the twelve fractures listed above.
Classifications are operational, not ontological. They describe what a term can safely be used to claim — and where it silently overclaims.
| Term | Classification | Valid Use | Overclaim Risk | Notes |
|---|---|---|---|---|
| Token | Structural | Discrete algebraic primitive | None | Foundation of computation; non-semantic by construction |
| Attention | Structural | Routing and weighting mechanism | Semantic attribution | Operationally precise; semantics often projected post hoc |
| Residual Stream | Structural | Additive state composition | Interpretation as continuous trajectory | Additivity does not imply geometric smoothness |
| Embedding | Operational | Lookup-based representational handle | Semantic distance, neighborhood meaning | Folding and normalization undermine global geometry |
| Feature | Context-Bound | Repeatable activation motif in restricted regimes | Global semantic primitive | Feature identity drifts across context and scale |
| Sparse Feature | Context-Bound | Local basis element under fixed conditions | Monosemantic interpretation | Useful diagnostically; unstable under perturbation |
| Latent Space | Context-Bound | Visualization and local linear analysis | Global geometry, smooth traversal | Fails under normalization, aliasing, rank collapse |
| Representation | Operational | Intermediate computational state | Semantic encoding claim | Representation ≠ meaning storage |
| Semantic Space | Category Error | None as internal substrate | Meaning-as-geometry projection | Observer ontology, not model structure |
| Concept Vector | Category Error | None beyond heuristic steering | Stable semantic axis assumption | Violates coordinate persistence |
| Concept Neuron | Narrative | Pedagogical shorthand | Unit-level semantic attribution | Fails under distribution shift |
| Belief | Narrative | External behavioral description | Internal state attribution | Useful for UX, not mechanics |
| Knowledge Storage | Narrative | Informal description of behavior | Memory localization claims | Computation is reconstructive, not archival |
| Understanding | Narrative | Human-facing evaluation | Internal competence inference | Non-operational internally |
| Steering | Context-Bound | Short-horizon bias injection | Global control guarantee | Sharp regime edges |
| Linear Probe | Operational | Telemetry and correlation detection | Causal or semantic inference | Effective without manifold assumptions |
| SAE Feature | Context-Bound | Local coordinate extraction | Semantic atom claim | Feature identity not invariant |
| Mechanistic Circuit | Context-Bound | Reusable execution fragment | Global module interpretation | Regime-dependent |
| World Model | Narrative | Behavioral abstraction | Internal simulation claim | Observer convenience term |
| Alignment | Operational | Behavioral constraint satisfaction | Internal value shaping | Surface-level property |
| Safety | Operational | Failure avoidance and monitoring | Semantic guarantee inference | Engineering discipline, not ontology |
| Context | Structural | Total boundary condition of computation | Verb-like usage (“adding context”) | Substrate, not operation |
| Context Window | Structural | Finite dependency horizon | Memory equivalence claim | Does not imply persistence or recall |
| Generalization | Operational | Performance outside training samples | Semantic abstraction inference | Often regime-specific |