| Token Selection |
\( \text{argmax}(P) \) or \( x \sim P(x) \) |
Discrete Point (\( \mathbb{Z} \)) |
1 |
No → YES |
Fracture 1: Quotient Break.
The non-differentiable snap from distribution to integer.
Global topology is severed; the manifold restarts here. Context is solidified into fact.
|
| Embedding Lookup |
\( E[t] \in \mathbb{R}^d \) |
Sparse Lattice |
2 |
YES |
Fracture 2: Table Folding.
Integer-to-vector map. Distinct points remain distinct, but if \( V \gg d_{model} \),
the table must self-intersect (pigeonhole pressure), creating “wormholes” between unrelated tokens.
|
| Positional Injection |
\( x + P_{pos} \) |
Rotated/Jittered Lattice |
3 |
YES |
Fracture 3: Phase Wrap.
Injects high-frequency geometric structure.
Prevents symmetry collapse, but introduces seams (wrap boundaries) that do not map cleanly to semantic distance.
|
| Residual Entry |
\( x_{identity} \) |
Euclidean Vector |
– |
YES |
The Golden Thread.
The only path that preserves the token’s identity location.
Subsequent processing produces an update; the identity path carries persistence.
|
| Layer/RMS Norm |
\( \frac{x - \mu}{\sigma} \) |
Hypersphere (\( S^{d-1} \)) |
9 |
Yes → NO |
Fracture 9: Geometry Rewriting.
Euclidean distance is destroyed; magnitude is erased; dimensions are coupled by global variance.
The metric becomes token-local energy geometry, not persistent global geometry.
|
| Q/K/V Projections |
\( x W_Q, x W_K, x W_V \) |
Rotated Subspaces |
8 |
NO |
Fracture 8: Finite Precision Quantization.
Linear maps applied to an already re-written geometry; reduced precision (FP16/Int8) makes traversal lattice-like.
|
| Attention Scores |
\( \frac{Q K^T}{\sqrt{d_k}} \) |
Scalar Field (Heatmap) |
10 |
NO |
The Melting.
Vectors collapse into scalar similarity scores.
Fracture 10: Undefined Numeric States.
If not stabilized, values can become Inf/NaN, creating representational holes where the manifold ceases to exist.
|
| Causal Masking |
\( M_{ij} = -\infty \) if \( j > i \) |
Lower Triangular Cliff |
– |
NO |
Topological Cut.
Enforces the arrow of time as a hard boundary condition; future states are functionally removed from reachable space.
|
| Softmax |
\( \frac{e^{x_i}}{\sum e^{x_j}} \) |
Simplex (\( \Delta^{N-1} \)) |
4 |
NO |
Fracture 4: Saturation.
Forces weights onto a sum-to-1 surface. Dominance collapses diversity; gradients vanish for non-dominant options.
Geometry becomes a “flat plain” with spikes.
|
| Value Aggregation |
\( \sum A \cdot V \) |
Convex Hull (Blob) |
6 |
NO |
Fracture 6: KV-Cache Aliasing.
Distinct historical token identities are mixed by weighted averaging.
Output is constrained to the convex hull of V; unique trajectory information is destroyed by smoothing.
|
| Output Projection |
\( H_{attn} W_O \) |
Rotated Blob |
– |
NO |
Linear remap back to \( d_{model} \).
It cannot recover distinctness lost during aggregation; it only reorients the already-mixed result.
|
| Residual Add 1 |
\( x_{res} = x_{old} + x_{attn} \) |
Composite Vector |
5 |
Mixed → YES |
Fracture 5: Dominance Shift.
Valid identity stream is added to invalid attention output.
Identity preserves location; attention is treated as an update vector, not a replacement coordinate system.
|
| Layer/RMS Norm |
\( \text{Norm}(x_{res}) \) |
Hypersphere Surface |
9 |
Yes → NO |
Fracture 9: Geometry Rewriting.
Magnitude is destroyed again; distances are recomputed in a new normalized geometry.
|
| Up-Projection |
\( x W_{up} \) |
High-Dim (\( \mathbb{R}^{4d} \)) |
– |
NO |
Unfolding.
Expands into a higher-dimensional workspace to enable later cutting/folding effects.
|
| Activation |
\( \sigma(x) \) (ReLU/GELU) |
Folded Space / Cone |
7 |
NO |
Fracture 7: Activation Saturation.
ReLU deletes half-space; GELU warps it.
Creates edges needed for decision structure, but violates smooth neighborhood transport.
|
| Down-Projection |
\( x W_{down} \) |
Collapsed Projection |
– |
NO |
Refolding.
Compresses back to \( d_{model} \); “scars” from activation remain as encoded features.
|
| Residual Add 2 |
\( x_{res} = x_{old} + x_{ffn} \) |
Composite Vector |
5 |
Mixed → YES |
Fracture 5: Dominance Shift.
Identity path stabilizes coordinate persistence; FFN contributes features without becoming the sole locator.
|
| Final Norm |
\( \text{Norm}(x_{final}) \) |
Hypersphere Surface |
9 |
Yes → NO |
Fracture 9: Final Skinning.
Distances are finalized by angle (cosine similarity), not Euclidean distance.
|
| Unembed / Logits |
\( x W_{vocab} \) |
Vocab Space (\( \mathbb{R}^V \)) |
11 |
NO |
Fracture 11: Rank Collapse.
\( d_{model} \rightarrow V \) with \( V \gg d_{model} \): output occupies a low-rank sheet with phantom volume.
|
| Final Softmax |
\( \text{Softmax}(x) \) |
Simplex |
4 |
NO |
The Haze.
Final smoothing into a probability distribution; the next sampler fractures it back into a discrete outcome.
|
| Stress Testing |
\( f(x+\epsilon) \) |
Discontinuous Jumps |
12 |
BROKEN |
Fracture 12: Stress-Prompt Discontinuities.
Operational check: tiny input changes should cause tiny output changes.
Falsification: small prompt edits can trigger argmax jumps into a different cluster.
|