Visualization Guide¶
The interactive Sankey diagram is the primary way to explore reverse attention traces. This guide explains how to interpret and interact with the visualization.
What is a Sankey Diagram?¶
A Sankey diagram shows flow between nodes. In our context:
- Nodes = Token positions
- Links = Attention flow between positions
- Link width = Attention strength (weighted across beams)
Unlike an NxN attention heatmap, a Sankey diagram shows only the important paths—the ones discovered by beam search.
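As a sketch of this mapping, here is how discovered beam paths might be reduced to Sankey nodes and weighted links. The data shapes (`path`, `score`) are illustrative assumptions, not the tool's actual output format:

```python
from collections import defaultdict

# Hypothetical beam-search output: each beam is a path of token
# positions plus a score (illustrative names, not the real format).
beams = [
    {"path": [9, 6, 3, 0], "score": 0.6},  # dog -> over -> fox -> The
    {"path": [9, 6, 0], "score": 0.4},
]

# Nodes are exactly the token positions that appear in some path.
nodes = sorted({pos for beam in beams for pos in beam["path"]})

# Link width = attention strength aggregated across the beams
# that traverse that link.
links = defaultdict(float)
for beam in beams:
    for src, dst in zip(beam["path"], beam["path"][1:]):
        links[(src, dst)] += beam["score"]

print(nodes)          # [0, 3, 6, 9]
print(links[(9, 6)])  # 1.0 -- both beams share this link, so it is widest
```

Positions never visited by any beam simply produce no node, which is why the diagram stays sparse where a heatmap would not.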
Anatomy of the Visualization¶
┌──────────────────────────────────────────────────────┐
│ [Beam Filter ▼] [Info Panel] │
│ │
│ ┌───┐ ┌───┐ ┌───┐ ┌───┐ │
│ │The│ ──── │fox│ ──── │over│ ──── │dog│ │
│ └───┘ └───┘ └───┘ └───┘ │
│ │ │ ↑ │
│ └──────────┴──────────────────────┘ │
│ │
│ Position: 0 3 6 9 │
└──────────────────────────────────────────────────────┘
Nodes¶
Each node represents a token at a specific position:
- Color: Viridis gradient by position (early = purple, late = yellow)
- Size: Proportional to total attention flowing through
- Label: The token text (decoded from token IDs)
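The purple-to-yellow ramp can be approximated with a linear interpolation between the Viridis endpoints. This is only a sketch (the real Viridis colormap is nonlinear, and the tool presumably uses a proper colormap library):

```python
def position_color(pos, n_positions):
    """Approximate Viridis: earliest token -> purple, latest -> yellow.

    A linear RGB ramp between the Viridis endpoint colors; the real
    colormap is perceptually uniform and nonlinear.
    """
    purple, yellow = (68, 1, 84), (253, 231, 37)
    t = pos / max(n_positions - 1, 1)  # normalize position to [0, 1]
    return tuple(round(p + t * (y - p)) for p, y in zip(purple, yellow))

print(position_color(0, 10))  # (68, 1, 84)  -- earliest token, purple
print(position_color(9, 10))  # (253, 231, 37) -- latest token, yellow
```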
Links¶
Links show attention connections between tokens:
- Width: Proportional to attention weight (aggregated across beams)
- Color: Matches the beam that uses this link
- Curvature: Helps distinguish crossing paths
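Many Sankey renderers (Plotly's, for example) consume parallel `source`/`target`/`value` arrays, with `value` driving link width. The aggregated link weights above can be shaped into that form; the weights here are made-up example numbers:

```python
# Aggregated (source_pos, target_pos) -> weight map; hypothetical values.
links = {(9, 6): 1.0, (6, 3): 0.6, (6, 0): 0.4}

# Assign each token position a node index in sorted order.
positions = sorted({p for pair in links for p in pair})
index = {pos: i for i, pos in enumerate(positions)}

source = [index[s] for s, _ in links]  # node index of each link's source
target = [index[t] for _, t in links]  # node index of each link's target
value = list(links.values())           # one width per link

print(source, target, value)  # [3, 2, 2] [2, 1, 0] [1.0, 0.6, 0.4]
```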
Beam Colors¶
Each beam has a distinct color, making it easy to trace individual paths through the diagram.
Interacting with the Visualization¶
Zoom and Pan¶
- Scroll: Zoom in/out
- Drag: Pan the view
- Double-click: Reset zoom
Click to Highlight¶
Click on any node to:
- Highlight all links connected to that node
- Dim unrelated links
- Show node details in the info panel
Click on any link to:
- Highlight the source and target nodes
- Show attention weight and beam indices
Click on empty space to clear the highlight.
Beam Filter¶
Use the dropdown to filter by specific beams:
- All Beams: Show all discovered paths
- Beam 1, 2, ...: Show only paths from that beam
This helps isolate individual attention chains when the diagram is crowded.
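The filter's behavior can be sketched as selecting only the links whose beam membership includes the chosen beam; the link records and field names below are illustrative assumptions:

```python
# Hypothetical link records that track which beams use each link.
links = [
    {"source": 9, "target": 6, "weight": 1.0, "beams": [0, 1]},
    {"source": 6, "target": 3, "weight": 0.6, "beams": [0]},
    {"source": 6, "target": 0, "weight": 0.4, "beams": [1]},
]

def filter_links(links, beam=None):
    """beam=None keeps everything, matching the 'All Beams' option."""
    if beam is None:
        return links
    return [link for link in links if beam in link["beams"]]

print(len(filter_links(links)))          # 3 -- all paths
print(len(filter_links(links, beam=1)))  # 2 -- only beam 1's chain
```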
Info Panel¶
The info panel shows details for the selected element:
For nodes: the token text, its position, and the total attention flowing through it.
For links: the source and target tokens, the attention weight, and the beam indices that use the link.
Interpreting Patterns¶
Strong Single Path¶
A thick, dominant path suggests clear, focused attention. The model is strongly influenced by a specific token sequence.
Fan-Out Pattern¶
Multiple tokens at similar positions receive attention. The model is aggregating information from several sources.
Fan-In Pattern¶
Several tokens route through a single intermediate token. This token acts as an "attention hub."
Skip Connection¶
Direct long-range attention alongside shorter paths. The model maintains both local and global context.
Common Questions¶
Why are some links very thin?¶
Thin links have lower attention weights. They appear because:
- They're part of secondary beams (not the top beam)
- The softmax weighting reduces their contribution
Use beam filtering to focus on specific paths.
Why don't I see all tokens?¶
Only tokens that appear in discovered paths are shown. Tokens with very low attention aren't included because:
- They didn't make the top_k cutoff at any step
- Paths through them were pruned
Increase top_k and top_beam to see more.
Why do paths overlap?¶
Multiple beams may share common subsequences. This is expected—it shows where different attention paths converge.
The diagram is cluttered!¶
Try:
- Filter by beam: Focus on one path at a time
- Reduce parameters: Lower top_beam or top_k
- Zoom in: Focus on specific regions
- Click to highlight: Isolate connected elements
Saving and Sharing¶
The visualization is a self-contained HTML file. You can:
- Share the file: Send output/index.html to anyone
- Embed in presentations: Open in browser and screenshot
- Print: Use the browser's print function (save as PDF)
All data is embedded in the HTML—no server needed.
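The self-contained property can be sketched as serializing the trace data straight into the page's markup. This is a simplified stand-in, not what the tool actually emits:

```python
import json

# Toy trace data embedded directly in the page (illustrative shapes).
trace_data = {"nodes": [0, 3, 6, 9], "links": [[9, 6, 1.0]]}

# The data travels inside the HTML itself, so the file opens anywhere
# with no server and no companion files.
html = (
    "<html><body><script>"
    f"const DATA = {json.dumps(trace_data)};"
    "</script></body></html>"
)

with open("index.html", "w") as f:  # e.g. written under output/
    f.write(html)
```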
Next Steps¶
- Quick Start - Generate your first visualization
- Parameter Tuning - Control what gets shown
- Basic Tracing Example - Annotated walkthrough