Visualization Guide¶

The interactive Sankey diagram is the primary way to explore reverse attention traces. This guide explains how to interpret and interact with the visualization.

What is a Sankey Diagram?¶

A Sankey diagram shows flow between nodes. In our context:

Nodes = Token positions
Links = Attention flow between positions
Link width = Attention strength (weighted across beams)

Unlike an N×N attention heatmap, which plots every pair of positions, a Sankey diagram shows only the important paths: the ones discovered by beam search.
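To make the node/link model concrete, here is a minimal sketch of how beam paths could collapse into Sankey nodes and links. The input format (a list of paths, each a list of `(source_pos, target_pos, attention)` hops) and the name `build_sankey` are assumptions for illustration, and a plain sum stands in for whatever cross-beam weighting the tool actually applies.

```python
from collections import defaultdict


def build_sankey(beams):
    """Collapse beam paths into Sankey nodes and links.

    `beams` is a hypothetical input format: a list of paths, each a list
    of (source_pos, target_pos, attention) hops.
    """
    nodes = set()
    links = defaultdict(lambda: {"weight": 0.0, "beams": []})
    for beam_idx, path in enumerate(beams, start=1):
        for src, dst, attn in path:
            nodes.update((src, dst))
            link = links[(src, dst)]
            link["weight"] += attn          # plain sum: an assumed aggregation
            link["beams"].append(beam_idx)  # remember which beams use the link
    return sorted(nodes), dict(links)


# Illustrative values only.
beams = [
    [(3, 6, 0.30), (6, 9, 0.50)],  # beam 1
    [(0, 9, 0.20)],                # beam 2
    [(3, 6, 0.10), (6, 9, 0.25)],  # beam 3
]
nodes, links = build_sankey(beams)
```

Here three beams over four positions produce only three links, whereas a heatmap over the same four positions would have sixteen cells.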

Anatomy of the Visualization¶

┌──────────────────────────────────────────────────────┐
│  [Beam Filter ▼]                        [Info Panel] │
│                                                      │
│            ┌────┐    ┌────┐    ┌────┐    ┌────┐      │
│            │The │ ── │fox │ ── │over│ ── │dog │      │
│            └────┘    └────┘    └────┘    └────┘      │
│              │         │                   ↑         │
│              └─────────┴───────────────────┘         │
│                                                      │
│  Position:   0         3         6         9         │
└──────────────────────────────────────────────────────┘

Nodes¶

Each node represents a token at a specific position:

Color: Viridis gradient by position (early = purple, late = yellow)
Size: Proportional to the total attention flowing through the node
Label: The token text (decoded from token IDs)
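The position-to-color mapping can be approximated as follows. This is a sketch rather than the tool's rendering code: `position_color` is a hypothetical helper, and the five stops are commonly quoted Viridis sample points that are linearly interpolated here instead of using the full colormap.

```python
# Approximate Viridis stops at t = 0, 0.25, 0.5, 0.75, 1 (purple -> yellow).
VIRIDIS_STOPS = [
    (68, 1, 84),     # #440154
    (59, 82, 139),   # #3b528b
    (33, 145, 140),  # #21918c
    (94, 201, 98),   # #5ec962
    (253, 231, 37),  # #fde725
]


def position_color(pos, seq_len):
    """Map a token position to an approximate Viridis hex color."""
    t = pos / (seq_len - 1) if seq_len > 1 else 0.0
    scaled = t * (len(VIRIDIS_STOPS) - 1)
    i = min(int(scaled), len(VIRIDIS_STOPS) - 2)  # index of the lower stop
    frac = scaled - i
    lo, hi = VIRIDIS_STOPS[i], VIRIDIS_STOPS[i + 1]
    rgb = [round(a + (b - a) * frac) for a, b in zip(lo, hi)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```

With this mapping the first token in a sequence comes out purple and the last one yellow, matching the description above.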

Links¶

Links show attention connections between tokens:

Width: Proportional to attention weight (aggregated across beams)
Color: Matches the beam that uses this link
Curvature: Helps distinguish crossing paths

Beam Colors¶

Each beam has a distinct color, making it easy to trace individual paths through the diagram.

Interacting with the Visualization¶

Zoom and Pan¶

Scroll: Zoom in/out
Drag: Pan the view
Double-click: Reset zoom

Click to Highlight¶

Click on any node to:

Highlight all links connected to that node
Dim unrelated links
Show node details in the info panel

Click on any link to:

Highlight the source and target nodes
Show attention weight and beam indices

Click on empty space to clear the highlight.
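The node-click behavior amounts to splitting the links into a highlighted set and a dimmed set. A minimal sketch of that selection logic, assuming each link is a record with `source` and `target` positions (the record shape and the name `highlight_node` are hypothetical):

```python
def highlight_node(links, node=None):
    """Return (highlighted, dimmed) links for a clicked node.

    Passing node=None models a click on empty space, which clears the
    highlight so nothing is highlighted or dimmed.
    """
    if node is None:
        return [], []
    highlighted = [l for l in links if node in (l["source"], l["target"])]
    dimmed = [l for l in links if node not in (l["source"], l["target"])]
    return highlighted, dimmed


sample_links = [
    {"source": 3, "target": 6},
    {"source": 6, "target": 9},
    {"source": 0, "target": 9},
]
hit, dim = highlight_node(sample_links, node=6)
```

Clicking position 6 highlights both links that touch it and dims the direct 0 to 9 link.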

Beam Filter¶

Use the dropdown to filter by specific beams:

All Beams: Show all discovered paths
Beam 1, 2, ...: Show only paths from that beam

This helps isolate individual attention chains when the diagram is crowded.
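Conceptually, the filter keeps only the links whose beam list contains the selected beam. The sketch below assumes link records shaped like the info panel fields (`source`, `target`, `attention`, `beams`); the function name and most sample values are illustrative, not taken from the tool.

```python
def filter_links(links, beam=None):
    """Return the links used by `beam`; beam=None means "All Beams"."""
    if beam is None:
        return list(links)
    return [l for l in links if beam in l["beams"]]


# Sample data for illustration only.
all_links = [
    {"source": 3, "target": 6, "attention": 0.342, "beams": [1, 3]},
    {"source": 6, "target": 9, "attention": 0.410, "beams": [1]},
    {"source": 0, "target": 9, "attention": 0.120, "beams": [2]},
]
beam_3 = filter_links(all_links, beam=3)
```

A link shared by several beams, such as the one used by beams 1 and 3 here, stays visible under either filter.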

Info Panel¶

The info panel shows details for the selected element:

For nodes:

Token: "fox"
Position: 3
Layer: 23

For links:

Source: "fox" (pos 3)
Target: "over" (pos 6)
Attention: 0.342
Beams: [1, 3]

Interpreting Patterns¶

Strong Single Path¶

[A] ════════ [B] ════════ [C] ════════ [Target]

A thick, dominant path suggests clear, focused attention. The model is strongly influenced by a specific token sequence.

Fan-Out Pattern¶

      ┌──── [B] ────┐
[A] ──┼──── [C] ────┼── [Target]
      └──── [D] ────┘

Multiple tokens at similar positions receive attention. The model is aggregating information from several sources.

Fan-In Pattern¶

[A] ────┐
[B] ────┼── [X] ─── [Target]
[C] ────┘

Several tokens route through a single intermediate token. This token acts as an "attention hub."

Skip Connection¶

[A] ─────────────────────── [Target]
         ↓                      ↑
       [B] ───── [C] ──────────┘

Direct long-range attention alongside shorter paths. The model maintains both local and global context.
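These patterns can also be spotted programmatically from the link list. The following sketch flags fan-out sources, fan-in hubs, and skip connections using simple degree counts and a reachability check; the function name, the thresholds, and the `(source_pos, target_pos)` input format are assumptions made for illustration.

```python
from collections import defaultdict


def classify(links):
    """Classify patterns in an iterable of (source_pos, target_pos) links."""
    links = list(links)
    out_edges, in_edges = defaultdict(set), defaultdict(set)
    for s, t in links:
        out_edges[s].add(t)
        in_edges[t].add(s)

    def reachable(start, goal, banned):
        """True if goal is reachable from start without using link `banned`."""
        stack, seen = [start], set()
        while stack:
            n = stack.pop()
            for nxt in out_edges.get(n, ()):
                if (n, nxt) == banned or nxt in seen:
                    continue
                if nxt == goal:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return {
        # nodes that send attention to two or more targets
        "fan_out": sorted(n for n, ts in out_edges.items() if len(ts) >= 2),
        # nodes that receive attention from two or more sources ("hubs")
        "fan_in": sorted(n for n, ss in in_edges.items() if len(ss) >= 2),
        # direct links that run alongside a longer multi-hop path
        "skip": sorted((s, t) for s, t in links if reachable(s, t, (s, t))),
    }


# The skip-connection example: A(0) -> Target(9) plus A -> B(3) -> C(6) -> Target.
patterns = classify([(0, 9), (0, 3), (3, 6), (6, 9)])
```

In this example the direct link from position 0 to position 9 is reported as a skip connection because the path through positions 3 and 6 reaches the same target.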

Common Questions¶

Why are some links very thin?¶

Thin links carry lower attention weights. This usually happens because:

They belong to secondary beams (not the top beam)
The softmax weighting reduces their contribution

Use beam filtering to focus on specific paths.
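One way to see why secondary beams yield thin links is to model the weighting explicitly. The sketch below assumes each beam has a score, that beam weights are a softmax over those scores, and that link width is the weight-averaged attention; the guide does not spell out this formula, so treat it as a plausible reading rather than the tool's exact computation.

```python
import math


def beam_weights(scores):
    """Softmax over beam scores, shifted by the max for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def link_width(attn_by_beam, weights):
    """Weighted attention on one link.

    `attn_by_beam` maps a 0-based beam index to the attention that beam
    places on the link (a hypothetical representation).
    """
    return sum(weights[b] * a for b, a in attn_by_beam.items())


weights = beam_weights([2.0, 1.0, 0.0])   # top beam first
top_only = link_width({0: 0.4}, weights)  # link used by the top beam
low_only = link_width({2: 0.4}, weights)  # same attention, lowest beam
```

Under these assumptions, two links with identical raw attention end up with different widths depending on which beam uses them.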

Why don't I see all tokens?¶

Only tokens that appear in discovered paths are shown. Tokens with very low attention aren't included because:

They didn't make the top_k cutoff at any step
Paths through them were pruned

Increase top_k and top_beam to see more.
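The effect of the two parameters can be illustrated with a toy reverse beam search. This is a simplified sketch, not the tool's tracing algorithm: the attention matrix layout, the product scoring, the `steps` argument, and the function names are all assumptions; only `top_k` and `top_beam` come from this guide.

```python
import heapq


def trace(attn, target, top_k, top_beam, steps):
    """Toy reverse trace from `target`.

    attn[i][j] is the attention position i pays to position j
    (hypothetical layout). Returns a list of (score, path) beams.
    """
    beams = [(1.0, [target])]
    for _ in range(steps):
        candidates = []
        for score, path in beams:
            row = attn[path[-1]]
            # keep only the top_k most-attended positions at this step
            best = heapq.nlargest(top_k, range(len(row)), key=row.__getitem__)
            for j in best:
                if row[j] > 0:
                    candidates.append((score * row[j], path + [j]))
        if not candidates:
            break
        # keep only the top_beam highest-scoring paths
        beams = heapq.nlargest(top_beam, candidates, key=lambda c: c[0])
    return beams


def visible_positions(beams):
    """Positions that appear in at least one surviving path."""
    return {pos for _, path in beams for pos in path}


# Illustrative causal attention over four positions.
attn = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.1, 0.2, 0.7, 0.0],
]
narrow = visible_positions(trace(attn, 3, top_k=1, top_beam=1, steps=2))
wide = visible_positions(trace(attn, 3, top_k=2, top_beam=2, steps=2))
```

With the narrow settings, position 1 never makes the cutoff and would not be drawn; widening both parameters brings it into a surviving path.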

Why do paths overlap?¶

Multiple beams may share common subsequences. This is expected: overlaps show where different attention paths converge.

The diagram is cluttered!¶

Try:

Filter by beam: Focus on one path at a time
Reduce parameters: Lower top_beam or top_k
Zoom in: Focus on specific regions
Click to highlight: Isolate connected elements

Exporting and Sharing¶

The visualization is a self-contained HTML file. You can:

Share the file: Send output/index.html to anyone
Embed in presentations: Open in browser and screenshot
Print: Use browser's print function (PDF output)

All data is embedded in the HTML, so no server is needed.
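For readers curious how a page can carry its own data, here is a minimal sketch of embedding a JSON payload in a single HTML string. The function name, the `trace-data` element id, and the page skeleton are hypothetical; the real output file will contain more, including the rendering script.

```python
import json


def render_standalone(data):
    """Return an HTML page with `data` embedded as inline JSON."""
    # Escape "</" so token text such as "</script>" cannot close the tag early.
    # "\/" is a valid JSON escape, so the payload still parses unchanged.
    payload = json.dumps(data).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Trace</title></head>\n'
        "<body>\n"
        '<div id="chart"></div>\n'
        '<script id="trace-data" type="application/json">' + payload + "</script>\n"
        "</body></html>\n"
    )


page = render_standalone({"nodes": [{"token": "fox", "position": 3}]})
```

Writing the returned string to a file yields a page that opens directly from disk, which is why the file can be shared without running a server.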

Next Steps¶

Quick Start - Generate your first visualization
Parameter Tuning - Control what gets shown
Basic Tracing Example - Annotated walkthrough