
Model Compatibility

Reverse Attention works with any HuggingFace model that supports output_attentions=True. This guide covers tested models and known considerations.

Important: Attention Implementation

Recent transformers releases (4.36+) default to the SDPA attention implementation, which cannot return attention weights. You must explicitly set attn_implementation="eager" when loading models:

model = AutoModelForCausalLM.from_pretrained(
    "model-name",
    attn_implementation="eager",
)

Requirements

A compatible model must:

  1. Be a HuggingFace PreTrainedModel
  2. Support output_attentions=True in forward pass
  3. Return attention tensors in the expected format

Tested Models

Fully Tested

Model Family | Example                | Notes
Qwen2        | Qwen/Qwen2-0.5B        | Recommended, well-tested
Llama 2      | meta-llama/Llama-2-7b  | Works well
GPT-2        | gpt2, gpt2-medium      | Works well
Mistral      | mistralai/Mistral-7B   | Works well

Should Work (Standard Architecture)

Model Family | Example
Llama 3      | meta-llama/Llama-3-8b
Phi          | microsoft/phi-2
Gemma        | google/gemma-7b
OPT          | facebook/opt-1.3b
BLOOM        | bigscience/bloom-560m
Falcon       | tiiuae/falcon-7b

Using Different Models

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from reverse_attention import ReverseAttentionTracer

# Any HuggingFace model
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    attn_implementation="eager",
)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

tracer = ReverseAttentionTracer(model, tokenizer)
result = tracer.trace_text("Hello world")

Models with trust_remote_code

Some models require trusting remote code:

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",
    trust_remote_code=True,
    attn_implementation="eager",
)
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen2-0.5B",
    trust_remote_code=True,
)

Large Models

For larger models, use memory-efficient loading:

import torch

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b",
    torch_dtype=torch.float16,  # Half precision
    device_map="auto",          # Automatic device placement
    attn_implementation="eager",
)

Attention Output Format

Expected Format

The model's attention output should be a tuple of tensors, one per layer:

attentions = (
    layer_0_attn,  # [batch, heads, seq_len, seq_len]
    layer_1_attn,
    ...
    layer_n_attn,
)

Most HuggingFace models follow this convention.
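As a sanity check, this layout can be validated on a toy attentions tuple. The helper below is an illustrative sketch, not part of the library API; it checks the shape convention and that each row of a softmaxed attention matrix sums to 1.

```python
import torch

def check_attention_format(attentions, seq_len):
    """Validate the (one tensor per layer, [batch, heads, seq, seq]) convention."""
    for i, layer_attn in enumerate(attentions):
        batch, heads, q_len, k_len = layer_attn.shape
        assert q_len == seq_len and k_len == seq_len, f"layer {i}: unexpected shape"
        # Rows of a softmaxed attention matrix should sum to ~1.
        row_sums = layer_attn.sum(dim=-1)
        assert torch.allclose(row_sums, torch.ones_like(row_sums), atol=1e-5)

# Toy stand-in for a 2-layer model's output.
toy = tuple(torch.softmax(torch.rand(1, 4, 5, 5), dim=-1) for _ in range(2))
check_attention_format(toy, seq_len=5)
```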

Head Aggregation

Head counts differ across models, so attention is aggregated over the head dimension:

# Mean across heads (default)
tracer.trace_text(text, agg_heads="mean")

# Max across heads
tracer.trace_text(text, agg_heads="max")
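On a single layer's tensor, the two modes reduce the head dimension as sketched below (toy shapes for illustration):

```python
import torch

# One layer's attention: [batch, heads, seq_len, seq_len]
attn = torch.softmax(torch.rand(1, 12, 5, 5), dim=-1)

mean_agg = attn.mean(dim=1)       # [batch, seq_len, seq_len]
max_agg = attn.max(dim=1).values  # [batch, seq_len, seq_len]

print(mean_agg.shape)  # torch.Size([1, 5, 5])
```

Note that the mean preserves the row-stochastic property (rows still sum to 1), while the max does not.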

Troubleshooting

Model Doesn't Output Attention

Error: Model does not output attention weights

Solutions:

  1. Check if model supports attention output:

    outputs = model(input_ids, output_attentions=True)
    print(outputs.attentions is not None)  # Should be True
    

  2. Some models need explicit configuration:

    model = AutoModelForCausalLM.from_pretrained(
        "model-name",
        attn_implementation="eager",  # Not "flash_attention_2"
    )
    

Flash Attention Incompatibility

Flash Attention 2 doesn't return attention weights. Disable it:

model = AutoModelForCausalLM.from_pretrained(
    "model-name",
    attn_implementation="eager",
)

Out of Memory

Large models may exceed GPU memory:

# Use float16
model = AutoModelForCausalLM.from_pretrained(
    "model-name",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Or use CPU (slower)
model = AutoModelForCausalLM.from_pretrained(
    "model-name",
    device_map="cpu",
)

Wrong BOS Token

Some models use different BOS tokens:

# Check tokenizer's BOS
print(tokenizer.bos_token, tokenizer.bos_token_id)

# Override if needed
result = tracer.trace_text(text, bos_token_id=1)

Model-Specific Notes

Qwen2

The recommended, most thoroughly tested model family:

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",
    trust_remote_code=True,
    attn_implementation="eager",
)

GPT-2

Classic model, excellent for testing:

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    attn_implementation="eager",
)

Note: GPT-2 doesn't add a BOS token by default, so traced paths may terminate at position 0 rather than at a dedicated BOS token.
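If a dedicated sink position is needed, one option is to prepend a BOS id manually before tracing. The helper below is a hypothetical sketch (the token ids in the example are toy values, except 50256, which is GPT-2's <|endoftext|> id):

```python
def ensure_bos(input_ids, bos_token_id):
    """Prepend bos_token_id if the sequence doesn't already start with it."""
    if not input_ids or input_ids[0] != bos_token_id:
        return [bos_token_id] + input_ids
    return input_ids

print(ensure_bos([11, 22], 50256))    # [50256, 11, 22]
print(ensure_bos([50256, 11], 50256)) # unchanged
```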

Llama Models

May require gated access:

# After accepting license on HuggingFace
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b",
    token="your-hf-token",
    attn_implementation="eager",
)

Encoder-Decoder Models

Currently not supported. Reverse Attention is designed for decoder-only (causal) models.

Unsupported: T5, BART, mBART
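HuggingFace configs carry an is_encoder_decoder flag, so an early check can fail fast on unsupported architectures. The helper name below is illustrative, not part of the library:

```python
def assert_decoder_only(config):
    """Raise early if the config describes an encoder-decoder (e.g. T5, BART)."""
    if getattr(config, "is_encoder_decoder", False):
        raise ValueError(
            "Reverse Attention supports decoder-only models; "
            "this config describes an encoder-decoder."
        )

# Usage: assert_decoder_only(model.config) right after loading.
```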

Custom Models

For custom models, ensure they:

  1. Inherit from PreTrainedModel
  2. Accept an output_attentions parameter
  3. Return attention in the standard format

from transformers import PreTrainedModel
from transformers.modeling_outputs import CausalLMOutput

class CustomModel(PreTrainedModel):
    def forward(self, input_ids, output_attentions=False, **kwargs):
        # ... your forward pass producing `logits` and `layer_attentions` ...

        return CausalLMOutput(
            logits=logits,
            attentions=tuple(layer_attentions) if output_attentions else None,
        )

Verifying Compatibility

Quick test to verify a model works:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "your-model",
    attn_implementation="eager",
)
tokenizer = AutoTokenizer.from_pretrained("your-model")

inputs = tokenizer("Test input", return_tensors="pt")
outputs = model(**inputs, output_attentions=True)

# Check attention output
if outputs.attentions is None:
    print("Model doesn't output attention!")
else:
    print(f"Layers: {len(outputs.attentions)}")
    print(f"Shape: {outputs.attentions[0].shape}")
    # Should be [batch, heads, seq_len, seq_len]