
Model Compatibility

Reverse Attention works with any HuggingFace model that supports output_attentions=True. This guide covers tested models and known considerations.

Important: Attention Implementation

Recent transformers releases (4.36+) default to the SDPA attention implementation, which cannot return attention weights. You must explicitly set attn_implementation="eager" when loading models:

model = AutoModelForCausalLM.from_pretrained(
    "model-name",
    attn_implementation="eager",
)

Requirements

A compatible model must:

  1. Be a HuggingFace PreTrainedModel
  2. Support output_attentions=True in forward pass
  3. Return attention tensors in the expected format

Tested Models

Fully Tested

Model Family | Example                | Notes
Qwen2        | Qwen/Qwen2-0.5B        | Recommended, well-tested
Llama 2      | meta-llama/Llama-2-7b  | Works well
GPT-2        | gpt2, gpt2-medium      | Works well
Mistral      | mistralai/Mistral-7B   | Works well

Should Work (Standard Architecture)

Model Family | Example
Llama 3      | meta-llama/Llama-3-8b
Phi          | microsoft/phi-2
Gemma        | google/gemma-7b
OPT          | facebook/opt-1.3b
BLOOM        | bigscience/bloom-560m
Falcon       | tiiuae/falcon-7b

Using Different Models

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from reverse_attention import ReverseAttentionTracer

# Any HuggingFace model
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    attn_implementation="eager",
)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

tracer = ReverseAttentionTracer(model, tokenizer)
result = tracer.trace_text("Hello world")

Models with trust_remote_code

Some models require trusting remote code:

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",
    trust_remote_code=True,
    attn_implementation="eager",
)
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen2-0.5B",
    trust_remote_code=True,
)

Large Models

For larger models, use memory-efficient loading:

import torch

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b",
    torch_dtype=torch.float16,  # Half precision
    device_map="auto",          # Automatic device placement
    attn_implementation="eager",
)

Attention Output Format

Expected Format

The model's attention output should be a tuple of tensors, one per layer:

attentions = (
    layer_0_attn,  # [batch, heads, seq_len, seq_len]
    layer_1_attn,
    ...
    layer_n_attn,
)

Most HuggingFace models follow this convention.
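As a sanity check, this layout can be validated on a toy attentions tuple. The helper below is an illustrative sketch, not part of the library API; it checks the shape convention and that each row of a softmaxed attention matrix sums to 1.

```python
import torch

def check_attention_format(attentions, seq_len):
    """Validate the (one tensor per layer, [batch, heads, seq, seq]) convention."""
    for i, layer_attn in enumerate(attentions):
        batch, heads, q_len, k_len = layer_attn.shape
        assert q_len == seq_len and k_len == seq_len, f"layer {i}: unexpected shape"
        # Rows of a softmaxed attention matrix should sum to ~1.
        row_sums = layer_attn.sum(dim=-1)
        assert torch.allclose(row_sums, torch.ones_like(row_sums), atol=1e-5)

# Toy stand-in for a 2-layer model's output.
toy = tuple(torch.softmax(torch.rand(1, 4, 5, 5), dim=-1) for _ in range(2))
check_attention_format(toy, seq_len=5)
```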

Head Aggregation

Head counts differ across models, so attention is aggregated over the head dimension:

# Mean across heads (default)
tracer.trace_text(text, agg_heads="mean")

# Max across heads
tracer.trace_text(text, agg_heads="max")
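On a single layer's tensor, the two modes reduce the head dimension as sketched below (toy shapes for illustration):

```python
import torch

# One layer's attention: [batch, heads, seq_len, seq_len]
attn = torch.softmax(torch.rand(1, 12, 5, 5), dim=-1)

mean_agg = attn.mean(dim=1)       # [batch, seq_len, seq_len]
max_agg = attn.max(dim=1).values  # [batch, seq_len, seq_len]

print(mean_agg.shape)  # torch.Size([1, 5, 5])
```

Note that the mean preserves the row-stochastic property (rows still sum to 1), while the max does not.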

Troubleshooting

Model Doesn't Output Attention

Error: Model does not output attention weights

Solutions:

  1. Check if model supports attention output:

    outputs = model(input_ids, output_attentions=True)
    print(outputs.attentions is not None)  # Should be True
    

  2. Some models need explicit configuration:

    model = AutoModelForCausalLM.from_pretrained(
        "model-name",
        attn_implementation="eager",  # Not "flash_attention_2"
    )
    

Flash Attention Incompatibility

Flash Attention 2 doesn't return attention weights. Disable it:

model = AutoModelForCausalLM.from_pretrained(
    "model-name",
    attn_implementation="eager",
)

Out of Memory

Large models may exceed GPU memory:

# Use float16
model = AutoModelForCausalLM.from_pretrained(
    "model-name",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Or use CPU (slower)
model = AutoModelForCausalLM.from_pretrained(
    "model-name",
    device_map="cpu",
)

Wrong BOS Token

Some models use different BOS tokens:

# Check tokenizer's BOS
print(tokenizer.bos_token, tokenizer.bos_token_id)

# Override if needed
result = tracer.trace_text(text, bos_token_id=1)

Model-Specific Notes

Qwen2

The recommended, most thoroughly tested model family:

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",
    trust_remote_code=True,
    attn_implementation="eager",
)

GPT-2

Classic model, excellent for testing:

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    attn_implementation="eager",
)

Note: GPT-2 doesn't add a BOS token by default, so traced paths may terminate at position 0 rather than at a dedicated BOS token.
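If a dedicated sink position is needed, one option is to prepend a BOS id manually before tracing. The helper below is a hypothetical sketch (the token ids in the example are toy values, except 50256, which is GPT-2's <|endoftext|> id):

```python
def ensure_bos(input_ids, bos_token_id):
    """Prepend bos_token_id if the sequence doesn't already start with it."""
    if not input_ids or input_ids[0] != bos_token_id:
        return [bos_token_id] + input_ids
    return input_ids

print(ensure_bos([11, 22], 50256))    # [50256, 11, 22]
print(ensure_bos([50256, 11], 50256)) # unchanged
```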

Llama Models

May require gated access:

# After accepting license on HuggingFace
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b",
    token="your-hf-token",
    attn_implementation="eager",
)

Encoder-Decoder Models

Currently not supported. Reverse Attention is designed for decoder-only (causal) models.

Unsupported: T5, BART, mBART
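HuggingFace configs carry an is_encoder_decoder flag, so an early check can fail fast on unsupported architectures. The helper name below is illustrative, not part of the library:

```python
def assert_decoder_only(config):
    """Raise early if the config describes an encoder-decoder (e.g. T5, BART)."""
    if getattr(config, "is_encoder_decoder", False):
        raise ValueError(
            "Reverse Attention supports decoder-only models; "
            "this config describes an encoder-decoder."
        )

# Usage: assert_decoder_only(model.config) right after loading.
```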

Custom Models

For custom models, ensure they:

  1. Inherit from PreTrainedModel
  2. Accept an output_attentions parameter
  3. Return attention in the standard format

from transformers import PreTrainedModel
from transformers.modeling_outputs import CausalLMOutput

class CustomModel(PreTrainedModel):
    def forward(self, input_ids, output_attentions=False, **kwargs):
        # ... your forward pass producing `logits` and `layer_attentions` ...

        return CausalLMOutput(
            logits=logits,
            attentions=tuple(layer_attentions) if output_attentions else None,
        )

Verifying Compatibility

Quick test to verify a model works:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "your-model",
    attn_implementation="eager",
)
tokenizer = AutoTokenizer.from_pretrained("your-model")

inputs = tokenizer("Test input", return_tensors="pt")
outputs = model(**inputs, output_attentions=True)

# Check attention output
if outputs.attentions is None:
    print("Model doesn't output attention!")
else:
    print(f"Layers: {len(outputs.attentions)}")
    print(f"Shape: {outputs.attentions[0].shape}")
    # Should be [batch, heads, seq_len, seq_len]