Model Compatibility¶
Reverse Attention works with any HuggingFace model that supports output_attentions=True. This guide covers tested models and known considerations.
Important: Attention Implementation
Modern transformers (4.36+) use SDPA by default, which does not support outputting attention weights. You must explicitly set attn_implementation="eager" when loading models:
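For example (using `gpt2` as a stand-in for your model):

```python
from transformers import AutoModelForCausalLM

# "eager" is the plain PyTorch attention path, which can return
# the per-layer attention weight tensors.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",  # substitute your model ID
    attn_implementation="eager",
)
```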
Requirements¶
A compatible model must:
- Be a HuggingFace `PreTrainedModel`
- Support `output_attentions=True` in the forward pass
- Return attention tensors in the expected format
Tested Models¶
Fully Tested¶
| Model Family | Example | Notes |
|---|---|---|
| Qwen2 | `Qwen/Qwen2-0.5B` | Recommended, well-tested |
| Llama 2 | `meta-llama/Llama-2-7b` | Works well |
| GPT-2 | `gpt2`, `gpt2-medium` | Works well |
| Mistral | `mistralai/Mistral-7B` | Works well |
Should Work (Standard Architecture)¶
| Model Family | Example |
|---|---|
| Llama 3 | meta-llama/Llama-3-8b |
| Phi | microsoft/phi-2 |
| Gemma | google/gemma-7b |
| OPT | facebook/opt-1.3b |
| BLOOM | bigscience/bloom-560m |
| Falcon | tiiuae/falcon-7b |
Using Different Models¶
Basic Usage¶
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from reverse_attention import ReverseAttentionTracer

# Any HuggingFace model
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    attn_implementation="eager",
)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

tracer = ReverseAttentionTracer(model, tokenizer)
result = tracer.trace_text("Hello world")
```
Models with trust_remote_code¶
Some models require trusting remote code:
```python
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",
    trust_remote_code=True,
    attn_implementation="eager",
)
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen2-0.5B",
    trust_remote_code=True,
)
```
Large Models¶
For larger models, use memory-efficient loading:
```python
import torch

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b",
    torch_dtype=torch.float16,   # Half precision
    device_map="auto",           # Automatic device placement
    attn_implementation="eager",
)
```
Attention Output Format¶
Expected Format¶
The model's attention output should be a tuple of tensors, one per layer, each with shape `[batch, num_heads, seq_len, seq_len]`.
Most HuggingFace models follow this convention.
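For intuition, here is a minimal sketch of that shape convention built from dummy tensors (no real model involved; the layer, head, and sequence sizes are arbitrary):

```python
import torch

# Illustrative only: a fake "attentions" output for a 2-layer model
# with 4 heads and a 5-token sequence.
batch, heads, seq_len, num_layers = 1, 4, 5, 2
attentions = tuple(
    torch.softmax(torch.randn(batch, heads, seq_len, seq_len), dim=-1)
    for _ in range(num_layers)
)

assert len(attentions) == num_layers  # one tensor per layer
assert attentions[0].shape == (batch, heads, seq_len, seq_len)
```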
Head Aggregation¶
Since different models have different numbers of heads, we aggregate:
```python
# Mean across heads (default)
tracer.trace_text(text, agg_heads="mean")

# Max across heads
tracer.trace_text(text, agg_heads="max")
```
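Conceptually, aggregation collapses the head dimension (dim 1) of each attention tensor. A sketch with a dummy tensor (the tracer's internals may differ):

```python
import torch

# [batch, heads, seq_len, seq_len] attention matrix with 8 heads
attn = torch.softmax(torch.randn(1, 8, 6, 6), dim=-1)

mean_attn = attn.mean(dim=1)        # -> [batch, seq_len, seq_len]
max_attn = attn.max(dim=1).values   # -> [batch, seq_len, seq_len]
```

Mean gives a smoother consensus across heads; max highlights any single head that attends strongly.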
Troubleshooting¶
Model Doesn't Output Attention¶
Error: `Model does not output attention weights`

Solutions:

1. Check whether the model supports attention output.
2. Some models need the output enabled explicitly in their configuration.
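A sketch of both checks, using `gpt2` as a stand-in for your model:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# 1. Inspect the config: does the architecture expose attention output?
config = AutoConfig.from_pretrained("gpt2")
print("output_attentions default:", config.output_attentions)

# 2. Force attention output via the config when the forward-pass
#    kwarg alone is not enough:
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    attn_implementation="eager",
)
model.config.output_attentions = True
```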
Flash Attention Incompatibility¶
Flash Attention 2 doesn't return attention weights. Disable it by loading the model with `attn_implementation="eager"` instead.
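For example (using `gpt2` as a placeholder model ID):

```python
from transformers import AutoModelForCausalLM

# Request the eager implementation instead of "flash_attention_2";
# only the eager path materializes the attention weight matrices.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",  # substitute your model ID
    attn_implementation="eager",
)
```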
Out of Memory¶
Large models may exceed GPU memory:
```python
# Use float16
model = AutoModelForCausalLM.from_pretrained(
    "model-name",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Or use CPU (slower)
model = AutoModelForCausalLM.from_pretrained(
    "model-name",
    device_map="cpu",
)
```
Wrong BOS Token¶
Some models use different BOS tokens:
```python
# Check tokenizer's BOS
print(tokenizer.bos_token, tokenizer.bos_token_id)

# Override if needed
result = tracer.trace_text(text, bos_token_id=1)
```
Model-Specific Notes¶
Qwen2¶
Recommended model family. Works well:
```python
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",
    trust_remote_code=True,
    attn_implementation="eager",
)
```
GPT-2¶
A classic, lightweight model family that is excellent for quick testing.
Note: GPT-2 doesn't add a BOS token by default. Paths may run to position 0.
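The note above can be checked directly (assumes network access to download the `gpt2` tokenizer; `"Hello world"` is an arbitrary example input):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
ids = tokenizer("Hello world").input_ids

# GPT-2 does not prepend its BOS token (<|endoftext|>), so
# position 0 holds a real content token, not a BOS anchor.
starts_with_bos = ids[0] == tokenizer.bos_token_id
```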
Llama Models¶
May require gated access:
```python
# After accepting license on HuggingFace
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b",
    token="your-hf-token",
    attn_implementation="eager",
)
```
Encoder-Decoder Models¶
Currently not supported. Reverse Attention is designed for decoder-only (causal) models.
Unsupported: T5, BART, mBART
Custom Models¶
For custom models, ensure they:
- Inherit from `PreTrainedModel`
- Accept an `output_attentions` parameter
- Return attention in the standard format
```python
from transformers import PreTrainedModel
from transformers.utils import ModelOutput

class CustomModel(PreTrainedModel):
    def forward(self, input_ids, output_attentions=False, **kwargs):
        # ... your forward pass ...
        if output_attentions:
            return ModelOutput(
                logits=logits,
                attentions=tuple(layer_attentions),
            )
        return ModelOutput(logits=logits)
```
Verifying Compatibility¶
Quick test to verify a model works:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "your-model",
    attn_implementation="eager",
)
tokenizer = AutoTokenizer.from_pretrained("your-model")

inputs = tokenizer("Test input", return_tensors="pt")
outputs = model(**inputs, output_attentions=True)

# Check attention output
if outputs.attentions is None:
    print("Model doesn't output attention!")
else:
    print(f"Layers: {len(outputs.attentions)}")
    print(f"Shape: {outputs.attentions[0].shape}")
    # Should be [batch, heads, seq_len, seq_len]
```