# Installation

## Requirements

- Python 3.9 or higher
- PyTorch 2.0 or higher
- A HuggingFace transformer model
## Install from PyPI
The simplest way to install:
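The package's PyPI name is not given in this excerpt, so the command below uses a placeholder:

```bash
# Hypothetical: substitute the project's actual PyPI name
pip install <package-name>
```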
## Install from Source
For development or the latest features:
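Cloning and installing in editable mode is the standard pattern; the repository URL below is a placeholder, since the real one is not shown in this excerpt:

```bash
# Hypothetical URL — substitute the project's actual repository
git clone https://github.com/<org>/<repo>.git
cd <repo>
pip install -e .
```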
## Optional Dependencies

### Development
For running tests and contributing:
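Assuming the project defines a `dev` extra in its packaging metadata (a common convention, not confirmed by this excerpt):

```bash
# Hypothetical "dev" extra pulling in the test dependencies
pip install -e ".[dev]"
```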
This installs pytest and pytest-cov.
### Documentation
For building the documentation locally:
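Assuming a `docs` extra is defined (again a convention, not confirmed here):

```bash
# Hypothetical "docs" extra covering the documentation toolchain
pip install -e ".[docs]"
mkdocs serve  # preview the docs locally
```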
This installs MkDocs, the Material theme, and mkdocstrings.
## Verify Installation
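The package's import name is not shown in this excerpt, but a minimal sanity check of the core dependencies it requires looks like this:

```python
import torch
import transformers

# Both requirements should import cleanly and report their versions
print(f"PyTorch: {torch.__version__}")
print(f"Transformers: {transformers.__version__}")
```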
## GPU Support
The package automatically uses CUDA if available. To verify GPU support:
```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"MPS available: {torch.backends.mps.is_available()}")
```
The tracer will automatically select the best available device:
- CUDA (NVIDIA GPU)
- MPS (Apple Silicon)
- CPU (fallback)
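The fallback order above can be sketched as a small helper (`pick_device` is an illustrative name, not part of the package's API):

```python
import torch

def pick_device() -> torch.device:
    # Mirrors the fallback order: CUDA, then Apple-Silicon MPS, then CPU
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

print(f"Selected device: {pick_device()}")
```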
You can also specify a device explicitly:
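The tracer's exact device parameter is not shown in this excerpt; at the PyTorch level, an explicit device is simply a string or `torch.device`:

```python
import torch

# Force CPU regardless of available accelerators;
# "cuda", "cuda:0", or "mps" work the same way on suitable hardware.
device = torch.device("cpu")
print(device)
```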
## Troubleshooting

### SDPA Attention Error
If you see `'sdpa' attention does not support 'output_attentions=True'`:

```python
from transformers import AutoModelForCausalLM

# Fix: use the eager attention implementation, which can return attention weights
model = AutoModelForCausalLM.from_pretrained(
    "model-name",
    attn_implementation="eager",
)
```

Modern transformers releases (4.36+) default to SDPA, which does not support returning attention weights. Always load models with `attn_implementation="eager"`.
### Out of Memory

If you encounter OOM errors:

- Use a smaller model (e.g., `Qwen/Qwen2-0.5B` instead of larger variants)
- Reduce the `top_beam` and `top_k` parameters
- Use shorter input sequences
- Enable half-precision: load the model with `torch_dtype=torch.float16`
### Model Not Outputting Attention

Ensure your model supports `output_attentions=True`. Most HuggingFace models do, but some custom models may not. See Model Compatibility for details.