
nanoassembly

nanoassembly is the third piece of the nano interpretability sequence. It tests whether the per-element attribution boundary composes into multi-layer SAE feature circuits, then exposes a calibration API that recommends cheap, gradient, or exact scoring per layer.

Calibrates feature-circuit scoring on GPT-2-small and Gemma-2-2B
Shows gradient scoring is a near-exact proxy for per-feature ablation in most cells
Reports paired-bootstrap intervals across 75 model-task-layer cells
Includes calibrate_circuit() for per-layer method choice
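
The paired-bootstrap intervals mentioned above can be illustrated with a minimal sketch. This is not the project's implementation: the function name `paired_bootstrap_ci`, its parameters, and the sample scores are all hypothetical, and it assumes each cell yields two score lists measured on the same prompts (for example gradient scoring versus exact ablation). The key idea is that pairs are resampled jointly, so per-prompt variation cancels in the difference.

```python
import random


def paired_bootstrap_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile interval for mean(a - b), resampling pairs jointly.

    Hypothetical helper for illustration; not part of nanoassembly.
    Returns (point_estimate, lower, upper).
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and the same length")
    rng = random.Random(seed)
    # Work on per-pair differences so each resample keeps pairs intact.
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(diffs) / n, lower, upper


# Made-up per-prompt scores for one model-task-layer cell.
gradient_scores = [0.91, 0.88, 0.95, 0.90, 0.87, 0.93, 0.89, 0.92]
ablation_scores = [0.90, 0.89, 0.94, 0.91, 0.86, 0.92, 0.90, 0.91]

point, lo, hi = paired_bootstrap_ci(gradient_scores, ablation_scores)
print(f"mean difference {point:.4f}, 95% interval [{lo:.4f}, {hi:.4f}]")
```

An interval that is narrow and straddles zero would be consistent with the two methods agreeing in that cell. How nanoassembly turns such intervals into a per-layer recommendation is not described here.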

SAE circuits, Attribution, Calibration

View on GitHub DOI

nanocircuits nanofeatures Trilogy writeup