Research

nanoassembly

nanoassembly is the third piece of the nano interpretability sequence. It tests whether the per-element attribution boundary composes into multi-layer SAE feature circuits, then exposes a calibration API that recommends cheap, gradient, or exact scoring per layer.

  • Calibrates feature-circuit scoring on GPT-2-small and Gemma-2-2B
  • Shows gradient scoring is a near-exact proxy for per-feature ablation in most cells
  • Reports paired-bootstrap intervals across 75 model-task-layer cells
  • Includes calibrate_circuit() for per-layer method choice
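
The paired-bootstrap intervals mentioned above can be illustrated with a minimal, generic sketch. This is not nanoassembly's implementation: the function name `paired_bootstrap_ci`, its parameters, and the percentile method are assumptions chosen for illustration. It takes two score lists measured on the same items (for example, gradient scores and ablation scores for the same features), resamples the per-item differences with replacement, and reports a percentile interval on the mean difference.

```python
import random
from statistics import mean


def paired_bootstrap_ci(scores_a, scores_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean paired difference a - b.

    Hypothetical helper for illustration only; not part of nanoassembly.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty sequences of equal length")

    # Pairing: each difference comes from the same item under both methods.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)

    # Resample the differences with replacement and record each resample's mean.
    boot_means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_boot))

    # Read the interval off the sorted bootstrap means.
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return mean(diffs), boot_means[lo_idx], boot_means[hi_idx]
```

Resampling differences rather than the two score lists independently keeps each item's pairing intact, so item-level variance shared by both methods cancels out of the interval.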
SAE circuits, Attribution, Calibration