Research

nanofeatures

nanofeatures applies the nanocircuits discipline to sparse autoencoder (SAE) features on real models. The result is a boundary: cheap scores can match attribution on single-position tasks, while distributed circuits need a causal, position-resolved readout.

  • Gemma-2-2B and GPT-2 SAE-feature studies
  • Cheap baseline ladder reported directly
  • Distributed-circuit boundary with bootstrap intervals
  • Gradient-free causal position-resolved control included
SAE features, Gemma, Attribution