
nanocircuits

nanocircuits is a small mechanistic-interpretability lab built around known-answer circuits. The question it asks is not whether a circuit diagram looks convincing, but whether a circuit finder beats the strongest cheap structural baseline, and whether the oracle used for scoring is itself faithful.

  • Known-answer circuit recovery with AUROC against ground truth
  • Strong structural baselines included
  • Oracle faithfulness measured rather than assumed
  • Adversarial review issues reflected in the public writeup
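
A minimal sketch of the comparison described above, assuming each candidate edge gets a score from the circuit finder and from a cheap structural baseline, and that the known-answer setup supplies a 0/1 label for membership in the true circuit. The function name, the toy scores, and the choice of baseline are all hypothetical illustrations, not the project's actual code or results.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a true-circuit edge outscores a
    non-circuit edge, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one circuit edge and one non-circuit edge")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six candidate edges, 1 = edge is in the known circuit.
in_circuit = [1, 1, 0, 0, 1, 0]
# Hypothetical scores from a circuit finder.
finder_scores = [0.9, 0.7, 0.2, 0.4, 0.6, 0.1]
# Hypothetical scores from a cheap structural baseline (assumed here to be
# something like a weight-magnitude heuristic).
baseline_scores = [0.8, 0.3, 0.5, 0.2, 0.6, 0.1]

finder_auc = auroc(finder_scores, in_circuit)
baseline_auc = auroc(baseline_scores, in_circuit)
# The number of interest is the margin over the baseline, not the raw AUROC.
margin = finder_auc - baseline_auc

print(f"finder AUROC:   {finder_auc:.3f}")
print(f"baseline AUROC: {baseline_auc:.3f}")
print(f"margin:         {margin:+.3f}")
```

Reporting the margin rather than the finder's AUROC alone reflects the point of the list above: a high score means little if a cheap structural heuristic reaches nearly the same value.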
Mech interp · Ground truth · Baselines