Scaffold Arena
Scaffold Arena measures how orchestration code around an LLM changes outcomes. The project treats scaffold design as the experimental object rather than treating the model as the whole system.
- Benchmarking harness for scaffold choices
- Quality, reliability, cost, and latency measurements
- Reproducible task suites
- Evidence for orchestration-level failure modes
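To make the list above concrete, here is a minimal sketch of what comparing two scaffolds on one task suite could look like. It is not Scaffold Arena's actual harness: every name in it (`Result`, `single_shot`, `retry_until_nonempty`, `run_suite`, `summarize`, `make_flaky_model`) is hypothetical, the model is a deterministic stub rather than a real LLM, and token count stands in for cost.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only.
# A model maps a prompt to (answer_text, tokens_used).
Model = Callable[[str], Tuple[str, int]]
# A scaffold wraps the model call with orchestration logic.
Scaffold = Callable[[Model, str], Tuple[str, int]]


@dataclass
class Result:
    correct: bool   # quality: did the answer match the expected output?
    failed: bool    # reliability: did the scaffold raise?
    tokens: int     # cost proxy
    seconds: float  # latency


def single_shot(model: Model, task: str) -> Tuple[str, int]:
    """Baseline scaffold: one model call, no orchestration."""
    return model(task)


def retry_until_nonempty(model: Model, task: str, max_tries: int = 3) -> Tuple[str, int]:
    """Scaffold that retries on empty answers and pays for every attempt."""
    total_tokens = 0
    answer = ""
    for _ in range(max_tries):
        answer, tokens = model(task)
        total_tokens += tokens
        if answer.strip():
            break
    return answer, total_tokens


def run_suite(scaffold: Scaffold, model: Model,
              suite: List[Tuple[str, str]]) -> List[Result]:
    """Run every (task, expected) pair through the scaffold and record metrics."""
    results = []
    for task, expected in suite:
        start = time.perf_counter()
        try:
            answer, tokens = scaffold(model, task)
            elapsed = time.perf_counter() - start
            results.append(Result(answer.strip() == expected, False, tokens, elapsed))
        except Exception:
            elapsed = time.perf_counter() - start
            results.append(Result(False, True, 0, elapsed))
    return results


def summarize(results: List[Result]) -> Dict[str, float]:
    """Aggregate per-task results into the four measurement axes."""
    n = len(results)
    return {
        "quality": sum(r.correct for r in results) / n,
        "reliability": sum(not r.failed for r in results) / n,
        "mean_tokens": sum(r.tokens for r in results) / n,
        "mean_seconds": sum(r.seconds for r in results) / n,
    }


def make_flaky_model() -> Model:
    """Stub model (an assumption, not a real LLM): the first call for each
    prompt returns an empty answer, later calls return the right one."""
    answers = {"2+2": "4", "capital of France": "Paris"}
    seen = set()

    def model(prompt: str) -> Tuple[str, int]:
        if prompt not in seen:
            seen.add(prompt)
            return "", 5  # empty answer still costs tokens
        return answers[prompt], 5

    return model


SUITE = [("2+2", "4"), ("capital of France", "Paris")]

baseline = summarize(run_suite(single_shot, make_flaky_model(), SUITE))
retrying = summarize(run_suite(retry_until_nonempty, make_flaky_model(), SUITE))
```

With this stub, the same model scores 0.0 quality at 5 tokens per task under `single_shot` and 1.0 quality at 10 tokens per task under `retry_until_nonempty`. The difference comes entirely from the scaffold, which is the kind of effect the project sets out to measure.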
Scaffolds, Evaluation, Benchmarks