Jason Lovell · Austin, Texas
I build AI agents.
I also advise execs on whether they should.
Over twenty years across strategy and delivery. The last few spent writing specs for trustworthy agents, building the tools that test them, and helping leaders work out what to deploy.
Currently: AAC/DVAAC release evidence, GhostTrace, nanoassembly
About
On the frontier of consumer tech for over twenty years. On AI since 2017.
I started in mobile before smartphones, when the consumer-tech frontier moved one new category at a time. The years since have been across most of them: mobile, IoT, wearables, tablets, VR and AR, connected home, smart audio. The arc tends to repeat. Loud enthusiasm, then a small number of teams quietly figuring out what is real and shipping it.
Since October 2019 I have been at PwC, working with leadership teams across sectors and most C-suite seats. UK first, then the US. Most engagements end up at the same question: which of this is real, and where does it create value worth shipping? Before PwC I ran my own innovation consultancy. That is where AI caught me.
AlphaGo, OpenAI's Dota 2 work, and the AlphaGo Zero release pulled me in during 2017. I have read the field since. Attention Is All You Need, the GPT-2 staged release, AlphaFold, InstructGPT and RLHF before ChatGPT became a verb, the agent wave, the current scaling-versus-inference-time argument. I am not new here.
A few years ago I stopped just advising and started building. Agentic systems with Claude Code, Codex, and Cursor. Multi-agent orchestration. Replay engines, signed evidence objects, vulnerable benchmark corpora. I run every model worth running, fine-tune what warrants it, and test every coding agent, plugin, and MCP server on real work. Enough prototypes broken at 11pm to know what works in production and what only looks good in a demo.
What I care about now is the gap between what agentic AI can do and what is safe to ship. That gap is where standards, tooling, and judgement have to meet. It is also why most of what I publish is open source.
Where I am focused now
- Standards: Authoring TBOM, SBA, TSA, and AAC. Supply-chain primitives for agentic systems.
- Engineering: Replay engines, signed evidence objects, vulnerable benchmark corpora.
- Advisory: Working with leadership teams on what agentic AI is actually ready for, and what is not.
- Reading: Mech interp, test-time compute, world models, scalable oversight. See the threads section.
Open source
Assurance systems and small research labs.
Four public pillars first: signed release evidence, vulnerable fixtures, behavioural half-life experiments, and a compact interpretability sequence. The rest stays close by, but not at the same volume.
Featured
GhostTrace
Artifact-backed experiments on behavioural half-life under recursive self-distillation. Toy half-life is supported; local LLM runs are negative boundary evidence.
nanoassembly
Multi-layer SAE feature-circuit lab calibrating when gradient, exact ablation, or cheap scores are worth paying for.
Recent research labs
- ProofKern: Verifier-first measurement harness for AI-generated GPU kernels. Local Metal wins survive MLX baselines, then collapse against torch.compile.
- nanoIM: From-scratch interaction-model lab showing that a flattened chat transcript can erase timing, silence, overlap, and background events.
- nanoAWM: Tiny Agent World Model lab for learned consequence simulation in tool agents. Symbolic MiniOS evidence, not production safety validation.
Tooling
ProofPack
Tamper-evident receipt and audit-pack system for agent and tool runs. Ed25519 signatures, Merkle trees, and portable proof bundles.
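As a rough illustration of the Merkle-tree idea only: the sketch below hashes a list of run receipts into a single root, so changing any one receipt changes the root. It is not ProofPack's code or bundle format. The function names, the receipt strings, the prefix bytes, and the rule for odd-sized levels are all assumptions made for the example, and signing the root with Ed25519 is left out.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).digest()


def merkle_root(receipts: list[bytes]) -> str:
    """Return the hex Merkle root over a list of receipt payloads.

    Hypothetical layout: leaves and interior nodes use different prefix
    bytes so a leaf can never be mistaken for an interior node.
    """
    if not receipts:
        raise ValueError("need at least one receipt")
    level = [_h(b"\x00" + r) for r in receipts]
    while len(level) > 1:
        # Hash adjacent pairs into the next level.
        nxt = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # carry the unpaired node up unchanged
        level = nxt
    return level[0].hex()


# Hypothetical receipts for three tool calls in one agent run.
receipts = [b"tool_call:search", b"tool_call:write_file", b"tool_call:commit"]
root = merkle_root(receipts)

# Swapping a single receipt produces a different root.
tampered = merkle_root([receipts[0], b"tool_call:delete_file", receipts[2]])
print(root != tampered)
```

In a scheme like this, only the root needs to be signed for the whole run to be tamper-evident, which is what keeps a proof bundle small and portable.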
jlov7/ProofPack
BranchLab
Local-first replay engine for agent runs. Rewind a trace, branch into counterfactual timelines, and simulate policy changes before deployment.
jlov7/branchlab
Baton Studio
Multi-agent control room with a shared world model, causal dependency graph, energy-aware scheduling, and fair baton arbitration.
jlov7/baton-studio
Meta Memory Studio
Memory control plane for agents. Inspect, evolve, and govern what agents remember across sessions.
jlov7/meta-memory-studio
Agent Director
Trace debugger for AI agents. Scrub through execution, replay specific steps, and diff runs side by side.
jlov7/agent-director
Scaffold Arena
Workbench for measuring how scaffold choices change LLM quality, reliability, cost, latency, and failure profile.
jlov7/scaffold-arena
SkillScope
Observability for Anthropic Skills. See which skill was intended, which files were referenced, where approvals fired, and how tokens shifted.
jlov7/SkillScope
Sentinel MCP
Governance layer for MCP servers with policies, budgets, and verifiable provenance.
jlov7/Sentinel-MCP
AMDM
Toolkit for observing agentic systems, detecting emergent anomalies, and benchmarking detector behaviour against synthetic scenarios.
jlov7/AMDM
Switchboard
Sandbox for routing agent workloads across OpenAI, AWS Bedrock, and Google Vertex with approvals and audit trails.
jlov7/Switchboard
MCP Interop BakeOff
Research harness for stress-testing MCP servers across multiple agent runtimes with reproducible task suites.
jlov7/MCP-Interop-BakeOff
Lab
15 R&D experiments
- ProofKern: Research
- nanoIM: Research
- nanoAWM: Research
- nanocircuits: Research
- nanofeatures: Research
- ABCP: Orchestration
- Agent HQ Guard: Governance
- OverhearOps: Applied AI
- SKILLCHECK: Governance
- RLM-Lens: Observability
- Runwright: Governance
- AgentGate: Governance
- TerraRisk Agent: Applied AI
- VoiceForge AI: Applied AI
- SkillBench-PD: Observability
Threads I track
What I read when I am not building, and where it shows up in the work.
Most agentic-AI work needs help from outside the LLM stack. The list below is the current reading list, ordered roughly by how much it is shaping what I am shipping. Frontier-lab discourse first, applied threads after.
- Scalable oversight
Most agentic releases now exceed what one reviewer can verify by hand. AAC and DVAAC put the release decision into signed evidence and deterministic fixture checks.
AAC and DVAAC
- Mechanistic interpretability
nanocircuits, nanofeatures, and nanoassembly ask the same practical question at three levels: when does a circuit claim beat the strongest cheap baseline?
nanoassembly
- Subliminal learning
GhostTrace measures behavioural half-life under recursive self-distillation. The toy result is supported; the local LLM tier is negative boundary evidence, not a recursive LLM claim.
GhostTrace
- Inference-time compute
Test-time scaling, process reward models, and search-at-inference change what a release decision means. AAC's verifier recomputes outcomes without an LLM in the loop.
AAC verifier
- World models
nanoAWM is the small symbolic lab: learned consequence simulation in MiniOS, not production safety validation. Baton Studio is the multi-agent control-room side of the same interest.
nanoAWM
- Verifier-first measurement
ProofKern keeps the negative result instead of sanding it down: four MLX-relative kernel wins become zero cross-framework wins against torch.compile on the same GPU.
ProofKern
- Faithful chain-of-thought
If CoT stops being monitorable under heavy RL training, a lot of current oversight assumptions fall over. Tracking the faithfulness and process-supervision literature closely.
- Formal methods
SMT solvers (Z3) and deterministic verifiers as a route to release decisions an auditor can check offline. Where neuro-symbolic verification becomes practical for agent systems.
AAC verifier
- Conformal prediction
Conformal calibration and selective classification for LLM outputs and abstention. Active interest, no published artifact yet.
- Knowledge graphs
KG-augmented retrieval and corpus intelligence inside scaffold engineering work.
Scaffold Arena
- Geospatial AI
FEMA NRI, BigQuery Earth Engine, and satellite imagery for compliance-aware geospatial underwriting.
TerraRisk Agent
Standards
Four specs for the MCP and agent era.
Each spec is open source, has a DOI, and ships with reference tooling. The goal is the same in all four: make agent supply chains and release decisions something an auditor can verify.
- Standard · 2026
TBOM: Tool Bill of Materials
A provenance and integrity standard for the MCP ecosystem. Cryptographically signed manifests that bind MCP server releases to immutable tool metadata, supporting automated trust verification and reducing the surface for tool poisoning in AI agent supply chains.
- Standard · 2026
SBA: Skill Bundle Attestation
Deterministic bundle identity, content attestation format, and verification tooling for agent skill bundles. Minimal, reproducible, and supply-chain friendly. Adds trust at the skill layer.
- Standard · 2026
TSA: Tool Security Advisory
Machine-readable security advisories for MCP tools. Defines a JSON format, a feed index, and a trust model so registries, hosts, and gateways can automatically block, warn, or remediate vulnerable tools.
- Standard · 2026
AAC: Agent Assurance Case
Portable, signed, audit-grade evidence object for agentic AI release assurance. Binds inventory, detector coverage, findings, policy decisions, release conditions, compliance evidence, a deterministic verdict, and an Ed25519 signature. Verifiable offline.
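To make "a deterministic verdict" and "verifiable offline" concrete, here is a minimal sketch of the general pattern: hash a canonical encoding of the evidence, then recompute the verdict from the findings and policy with a pure function and compare it to the claimed one. This is not the AAC schema or verifier. Every field name (`findings`, `policy`, `max_allowed_severity`, `waived`, `verdict`, `digest`), the severity scale, and the pass/fail rule are assumptions invented for the example, and the Ed25519 signature check is omitted.

```python
import hashlib
import json

# Hypothetical severity scale, not taken from any spec.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def canonical_digest(obj: dict) -> str:
    """SHA-256 over canonical JSON: sorted keys, no extra whitespace."""
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def recompute_verdict(findings: list[dict], policy: dict) -> str:
    """Pure function of findings and policy: same inputs, same verdict."""
    limit = SEVERITY_RANK[policy["max_allowed_severity"]]
    blocking = [
        f for f in findings
        if not f["waived"] and SEVERITY_RANK[f["severity"]] > limit
    ]
    return "fail" if blocking else "pass"


def verify(evidence: dict) -> bool:
    """Check the claimed digest and verdict against recomputed values."""
    body = evidence["body"]
    digest_ok = canonical_digest(body) == evidence["digest"]
    verdict_ok = recompute_verdict(body["findings"], body["policy"]) == body["verdict"]
    return digest_ok and verdict_ok


# Hypothetical evidence object with one blocking finding.
body = {
    "findings": [
        {"id": "F-1", "severity": "high", "waived": False},
        {"id": "F-2", "severity": "low", "waived": False},
    ],
    "policy": {"max_allowed_severity": "medium"},
    "verdict": "fail",
}
evidence = {"body": body, "digest": canonical_digest(body)}
print(verify(evidence))

# Flipping the verdict is caught even if the digest is recomputed,
# because the verdict no longer follows from the findings and policy.
flipped = {**body, "verdict": "pass"}
resealed = {"body": flipped, "digest": canonical_digest(flipped)}
print(verify(resealed))
```

Nothing in the check calls a model or a network service, which is the property that lets an auditor rerun it offline and get the same answer.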
Writing and talks
Specs, preprints, and the occasional talk.
Most of what I publish is open source. The releases below are reverse-chronological. Click any line for the artifact.
- 2026 · 06 · Release · GhostTrace: The Behavioral Half-Life · Repository
- 2026 · 06 · DOI · nanoassembly · Zenodo
- 2026 · 06 · DOI · nanofeatures · Zenodo
- 2026 · 06 · DOI · nanocircuits · Zenodo
- 2026 · 06 · Release · ProofKern · Repository
- 2026 · 05 · Release · Agent Assurance Case v0.2-candidate.5 · Spec and verifier
- 2026 · 05 · Release · Damn Vulnerable Agent Asset Corpus · Benchmark
- 2026 · 05 · DOI · nanoIM · Zenodo
- 2026 · 05 · DOI · nanoAWM · Zenodo
- 2026 · 04 · Preprint · Tool Security Advisory (TSA) · TechRxiv
- 2026 · 02 · DOI · TSA Specification (DOI) · Zenodo
- 2026 · 02 · DOI · Skill Bundle Attestation (SBA) Specification · Zenodo
- 2025 · 12 · DOI · Tool Bill of Materials (TBOM) Specification · Zenodo
Ongoing notes
Most weeks I post shorter takes on LinkedIn and X. Anything worth keeping turns into a spec, a repo, or an entry above.
ORCID: 0009-0001-6300-9155