Jason Lovell · Austin, Texas

I build AI agents. I also advise execs on whether they should.

Over twenty years across strategy and delivery. The last few spent writing specs for trustworthy agents, building the tools that test them, and helping leaders work out what to deploy.


Currently: AAC v0.2-candidate.5 · DVAAC v0.1 · TSA preprint on TechRxiv

About

On the frontier of consumer tech for over twenty years. On AI since 2017.

I started in mobile before smartphones, when the consumer-tech frontier moved one new category at a time. The years since have been across most of them: mobile, IoT, wearables, tablets, VR and AR, connected home, smart audio. The arc tends to repeat. Loud enthusiasm, then a small number of teams quietly figuring out what is real and shipping it.

Since October 2019 I have been at PwC, working with leadership teams across sectors and most C-suite seats. UK first, then the US. Most engagements end up at the same question: which of this is real, and where does it create value worth shipping? Before PwC I ran my own innovation consultancy. That is where AI caught me.

AlphaGo, OpenAI's DOTA work, and the AlphaGo Zero release pulled me in during 2017. I have read the field since. Attention Is All You Need, the GPT-2 staged release, AlphaFold, InstructGPT and RLHF before ChatGPT became a verb, the agent wave, the current scaling-versus-inference-time argument. I am not new here.

A few years ago I stopped just advising and started building. Agentic systems with Claude Code, Codex, and Cursor. Multi-agent orchestration. Replay engines, signed evidence objects, vulnerable benchmark corpora. I run every model worth running, fine-tune what warrants it, and test every coding agent, plugin, and MCP server on real work. Enough prototypes broken at 11pm to know what works in production and what only looks good in a demo.

What I care about now is the gap between what agentic AI can do and what is safe to ship. That gap is where standards, tooling, and judgement have to meet. It is also why most of what I publish is open source.

Where I am focused now

  • Standards: Authoring TBOM, SBA, TSA, and AAC. Supply-chain primitives for agentic systems.
  • Engineering: Replay engines, signed evidence objects, vulnerable benchmark corpora.
  • Advisory: Working with leadership teams on what agentic AI is actually ready for, and what is not.
  • Reading: Mech interp, test-time compute, world models, scalable oversight. See the threads section.

Open source

Recent work, mostly on agent assurance.

A spec, its paired benchmark, a cryptographic audit toolkit, and the replay engine that ties them together. Below that, the rest of the active tooling.

Featured

Governance · New · Python

Agent Assurance Case

Draft specification and reference verifier for a portable, signed evidence object that answers one release-critical question: can this agentic workflow ship, and can an auditor verify the evidence offline?

Spec · Ed25519 · Offline Verification
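The offline-verification idea is easy to sketch in stdlib Python. The shape below is illustrative, not the AAC schema: the field names (`findings`, `release_conditions`, `verdict`) and the verdict rule are assumptions, and the real spec binds the canonical bytes with an Ed25519 signature rather than a bare digest.

```python
import hashlib
import json

def canonical_bytes(obj: dict) -> bytes:
    # Deterministic serialization: sorted keys, fixed separators.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

def recompute_verdict(body: dict) -> str:
    # Illustrative rule: ship only if no blocking findings remain
    # and every release condition is marked satisfied.
    blocking = [f for f in body["findings"] if f["severity"] == "blocking"]
    unmet = [c for c in body["release_conditions"] if not c["satisfied"]]
    return "ship" if not blocking and not unmet else "hold"

def verify_offline(evidence: dict, claimed_digest: str) -> bool:
    # An auditor checks two things with no network access:
    # 1) the bytes are the bytes (digest matches), and
    # 2) the verdict follows deterministically from the evidence.
    body = {k: v for k, v in evidence.items() if k not in ("digest", "verdict")}
    digest_ok = hashlib.sha256(canonical_bytes(body)).hexdigest() == claimed_digest
    verdict_ok = recompute_verdict(body) == evidence["verdict"]
    return digest_ok and verdict_ok

evidence = {
    "findings": [{"id": "F1", "severity": "info"}],
    "release_conditions": [{"id": "C1", "satisfied": True}],
    "verdict": "ship",
}
body = {k: v for k, v in evidence.items() if k != "verdict"}
evidence["digest"] = hashlib.sha256(canonical_bytes(body)).hexdigest()
print(verify_offline(evidence, evidence["digest"]))  # True
```

Because the verdict is recomputed rather than trusted, an LLM never needs to sit in the verification loop.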
Observability · New · Python

DVAAC

Damn Vulnerable Agent Asset Corpus. A compact, runnable benchmark of intentionally vulnerable and intentionally clean agent release fixtures, paired with the AAC verifier so detectors can be measured against ground truth.

Benchmark · Assurance · Fixtures
Governance · New · Python

ProofPack

Cryptographic verification for AI agent runs. Ed25519 signatures, Merkle audit trees, and tamper-evident bundles that record exactly what an agent did.

Ed25519 · Merkle Trees · Cryptographic Audit
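The Merkle-audit idea behind this kind of tooling fits in a few lines; this is a minimal stdlib sketch of the technique, not ProofPack's actual API, and the event encoding is made up for the example.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    # Hash each recorded event, then pair-and-hash upward; duplicate
    # the last node when a level has odd length (a common convention).
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

run = [b"tool_call:search", b"tool_result:3 hits", b"final_answer:done"]
root = merkle_root(run)

# Editing any recorded step changes the root, so signing the root
# makes the entire run tamper-evident.
tampered = [b"tool_call:search", b"tool_result:0 hits", b"final_answer:done"]
print(merkle_root(tampered) != root)  # True
```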
Observability · New · TypeScript

BranchLab

Local-first replay engine for AI agents. Rewind any run, branch into counterfactual timelines, simulate policy changes, and produce evidence-grade audit reports without sending data anywhere.

Counterfactual Branching · Agent Replay · Policy Simulation

Tooling

Orchestration · Python

Baton Studio

Multi-agent control room. Shared world model, causal dependency graph, energy-aware scheduling, and fair baton arbitration for coordinated agent teams.

jlov7/baton-studio
Orchestration · TypeScript

Meta Memory Studio

Memory control plane for AI agents. Inspect, edit, replay, and govern agent memory across sessions with full observability and policy enforcement.

jlov7/meta-memory-studio
Orchestration · TypeScript

Agent Director

Cinematic trace debugger for AI agents. Playback, replay, diff, and debug agent runs like editing video.

jlov7/agent-director
Observability · Python

SkillScope

Observability for Anthropic Skills. See which skill was intended, which files were referenced, where policy approvals fired, and how tokens and latency shifted.

jlov7/SkillScope
Governance · TypeScript

Sentinel MCP

Governing Model-Context Protocol servers with policies, budgets, and verifiable provenance.

jlov7/Sentinel-MCP
Observability · Python

AMDM

End-to-end toolkit for observing agentic systems. Detects emergent anomalies in near-real time and benchmarks detector performance.

jlov7/AMDM
Orchestration · Python

Switchboard

Multi-provider agent orchestration sandbox. OpenAI, AWS Bedrock, Google Vertex with policy-based approvals and cryptographic audit trails.

jlov7/Switchboard
Orchestration · TypeScript

Scaffold Arena

Enterprise-grade scaffold engineering workbench. Shows how orchestration code around LLMs dominates outcomes in quality, reliability, and cost.

jlov7/scaffold-arena
Observability · Python

MCP Interop BakeOff

Research-grade harness for stress-testing MCP servers across multiple agent runtimes with reproducible task suites.

jlov7/MCP-Interop-BakeOff
Lab · 10 R&D experiments

Threads I track

What I read when I am not building, and where it shows up in the work.

Most agentic-AI work needs help from outside the LLM stack. The list below is the current reading list, ordered roughly by how much it is shaping what I am shipping. Frontier-lab discourse first, applied threads after.

  • Scalable oversight

    Most agentic releases now exceed what one reviewer can verify by hand. AAC and DVAAC are bets on the recursive-verification path with deterministic ground truth, not LLM-judge loops.

    AAC and DVAAC
  • Mechanistic interpretability

    Following the sparse-autoencoder and crosscoder line of work. The question I care about is when interpretability becomes audit grade, which is where it joins the AAC story.

    AAC
  • Inference-time compute

    Test-time scaling, process reward models, and search-at-inference change what a release decision means. AAC's verifier is built to recompute outcomes deterministically without an LLM in the loop.

    AAC verifier
  • World models

    Baton Studio sits in this neighbourhood: a shared world model and a causal dependency graph for coordinated agents. Watching V-JEPA and Genie, and curious where predictive coding meets multi-agent orchestration.

    Baton Studio
  • Recursive self-improvement

    Where self-play and agentic coding loops meet. The interesting question is whether the verification side can keep up, which is the same question DVAAC asks for assurance fixtures.

    DVAAC
  • Faithful chain-of-thought

    If CoT stops being monitorable under heavy RL training, a lot of current oversight assumptions fall over. Tracking the faithfulness and process-supervision literature closely.

  • Formal methods

    SMT solvers (Z3) and deterministic verifiers as a route to release decisions an auditor can check offline. Where neuro-symbolic verification becomes practical for agent systems.

    AAC verifier
  • Conformal prediction

    Conformal calibration and selective classification for LLM outputs and abstention. Active interest, no published artifact yet.

  • Knowledge graphs

    KG-augmented retrieval and corpus intelligence inside scaffold engineering work.

    Scaffold Arena
  • Geospatial AI

    FEMA NRI, BigQuery Earth Engine, and satellite imagery for compliance-aware geospatial underwriting.

    TerraRisk Agent
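The conformal-prediction thread above is the most self-contained of these, so here is a split-conformal sketch of answer-or-abstain for model outputs. Everything is illustrative: the calibration scores are made-up confidences, and real use would need exchangeable calibration data for the coverage guarantee to hold.

```python
import math

def conformal_threshold(cal_scores: list[float], alpha: float) -> float:
    # Split conformal: take the ceil((n+1)(1-alpha))/n quantile of the
    # calibration nonconformity scores.
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

# Nonconformity here = 1 - model confidence in the true label (illustrative).
calibration = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.40, 0.55, 0.70, 0.90]
q = conformal_threshold(calibration, alpha=0.2)

def answer_or_abstain(confidence: float) -> str:
    # Abstain whenever the would-be error score exceeds the threshold;
    # on exchangeable data this bounds the error rate near alpha.
    return "answer" if (1.0 - confidence) <= q else "abstain"

print(answer_or_abstain(0.9))  # answer
print(answer_or_abstain(0.2))  # abstain
```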

Standards

Four specs for the MCP and agent era.

Each spec is open source, has a DOI, and ships with reference tooling. The goal is the same in all four: make agent supply chains and release decisions something an auditor can verify.

  • Standard · 2026

    TBOM: Tool Bill of Materials

    A provenance and integrity standard for the MCP ecosystem. Cryptographically signed manifests that bind MCP server releases to immutable tool metadata, supporting automated trust verification and reducing the surface for tool poisoning in AI agent supply chains.

  • Standard · 2026

    SBA: Skill Bundle Attestation

    Deterministic bundle identity, content attestation format, and verification tooling for agent skill bundles. Minimal, reproducible, and supply-chain friendly. Adds trust at the skill layer.

  • Standard · 2026

    TSA: Tool Security Advisory

    Machine-readable security advisories for MCP tools. Defines a JSON format, a feed index, and a trust model so registries, hosts, and gateways can automatically block, warn, or remediate vulnerable tools.

  • Standard · 2026

    AAC: Agent Assurance Case

    Portable, signed, audit-grade evidence object for agentic AI release assurance. Binds inventory, detector coverage, findings, policy decisions, release conditions, compliance evidence, a deterministic verdict, and an Ed25519 signature. Verifiable offline.
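To make the "block, warn, or remediate" idea in TSA concrete, here is a toy gateway decision over a hypothetical advisory feed. The record shape and severity-to-action mapping are assumptions for illustration, not the TSA JSON format.

```python
# Hypothetical advisory records; the actual TSA schema will differ.
advisories = [
    {"tool": "web-fetch", "versions": ["1.0", "1.1"], "severity": "critical"},
    {"tool": "calc", "versions": ["0.3"], "severity": "low"},
]

def gate(tool: str, version: str) -> str:
    # A registry or gateway maps advisory severity to an action:
    # block on critical/high, warn otherwise, allow when nothing matches.
    for adv in advisories:
        if adv["tool"] == tool and version in adv["versions"]:
            return "block" if adv["severity"] in ("critical", "high") else "warn"
    return "allow"

print(gate("web-fetch", "1.1"))  # block
print(gate("calc", "0.3"))       # warn
print(gate("calc", "0.4"))       # allow
```

The point of a machine-readable format is exactly that this decision needs no human in the loop: hosts can apply the feed automatically at install or call time.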

Writing and talks

Specs, preprints, and the occasional talk.

Most of what I publish is open source. The releases below are reverse-chronological. Click any line for the artifact.

Ongoing notes

Most weeks I post shorter takes on LinkedIn and X. Anything worth keeping turns into a spec, a repo, or an entry above.

ORCID: 0009-0001-6300-9155

Live activity

Recent commits and releases.

Contact

Worth a conversation?

Working out whether agentic AI is real for your situation, or building something and want another set of eyes on it, or hiring for a senior AI role. Any of those is fair.

Replies usually inside a working day.

Jason Lovell · 2026 · Built with Next.js on Vercel