PAISL / Agent Boundary Assurance
Local-first agent-boundary benchmark and June 2026 white paper defining Agent Boundary Assurance for personal and enterprise AI agents.
Jason Lovell · Austin, Texas
I work where executive problem-framing meets agent architecture, hands-on build, governance, and proof-of-value. The public trail is specs, white papers, benchmarks, and small systems that test what should actually ship.
Currently: Agent Boundary Assurance, PAISL, nanoassembly
About
I started in mobile before smartphones, when the consumer-tech frontier moved one new category at a time. Since then I have worked across most of those categories: mobile, IoT, wearables, tablets, VR and AR, connected home, smart audio. The arc tends to repeat: loud enthusiasm, then a small number of teams quietly figuring out what is real and shipping it.
I have spent years with leadership teams across sectors and most C-suite seats. UK first, then the US. Most serious AI conversations end up at the same question: which of this is real, and where does it create value worth shipping? Before that, I ran my own innovation consultancy. That is where AI caught me.
AlphaGo, OpenAI's Dota 2 work, and the AlphaGo Zero release pulled me in during 2017, and I have read the field closely ever since: Attention Is All You Need, the GPT-2 staged release, AlphaFold, InstructGPT and RLHF before ChatGPT became a verb, the agent wave, and the current argument over training-time scaling versus inference-time compute. I am not new here.
A few years ago I stopped just advising and started building properly. Agentic systems with Claude Code, Codex, and Cursor. Multi-agent orchestration. Replay engines, signed evidence objects, local boundary harnesses, vulnerable benchmark corpora. I run the models worth testing, fine-tune what warrants it, and test coding agents, plugins, and MCP servers on real work. Enough prototypes broken at 11pm to know what works and what only looks good in a demo.
What I care about now is the gap between what agentic AI can do, what an organization should trust, and what people will actually use. That gap is where strategy, architecture, build quality, governance, and judgement have to meet. It is also why most of what I publish is open source.
Where I am focused now
Open source
Four public pillars first: agent-boundary assurance, signed release evidence, vulnerable fixtures, and a compact interpretability sequence. The rest stays close by, but not at the same volume.
Featured
Local-first agent-boundary benchmark and June 2026 white paper defining Agent Boundary Assurance for personal and enterprise AI agents.
Multi-layer SAE feature-circuit lab calibrating when gradient, exact ablation, or cheap scores are worth paying for.
Recent research labs
June 2026 deployment stack labs
These are compact supporting artifacts, not the main show: policy layers, skill observability, MCP portability, agent commerce, budget controls, and multi-provider routing.
Tooling
jlov7/ProofPack: Tamper-evident receipt and audit-pack system for agent and tool runs. Ed25519 signatures, Merkle trees, and portable proof bundles.
jlov7/branchlab: Local-first replay engine for agent runs. Rewind a trace, branch into counterfactual timelines, and simulate policy changes before deployment.
jlov7/baton-studio: Multi-agent control room with a shared world model, causal dependency graph, energy-aware scheduling, and fair baton arbitration.
jlov7/meta-memory-studio: Memory control plane for agents. Inspect, evolve, and govern what agents remember across sessions.
jlov7/agent-director: Trace debugger for AI agents. Scrub through execution, replay specific steps, and diff runs side by side.
jlov7/scaffold-arena: Workbench for measuring how scaffold choices change LLM quality, reliability, cost, latency, and failure profile.
jlov7/AMDM: Toolkit for observing agentic systems, detecting emergent anomalies, and benchmarking detector behaviour against synthetic scenarios.
Threads I track
Most agentic-AI work needs help from outside the LLM stack. The list below is the current reading list, ordered roughly by how much it is shaping what I am shipping: boundary evidence, release assurance, interpretability, world models, and practical deployment controls.
The PAISL white paper defines Agent Boundary Assurance: evidence for what an agent accessed, remembered, transformed, sent, blocked, or executed across local and enterprise settings. (ABA white paper)
PAISL treats the benchmark object as the run itself: scenario, data items, consent state, boundary decisions, tool trace, egress record, scorecard, and failure cases. (PAISL)
Most agentic releases now exceed what one reviewer can verify by hand. AAC and DVAAC put the release decision into signed evidence and deterministic fixture checks. (AAC and DVAAC)
nanocircuits, nanofeatures, and nanoassembly ask the same practical question at three levels: when does a circuit claim beat the strongest cheap baseline? (nanoassembly)
GhostTrace measures behavioural half-life under recursive self-distillation. The toy result is supported; the local LLM tier is negative boundary evidence, not a recursive LLM claim. (GhostTrace)
Test-time scaling, process reward models, and search-at-inference change what a release decision means. AAC's verifier recomputes outcomes without an LLM in the loop. (AAC verifier)
nanoAWM is the small symbolic lab: learned consequence simulation in MiniOS, not production safety validation. Baton Studio is the multi-agent control-room side of the same interest. (nanoAWM)
ProofKern keeps the negative result instead of sanding it down: four MLX-relative kernel wins become zero cross-framework wins against torch.compile on the same GPU. (ProofKern)
If chain-of-thought (CoT) stops being monitorable under heavy RL training, a lot of current oversight assumptions fall over. I am tracking the faithfulness and process-supervision literature closely.
SMT solvers (Z3) and deterministic verifiers as a route to release decisions an auditor can check offline. The open question is where neuro-symbolic verification becomes practical for agent systems. (AAC verifier)
Conformal calibration and selective classification for LLM outputs and abstention. Active interest, no published artifact yet.
Knowledge-graph-augmented retrieval and corpus intelligence inside scaffold engineering work. (Scaffold Arena)
FEMA NRI, BigQuery Earth Engine, and satellite imagery for compliance-aware geospatial underwriting. (TerraRisk Agent)
Standards
The work below is independent and artifact-backed. Some items are draft specs; the new Agent Boundary Assurance paper is a white paper. The common thread is simple: agent supply chains, tool use, and release decisions should leave evidence that someone else can inspect.
Independent white paper defining an evidence discipline for local and enterprise agents: what they accessed, remembered, transformed, sent, blocked, or executed. It connects PAISL, TBOM, SBA, TSA, and ABER into a reviewable boundary-assurance model.
A provenance and integrity standard for the MCP ecosystem. Cryptographically signed manifests that bind MCP server releases to immutable tool metadata, supporting automated trust verification and reducing the surface for tool poisoning in AI agent supply chains.
Deterministic bundle identity, content attestation format, and verification tooling for agent skill bundles. Minimal, reproducible, and supply-chain friendly. Adds trust at the skill layer.
Machine-readable security advisories for MCP tools. Defines a JSON format, a feed index, and a trust model so registries, hosts, and gateways can automatically block, warn, or remediate vulnerable tools.
Portable, signed, audit-grade evidence object for agentic AI release assurance. Binds inventory, detector coverage, findings, policy decisions, release conditions, compliance evidence, a deterministic verdict, and an Ed25519 signature. Verifiable offline.
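To make "deterministic bundle identity" a little more concrete, here is a minimal sketch, assuming a skill bundle can be treated as a map from relative paths to file bytes. The digest scheme, the domain tag, and the bundle_digest name are hypothetical illustrations, not the attestation format the skill-bundle spec actually defines.

```python
import hashlib


def bundle_digest(files: dict[str, bytes]) -> str:
    """Deterministic SHA-256 identity for a bundle of files.

    Hypothetical scheme for illustration only: each (path, content) pair is
    length-prefixed and fed to the hash in sorted path order, so the result
    depends only on the bundle's contents, not on how the files were listed.
    """
    h = hashlib.sha256()
    h.update(b"bundle-digest-v0")  # hypothetical domain-separation tag
    for path in sorted(files):  # code-point order matches UTF-8 byte order
        encoded_path = path.encode("utf-8")
        content = files[path]
        # 8-byte big-endian length prefixes keep the byte stream unambiguous.
        h.update(len(encoded_path).to_bytes(8, "big"))
        h.update(encoded_path)
        h.update(len(content).to_bytes(8, "big"))
        h.update(content)
    return h.hexdigest()


if __name__ == "__main__":
    a = {"instructions.md": b"# Summarise\n", "scripts/run.py": b"print('hi')\n"}
    b = {"scripts/run.py": b"print('hi')\n", "instructions.md": b"# Summarise\n"}
    c = {**a, "scripts/run.py": b"print('bye')\n"}
    print(bundle_digest(a) == bundle_digest(b))  # True: listing order is irrelevant
    print(bundle_digest(a) == bundle_digest(c))  # False: an edited file changes the identity
```

Sorting and length-prefixing are the two choices that matter here. Sorting removes any dependence on filesystem listing order, and the length prefixes stop two different bundles from collapsing into the same byte stream. A real attestation would then sign this digest, for example with Ed25519, which the sketch leaves out.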
Writing and talks
Most of what I publish is open source. The releases below are reverse-chronological.
Ongoing notes
Most weeks I post shorter takes on LinkedIn and X. Anything worth keeping turns into a spec, a repo, or an entry above.
ORCID: 0009-0001-6300-9155