Verification & integrity — Ledgenter Docs

An agent saying "done" is a claim, not a fact. This chapter covers the machinery that turns claims into proofs: completion gates in the status machine, idempotency as a safety property, and two adversarial gates that attack the live backend — and can prove they would fail if it lied.

Verified done

In most trackers, done is a string anyone can write into a field. In Ledgenter the transition into done is a gated event inside the task_update RPC — checked in the same database transaction as the write, by the same database that holds the truth. A task can carry three optional gates, set at task_create time or patched later; each one moves a piece of "did the work actually happen?" out of agent etiquette and into the status machine.

Gate	What it is	What →done requires	The refusal you get
`acceptance_criteria`	A checklist of `{text, met}` items	Every item flipped to `met: true`	Names the first unmet criterion verbatim, so the agent knows exactly what is outstanding
`requires_evidence`	A boolean	At least one live code ref or attachment on the task	A hint pointing at `task_code_ref` (link the delivering commit/PR) or `attach_add`
`reviewer_actor_id`	A named reviewer	The review handoff answered by that reviewer	A hint pointing at `handoff_respond` — or at clearing the reviewer, if review is genuinely no longer wanted

The reviewer gate has a matching on-ramp: when a task with a reviewer moves to in_review, the database auto-creates the review handoff addressed to that reviewer and notifies them — deduplicated on a per-task fingerprint, so bouncing in and out of review never spams a second request. done then stays unreachable until the reviewer answers.

Two unconditional checks back the optional gates. First, done-while-blocked is rejected: if the task still has open blocking dependencies, the transition fails and the error states how many are open. Second, expected_status gives you optimistic concurrency: pass the status you believe the task is in, and if it moved underneath you the write fails with a hinted CONFLICT ("stale write: expected … but task is …") instead of silently clobbering another agent's transition. The precondition is consumed, never stored, so on a conflict you re-read the task and then retry.
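As a rough mental model, the blocker check and the three optional gates can be pictured as one ordered function evaluated before the write. The sketch below is an in-memory illustration only, not Ledgenter's implementation: the function name `checkDoneGates`, the task fields `open_blockers`, `evidence_count` and `review_answered`, the order of the checks, and every message except the acceptance-criterion one are assumptions made for the example. The real checks run in the database, inside the task_update transaction.

```javascript
// Hypothetical model of the checks guarding the transition into done.
// Field names and check order are assumptions made for illustration.
function checkDoneGates(task) {
  // Unconditional: no completing over open blocking dependencies.
  if ((task.open_blockers ?? 0) > 0) {
    return { ok: false, message: `cannot complete: ${task.open_blockers} open blocking dependencies` };
  }
  // Gate 1: every acceptance criterion must be met; name the first unmet one verbatim.
  const unmet = (task.acceptance_criteria ?? []).find((c) => !c.met);
  if (unmet) {
    return { ok: false, message: `cannot complete: acceptance criterion not met: "${unmet.text}"` };
  }
  // Gate 2: at least one live code ref or attachment when evidence is required.
  if (task.requires_evidence && (task.evidence_count ?? 0) === 0) {
    return { ok: false, message: "cannot complete: evidence required", hint: "task_code_ref or attach_add" };
  }
  // Gate 3: the named reviewer must have answered the review handoff.
  if (task.reviewer_actor_id && !task.review_answered) {
    return { ok: false, message: "cannot complete: review not answered", hint: "handoff_respond, or clear the reviewer" };
  }
  return { ok: true };
}
```

A task with none of the optional fields set passes straight through, which matches the opt-in nature of the gates.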

the gates firing

task_update(task_id, { "status": "done" })
// → { ok:false, error:{ message:
//      'cannot complete: acceptance criterion not met: "p95 under 200ms"' } }

task_update(task_id, { "status": "done", "expected_status": "in_review" })
// the task moved underneath you →
// { ok:false, error:{ code:"CONFLICT",
//      message:'stale write: expected status "in_review" but task is "done"' } }
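The "re-read, then retry" advice can be sketched as a small loop. Everything here is a stand-in: `taskGet`, `taskUpdate`, `moveToReview` and the in-memory `store` are hypothetical helpers that model only the expected_status check, and the retry limit is an arbitrary choice for the example.

```javascript
// In-memory stand-in for the backend; only the expected_status check is modelled.
const store = new Map([["t1", { status: "in_progress" }]]);

function taskGet(taskId) {
  return { ...store.get(taskId) };
}

function taskUpdate(taskId, patch) {
  const task = store.get(taskId);
  // The precondition is split off here: it is consumed, never stored.
  const { expected_status, ...fields } = patch;
  if (expected_status !== undefined && expected_status !== task.status) {
    return {
      ok: false,
      error: {
        code: "CONFLICT",
        message: `stale write: expected status "${expected_status}" but task is "${task.status}"`,
      },
    };
  }
  Object.assign(task, fields);
  return { ok: true, task: { ...task } };
}

// Re-read, decide again, then retry with the fresh status as the precondition.
function moveToReview(taskId, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const current = taskGet(taskId);
    if (current.status === "in_review" || current.status === "done") {
      return { ok: true, skipped: true, status: current.status }; // another agent got there first
    }
    const res = taskUpdate(taskId, { status: "in_review", expected_status: current.status });
    if (res.ok || res.error.code !== "CONFLICT") return res;
  }
  return { ok: false, error: { code: "CONFLICT", message: "gave up after repeated stale writes" } };
}
```

The point of re-reading inside the loop is that the caller re-decides on fresh state each time, rather than resending a precondition it already knows to be stale.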


The gates are opt-in per task. A task with none of the three behaves classically, but the floor never drops below the status machine itself: legal transitions only, no completing over open blockers, and no reopening a finished task without an explicit flag.

Idempotency as a safety property

Every write RPC opens by registering an idempotency key — inside the same transaction as the write it protects, so there is no window where the work happened but the key didn't stick. Replaying a used key returns the original result, flagged idempotent_replay: true: no second row, no double side effects, and the caller can tell a replay from a fresh write. Reusing a key with a different payload is treated as a logic bug, not a retry — it fails with a 409-class CONFLICT ("idempotency key reuse with different payload"), because silently serving the old result for new content would be a lie.
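The three outcomes described above (fresh write, flagged replay, conflicting reuse) can be illustrated with a small in-memory model. This is a sketch under stated assumptions rather than the real RPC: `idempotentWrite`, `keys` and `rows` are hypothetical names, a single synchronous function stands in for "same transaction", and comparing payloads via JSON.stringify is a simplification that is sensitive to property order.

```javascript
// Hypothetical model: key registration and the write happen in one synchronous
// step, standing in for a single database transaction.
const keys = new Map(); // idempotency key -> { fingerprint, result }
const rows = [];        // the "table" being written to

function idempotentWrite(key, payload) {
  const fingerprint = JSON.stringify(payload); // simplification: property-order sensitive
  const seen = keys.get(key);
  if (seen) {
    if (seen.fingerprint !== fingerprint) {
      // Same key, different content: a logic bug, not a retry.
      return {
        ok: false,
        error: { code: "CONFLICT", message: "idempotency key reuse with different payload" },
      };
    }
    // Same key, same content: serve the original result, visibly flagged.
    return { ...seen.result, idempotent_replay: true };
  }
  const row = { id: rows.length + 1, ...payload };
  rows.push(row);
  const result = { ok: true, id: row.id };
  keys.set(key, { fingerprint, result });
  return result;
}
```

Calling it twice with the same key and payload leaves one row and returns the same id both times, with only the second result carrying the replay flag.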

When the caller supplies no key, core derives one from the call's content — and deliberately folds in the calling actor and the ambient run key. This is the fix for a subtle phantom-success bug: two sibling runs forked from one parent, or two actors sharing one operator-pinned run id, can easily emit byte-identical writes. Without the fold they would collapse onto one row, and the loser would walk away holding the winner's id as a counterfeit success. With it, distinct siblings and distinct actors always get distinct keys; collapse only happens within one actor's one run, where it is genuinely a retry. When collapse is what you want — a cron tick ensuring "today's report task exists" — pass an explicit, date-stamped key.
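A minimal sketch of the fold, assuming the derived key is built from the call's content plus the actor and run key. The names `canonical` and `deriveKey` and the input shape are hypothetical, and the key is left as a readable string where a real implementation would presumably hash it.

```javascript
// Stable serialisation: object keys are sorted so that equal content always
// produces the same string regardless of property order.
function canonical(value) {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value && typeof value === "object") {
    const body = Object.keys(value)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonical(value[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical derivation: content alone is not enough, so the calling actor
// and the ambient run key are folded in.
function deriveKey({ rpc, payload, actorId, runKey }) {
  return canonical({ rpc, payload, actor: actorId, run: runKey });
}

const write = { rpc: "task_create", payload: { title: "Nightly report" } };
const siblingA = deriveKey({ ...write, actorId: "agent-1", runKey: "run-a" });
const siblingB = deriveKey({ ...write, actorId: "agent-1", runKey: "run-b" });
const retryA = deriveKey({ ...write, actorId: "agent-1", runKey: "run-a" });
// siblingA !== siblingB: byte-identical writes from sibling runs stay distinct.
// siblingA === retryA: the same actor in the same run is a genuine retry.
```

When collapse across runs is the goal, the caller skips derivation and passes its own key; a format such as `daily-report:<date>` is one illustrative possibility, not a documented convention.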

The integrity gate: seven invariants

Verifying single writes is necessary but not sufficient: the system also has to hold up as a whole, under real concurrent agents. The first of two adversarial harnesses, harness/integrity-gate.mjs, asserts seven invariants against the live sandbox backend by reading ground truth: database rows, not model prose. It exits non-zero on any violation, regardless of how well the agent experience scored. The design rule is that a race must fail the run, not get "fixed" by tightening an input schema.

#	Invariant	How it is checked
1	No duplicate `seq` per tenant	Every live task's sequence number read back and asserted unique
2	No state transition applied by two distinct actors	At most one transition author per task across the run's telemetry
3	Idempotent replay produced zero dups	An active probe: replay a used key, count the rows
4	No contended handoff left wrongly open	The deliberately contended handoff ended answered, not stranded
5	Append-only spines unedited	Checksums over decisions and activity, taken twice — byte-identical or fail
6	No events lost across a `whoami` window	The since-last-seen delta count is self-consistent
7	Dependency cycles rejected	A deliberate cycle attempt returned `TASK_DEPENDENCY_CYCLE`, not a write
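To make the table concrete, here is a sketch of how three of these invariants (1, 2 and 7) might be checked over rows that have already been read back. It is not the code in harness/integrity-gate.mjs: the function names, the row shapes (`tenant_id`, `seq`, `task_id`, `from`, `to`, `actor_id`) and the choice of exit code 1 are assumptions, and invariant 2 is interpreted here as "one author per task and transition".

```javascript
// Invariant 1: no duplicate seq per tenant. Returns the duplicated "tenant:seq" keys.
function duplicateSeqs(tasks) {
  const seen = new Set();
  const dups = [];
  for (const t of tasks) {
    const key = `${t.tenant_id}:${t.seq}`;
    if (seen.has(key)) dups.push(key);
    seen.add(key);
  }
  return dups;
}

// Invariant 2 (assumed reading): the same transition on the same task must not
// have been applied by two distinct actors.
function contestedTransitions(transitions) {
  const authors = new Map();
  for (const tr of transitions) {
    const key = `${tr.task_id}:${tr.from}->${tr.to}`;
    if (!authors.has(key)) authors.set(key, new Set());
    authors.get(key).add(tr.actor_id);
  }
  return [...authors].filter(([, who]) => who.size > 1).map(([key]) => key);
}

// Invariant 7: adding "blocker blocks blocked" closes a loop if `blocked`
// already blocks `blocker`, directly or transitively. edges: [blocker, blocked] pairs.
function wouldCreateCycle(edges, blocker, blocked) {
  const stack = [blocked];
  const visited = new Set();
  while (stack.length > 0) {
    const node = stack.pop();
    if (node === blocker) return true;
    if (visited.has(node)) continue;
    visited.add(node);
    for (const [from, to] of edges) {
      if (from === node) stack.push(to);
    }
  }
  return false;
}

// The verdict: any violation at all makes the run fail.
function verdict(checks) {
  const violations = checks.filter((c) => c.violations.length > 0);
  return { ok: violations.length === 0, exitCode: violations.length === 0 ? 0 : 1, violations };
}
```

Each check returns the offending keys rather than a boolean, so a failing verdict can say what went wrong as well as that something did.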

The gate's credential is itself a statement of posture: it runs through @ledgenter/core with an ordinary sandbox actor key whose JWT is RLS-scoped to the sandbox tenant — which is exactly the audit scope. No service-role key, no raw Postgres connection. The --micro flag first runs a standalone contention scenario (two identical transitions, an idempotency replay, a cycle attempt, a contended handoff) so the gate can be validated cheaply, without a full multi-agent re-run.

The concurrency gate: thirteen adversarial probes

The agent-experience harness cannot exercise the nastiest conditions: its personas use distinct keys in distinct processes, so shared-key siblings, overlapping ticks, a pinned LEDGENTER_RUN_ID, or a same-key run_start race never occur naturally. harness/concurrency-gate.mjs manufactures them deliberately — spawning out-of-process children with surgically controlled environments — and asserts ground truth in database rows and child stderr. Thirteen probes:

The attack	What must hold
16-wide same-key concurrent `run_start`	Exactly one run: one insert winner, fifteen reattach, no unique-violation leak
16-wide distinct-key `run_start` racing one repo upsert	All sixteen resolve to a single repository row
Concurrent repo upserts plus a cross-host name collision	Local upserts converge to one row; same-named repos on different hosts stay distinct
Shared-key sibling forks writing identical content	N rows for N siblings — no phantom success
Identical sibling emissions, then a same-run replay	Both sibling rows land under their own run; the replay still dedupes
Same-run identical concurrent writes	Collapse is flagged (`idempotent_replay` or a visible retryable error) — never a silent same-id double-ok
Two actors writing under one pinned run key	Never collapse — the actor fold keeps their keys distinct
Pinned `LEDGENTER_RUN_ID`, second fire after `run_end`	A fresh run is minted, with a stderr warning; the fires do not collapse
Pinned id plus a run group across two fires	Fresh per-tick keys, one series, tick 2's cursor seeded from tick 1 — a loop sees only since-last-tick
Overlapping sibling runs	Independent `whoami` cursors: one advancing never moves the other
A token-bearing remote URL and an absolute worktree path	The stored repo row is credential-free; the worktree path is basename-only
Real MCP server end-to-end, token in the environment	The transcript carries no provider token, and the first write's activity row already has a run id — lazy registration strictly precedes the write
Real MCP server end-to-end, repo with a `local_path`	The path is served to the agent (it needs it) but dropped from telemetry (the log doesn't)

Note what the last three probes are: leak attempts. Verification here is not only "did the data stay consistent" but "did a credential or a filesystem path escape into storage or logs" — checked by regex against the actual rows and the actual JSONL transcript, not by code review.
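Both halves of a leak probe, scrubbing on the way in and scanning on the way out, can be sketched briefly. This is an illustration under assumptions, not the gate's code: `scrubRemote`, `basenameOnly`, `findLeaks` and the two patterns in `LEAK_PATTERNS` are hypothetical, and a real scan would need patterns for every provider token format in use.

```javascript
// Strip userinfo (user:token@) from an http(s)-style remote URL.
function scrubRemote(remote) {
  const url = new URL(remote);
  url.username = "";
  url.password = "";
  return url.toString();
}

// Keep only the last path segment of a worktree path.
function basenameOnly(path) {
  return path.split(/[\\/]+/).filter(Boolean).pop() ?? "";
}

// Illustrative patterns only: userinfo embedded in a URL, and GitHub-style
// token prefixes. Real coverage would be broader.
const LEAK_PATTERNS = [
  /[a-z][a-z0-9+.-]*:\/\/[^\/\s@]+@/i,
  /\bgh[pousr]_[A-Za-z0-9]{20,}/,
];

// Scan stored rows or transcript lines (as strings) and return the offenders.
function findLeaks(lines) {
  return lines.filter((line) => LEAK_PATTERNS.some((re) => re.test(line)));
}

const stored = scrubRemote("https://x-access-token:s3cret@github.com/acme/repo.git");
const worktree = basenameOnly("/home/dev/worktrees/feature-x");
```

The scan runs over what was actually persisted, so a scrubber that silently stops being called would still be caught.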

Teeth: a gate that cannot fail is decoration

A green checkmark only means something if the check is falsifiable. Both gates therefore take --inject-violation N, which forces probe N to report red — and the gate must then exit non-zero and render the run inadmissible. This is a unit-style proof that the gate fails red: if an injected violation ever passed, the gate itself would be the bug. Each gate also writes a machine-readable verdict (reports/<run_id>.integrity.json, .concurrency.json) and appends its section to the run's report; the exit code, not the prose, is the contract.

running the gates

node harness/integrity-gate.mjs --micro      # contention micro-scenario, then the 7 invariants
node harness/concurrency-gate.mjs           # the 13 probes against the live sandbox
node harness/integrity-gate.mjs --inject-violation 3
# ↑ must exit non-zero — proof the gate can fail
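The falsifiability idea behind --inject-violation can be shown with a toy gate runner. This is a sketch, not the harness: `runGate`, the probe shape `{ name, check }` and the use of exit code 1 are assumptions (the source only promises a non-zero exit).

```javascript
// Run every probe; probe N (1-based) is forced red when injectViolation === N.
function runGate(probes, { injectViolation } = {}) {
  const results = probes.map((probe, i) => {
    const n = i + 1;
    if (n === injectViolation) {
      return { n, name: probe.name, pass: false, injected: true };
    }
    let pass = false;
    try {
      pass = probe.check() === true; // anything other than an explicit true is red
    } catch {
      pass = false; // a probe that throws is a failure, never a skip
    }
    return { n, name: probe.name, pass, injected: false };
  });
  const admissible = results.every((r) => r.pass);
  return { admissible, exitCode: admissible ? 0 : 1, results };
}

const probes = [
  { name: "no duplicate seq", check: () => true },
  { name: "single transition author", check: () => true },
  { name: "idempotent replay produced zero dups", check: () => true },
];

const clean = runGate(probes);                          // every probe green
const forced = runGate(probes, { injectViolation: 3 }); // probe 3 forced red
```

If `forced.exitCode` were ever 0, the runner itself would be the bug, which is the property the injected run exists to prove.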

The CI ladder

The gates sit at the top of a ladder that runs from cheap static checks to live adversarial probes. Each rung catches what the rung below cannot:

every push     ──► typecheck → lint → tests → build        static, in order
               ──► RLS lint                                FORCE RLS + tenant policy, every public table
               ──► schema-drift gate                       generated types must match committed types
PRs touching
supabase / mcp ──► pgTAP isolation suite                   145 assertions, real Postgres
daily schedule ──► integrity gate · concurrency gate       7 invariants + 13 probes, LIVE backend

Typecheck, lint, test, build — the base build job, run in that order on every push.
RLS lint — asserts every public table carries forced row-level security and a tenant policy; service-only tables are explicitly allowlisted, never silently skipped (chapter 5).
Schema-drift gate — types regenerated from the live schema must match the committed ones; a migration without regenerated types fails the build.
pgTAP isolation suite — 145 assertions proving cross-tenant invisibility and raw-DML denial, gating every PR that touches the database or the MCP server.
Scheduled adversarial gates — the integrity and concurrency gates run daily against the live sandbox, so a regression in race behavior surfaces within a day even if no PR touched that code path.
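As one example of a rung, the RLS lint's rule (forced row-level security plus a tenant policy on every public table, with explicit exemptions) might look like the sketch below. It is illustrative only: `rlsLint`, the input shape and the `tenantScoped` flag are hypothetical. The `relrowsecurity` and `relforcerowsecurity` field names mirror the Postgres `pg_class` columns, on the assumption that the metadata is read from there.

```javascript
// tables: [{ name, relrowsecurity, relforcerowsecurity, policies: [{ name, tenantScoped }] }]
// allowlist: names of service-only tables that are exempt by explicit decision.
function rlsLint(tables, allowlist = []) {
  const allowed = new Set(allowlist);
  const failures = [];
  const skipped = [];
  for (const t of tables) {
    if (allowed.has(t.name)) {
      skipped.push(t.name); // exempt, but reported rather than silently dropped
      continue;
    }
    if (!(t.relrowsecurity && t.relforcerowsecurity)) {
      failures.push(`${t.name}: row-level security is not forced`);
    }
    if (!t.policies.some((p) => p.tenantScoped)) {
      failures.push(`${t.name}: no tenant policy`);
    }
  }
  return { ok: failures.length === 0, failures, skipped };
}
```

Returning the skipped names alongside the failures keeps every exemption visible in the lint output.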


Why this much machinery? Because the failure modes it hunts are the quiet ones: a phantom success, a collapsed sibling write, a clobbered transition, a token in a log line. None of them throw. The only way to find them is to attack the real system and read the real rows — which is exactly what these gates do, every day.