Operations
Running the office: how to credential an agent, put it on a schedule, read its exit codes, and trust the janitorial jobs that keep the building clean while no one is watching.
For the person setting Ledgenter up and keeping it running.
This chapter is for an operator — whoever sets up agents, gives each one its own key, and optionally wires up scheduled background agents. The short version: each agent gets its own identity and key, keys are stored safely and shown only once, and a dead or revoked key fails loudly instead of failing in silence.
The workspace also tidies up after itself — routine housekeeping runs automatically inside the system, with no babysitting required.
Provisioning an agent
Ledgenter does not run agents. Task Scheduler, cron, or a Claude Code session is the runtime;
Ledgenter supplies the work signal, the claim semantics, and the contract text. The operator's
first job is therefore identity: every unattended agent gets its own actor and its
own API key. A shared key collapses attribution — whoami, the inbox,
assignments, and claims all resolve from the actor behind the key, so two agents on one key
become indistinguishable in every record they leave.
Provisioning is three steps. Register the actor (any session with a key for the tenant
can do this — the register_actor tool or the CLI), mint its key with the
service-role script, and write a per-agent MCP config:
# 1. Register the actor. external_ref is the stable machine name:
#    pick it once and keep it; keys, prompts, and configs all key off it.
ledgenter register-actor --external-ref builder-1 --kind agent --display-name "Builder 1"

# 2. Mint this actor's API key in the console (Settings -> API keys).
#    The raw key (ledgenter_live_…) is shown EXACTLY ONCE; only its sha256 is stored. Copy it now.
mkdir "$env:USERPROFILE\.ledgenter\keys" -Force | Out-Null
Set-Content "$env:USERPROFILE\.ledgenter\keys\builder-1.key" "<ledgenter_live_...>" -NoNewline
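On a Linux or macOS host, such as the one the cron variant runs on, the key file can be stored the same way. The following is a minimal sketch, assuming the same ~/.ledgenter/keys layout; store_agent_key is a hypothetical helper name, not part of the Ledgenter CLI.

```shell
# Hypothetical helper (not part of the Ledgenter CLI): save a freshly minted
# key so that only the current user can read it.
store_agent_key() {
  agent_name=$1
  raw_key=$2
  key_dir="$HOME/.ledgenter/keys"
  key_file="$key_dir/$agent_name.key"
  (
    umask 077                                  # new files and dirs: owner-only
    mkdir -p "$key_dir" &&
      printf '%s' "$raw_key" > "$key_file" &&  # printf '%s' adds no trailing newline
      chmod 600 "$key_file"                    # also tightens a pre-existing file
  )
}

# Usage: store_agent_key builder-1 "<ledgenter_live_...>"
```

The subshell keeps the tightened umask from leaking into the calling session. Because the raw key is shown only once, it may be worth confirming the file's contents before closing the console page.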
The per-agent MCP config lives at
%USERPROFILE%\.ledgenter\mcp\builder-1.mcp.json and is used only by the
scheduled task — opening it in an interactive session would make that session a
loop_tick:
{
"mcpServers": {
"ledgenter": {
"command": "node",
"args": ["C:/dev/ledgenter/packages/mcp-server/dist/server.js"],
"env": {
"LEDGENTER_API_KEY": "<builder-1 key>",
"LEDGENTER_API_BASE": "https://<project>.supabase.co",
"LEDGENTER_RUN_GROUP": "builder-1-loop"
}
}
}
}
LEDGENTER_RUN_GROUP is static per agent. The MCP server mints a fresh per-tick
run key automatically; ticks become siblings under one run series, each seeding its
"what changed since last tick" cursor from the previous tick
(chapter 6).
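For operators who provision several agents, the config above can be generated rather than hand-edited. This is a sketch only: render_mcp_config is a hypothetical helper, the server path and API base are passed in by the operator, and it assumes none of the values contain quotes or backslashes (it does no JSON escaping, so use forward slashes in paths as the example config does).

```shell
# Hypothetical generator: prints a per-agent MCP config to stdout.
# Assumes the key was saved at ~/.ledgenter/keys/<agent>.key.
render_mcp_config() {
  agent_name=$1
  server_js=$2      # path to the MCP server entry point
  api_base=$3       # the project URL
  api_key=$(cat "$HOME/.ledgenter/keys/$agent_name.key") || return 78
  cat <<EOF
{
  "mcpServers": {
    "ledgenter": {
      "command": "node",
      "args": ["$server_js"],
      "env": {
        "LEDGENTER_API_KEY": "$api_key",
        "LEDGENTER_API_BASE": "$api_base",
        "LEDGENTER_RUN_GROUP": "$agent_name-loop"
      }
    }
  }
}
EOF
}

# Usage (the output contains the raw key, so write it under a tight umask):
#   (umask 077; render_mcp_config builder-1 /path/to/server.js "https://<project>.supabase.co" \
#      > ~/.ledgenter/mcp/builder-1.mcp.json)
```

Deriving LEDGENTER_RUN_GROUP from the agent name keeps the series key static per agent, which is the property the paragraph above depends on.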
Key hygiene. A per-actor key is minted server-side and stored only as a sha256 hash plus a short display prefix — the raw key exists once, on your screen, at mint time, and is never recoverable. A key never ships in a client or agent bundle.
The loop-agent launcher
scripts/agents/loop-agent.ps1 is the tick wrapper: one invocation is one
tick. It takes three mandatory parameters — -AgentName (resolves the key file
and MCP config), -Cwd (the repo checkout), -PromptFile (the tick
prompt) — and optional -Project (scopes the poll), -ApiBase,
-MaxTurns (default 50), -Model (default sonnet), and
-LedgenterCli.
Task Scheduler (every 15 min) ──► loop-agent.ps1
  ├─ preflight: key file · mcp.json · cwd · api base    missing → exit 78, nothing spawns
  ├─ Set-Location <repo>                                 CLAUDE.md + .claude/skills load from HERE
  ├─ ledgenter skills sync --soft-fail                   best-effort, never blocks the tick
  ├─ ledgenter agent poll --quiet --soft-fail            exit 3 = idle → stop · exit 0 = work
  └─ claude -p <tick prompt> --mcp-config …              one fresh loop_tick run per spawn
Each step earns its place:
- Set-Location comes first because the working directory is decided before the model exists. It is what makes Claude Code natively load that repo's CLAUDE.md and .claude/skills/; an MCP server cannot inject either (chapter 7).
- ledgenter skills sync refreshes the materialized skills before the tick. MCP reads are never stale; this only affects native skill invocation, so it runs best-effort and never blocks (chapter 8).
- ledgenter agent poll is the ~$0 preflight: side-effect-free counts of inbox, ready, and pool work under the agent's own key. No runs row is created, no cursor moves. Exit 3 means idle: the wrapper exits 0 and nothing spawns, which is why a 15-minute cadence costs nearly nothing on a quiet office. Cadence is about latency, not cost.
- The spawn runs Claude Code headless against the per-agent MCP config (--strict-mcp-config, --permission-mode acceptEdits). The agent claims its own work with task_claim; the lease is what makes a crashed tick recoverable.
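The same sequence can be sketched for a POSIX host. This is not the shipped launcher (that is loop-agent.ps1); run_tick is a hypothetical function, it assumes LEDGENTER_API_BASE is already exported, and it uses only the commands and flags that appear in this chapter.

```shell
# Hypothetical POSIX sketch of one tick. The body runs in a subshell
# (note the parentheses), so cd and exports do not leak to the caller.
# Pass absolute paths: the prompt file is read after the cd.
run_tick() (
  agent_name=$1
  repo_dir=$2
  prompt_file=$3
  key_file="$HOME/.ledgenter/keys/$agent_name.key"
  mcp_config="$HOME/.ledgenter/mcp/$agent_name.mcp.json"

  # Preflight: a missing piece is a configuration error and nothing spawns.
  [ -f "$key_file" ]   || { echo "missing key file: $key_file" >&2; exit 78; }
  [ -f "$mcp_config" ] || { echo "missing MCP config: $mcp_config" >&2; exit 78; }
  [ -d "$repo_dir" ]   || { echo "missing checkout: $repo_dir" >&2; exit 78; }
  [ -n "${LEDGENTER_API_BASE:-}" ] || { echo "LEDGENTER_API_BASE is not set" >&2; exit 78; }

  # The working directory decides which CLAUDE.md and .claude/skills load.
  cd "$repo_dir" || exit 78

  LEDGENTER_API_KEY=$(cat "$key_file")
  export LEDGENTER_API_KEY

  # Best-effort refresh; a failure here must never block the tick.
  ledgenter skills sync --soft-fail || true

  # Cheap poll: 3 = idle, 0 = work, anything else is a real failure.
  poll_status=0
  ledgenter agent poll --quiet --soft-fail || poll_status=$?
  if [ "$poll_status" -eq 3 ]; then
    exit 0
  elif [ "$poll_status" -ne 0 ]; then
    exit "$poll_status"
  fi

  claude -p "$(cat "$prompt_file")" \
    --mcp-config "$mcp_config" --strict-mcp-config \
    --permission-mode acceptEdits --max-turns 50
)

# Usage: run_tick builder-1 /home/m/repo /home/m/agents/builder-tick.md
```

Handling the poll status explicitly keeps the three outcomes separate: an idle tick exits 0 without spawning, a terminal failure such as exit 77 propagates to the scheduler, and only a clean 0 reaches the spawn.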
Scheduling on Windows
Register-ScheduledTask -TaskName "ledgenter-builder-1" `
  -Action (New-ScheduledTaskAction -Execute "powershell.exe" `
    -Argument "-NoProfile -File C:\…\scripts\agents\loop-agent.ps1 -AgentName builder-1 -Cwd C:\… -PromptFile C:\…\builder-tick.md") `
  -Trigger (New-ScheduledTaskTrigger -Once -At (Get-Date) -RepetitionInterval (New-TimeSpan -Minutes 15)) `
  -Settings (New-ScheduledTaskSettingsSet -MultipleInstances IgnoreNew -ExecutionTimeLimit (New-TimeSpan -Minutes 30))
Two settings matter. IgnoreNew prevents overlapping ticks; overlap is
actually safe (per-tick run keys, actor-and-run-folded idempotency,
SKIP LOCKED claims), merely wasteful. And keep
ExecutionTimeLimit below the claim lease: with the defaults (30-minute
limit, 1-hour lease, 15-minute cadence) the scheduler kills a runaway tick before its
lease can lapse, and a killed tick's claim expires on its own within four cadence
intervals. Run one scheduled task per
(agent × repo checkout); the task's -Cwd is the repo.
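The arithmetic behind these settings can be checked mechanically before a schedule is registered. The sketch below is hypothetical (check_tick_timing is not a Ledgenter command) and takes all three values in whole minutes. It checks that the execution limit sits below the claim lease and reports how many cadence intervals a dead tick's claim can outlive.

```shell
# Hypothetical sanity check; all arguments are whole minutes.
check_tick_timing() {
  limit_min=$1
  lease_min=$2
  cadence_min=$3
  if [ "$limit_min" -ge "$lease_min" ]; then
    echo "unsafe: a tick may outlive its ${lease_min}-minute lease" >&2
    return 1
  fi
  # Ceiling division: scheduled ticks that pass before a dead claim expires.
  shadowed=$(( (lease_min + cadence_min - 1) / cadence_min ))
  echo "ok: a dead tick's claim expires within $shadowed cadence interval(s)"
}

# Usage with the defaults named in this section:
#   check_tick_timing 30 60 15
#   prints: ok: a dead tick's claim expires within 4 cadence interval(s)
```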
The cron variant
# Wrapped here for readability; cron has no line continuation, so the real crontab entry must be a single line.
*/15 * * * * cd /home/m/repo && LEDGENTER_API_KEY=$(cat ~/.ledgenter/keys/builder-1.key) \
  ledgenter agent poll --quiet --soft-fail && \
  claude -p "$(cat ~/agents/builder-tick.md)" \
  --mcp-config ~/.ledgenter/mcp/builder-1.mcp.json --strict-mcp-config \
  --permission-mode acceptEdits --max-turns 50
The && chain exploits poll's exit-3 short-circuit: idle ticks never
reach the spawn.
The tick prompt
The prompt file frames one bounded shift. Budget discipline is deliberate: at most one task per coding tick — small ticks fail small.
You are builder-1, an unattended Ledgenter loop agent. This is one tick:
1. whoami.
2. Drain your inbox (handoff_claim -> handoff_respond -> inbox ack).
3. task_claim (project <X> / label <Y>) and complete AT MOST ONE task end-to-end.
   Verification gates apply; task_code_ref the delivering commit.
   If you cannot finish: comment your progress, then task_release with a note.
4. decision_log / knowledge_write anything durable.
5. run_end with counts.
Follow guide('sessions-and-loops'). Never wait for permission; handoff_create to a human instead.
The /loop honesty caveat. Running the tick prompt
under Claude Code's /loop on an always-on machine works, but every tick
shares one MCP server process — one run. since_last_seen still
advances per whoami call, but run-tree granularity is coarse, and
identical-content writes across ticks can idempotency-collapse (keys fold in the run).
Pass explicit idempotency_keys for deliberately-repeated writes — or prefer
the scheduled headless tick, which gets a fresh run per tick for free.
The failure model
The launcher is deliberately boring because the recovery story lives elsewhere — in leases, the reaper, and exit-code discipline. Crashes are an expected operating condition, not an incident:
| Failure | Recovery |
|---|---|
| Tick killed mid-task | The claim lease expires (default 1 hour) and the task returns to the pool for the next task_claim. |
| Tick killed silently (no run_end) | The abandoned-run reaper ends the run and releases its claims: assignee and lease cleared, in_progress drops back to todo. |
| Poll fails transiently | --soft-fail swallows it to exit 0, so the scheduler records no failure; the next tick retries. |
| Revoked or expired key | The poll fails loudly (exit 77) even under --soft-fail — auth is terminal, not transient. Fix the key. |
| Stale checkout path | The wrapper's preflight exits 78 before anything spawns. |
Environment reference
Every knob the runtime reads, in one place. "Core" variables are honored by every
consumer (MCP server, CLI, anything built on @ledgenter/core); the rest are
consumer-specific.
| Variable | Scope | Meaning |
|---|---|---|
| LEDGENTER_API_KEY | Core | The per-actor API key (ledgenter_live_…), exchanged for a short-lived JWT. Required. |
| LEDGENTER_API_BASE | Core | The Supabase project URL. Required. |
| LEDGENTER_API_VERSION | Core | Date-based API version pin (default 2026-06-08). |
| LEDGENTER_RUN_ID | Core | This process's run key, set by a spawner so the child's writes attribute to a known run. Never pin it alongside LEDGENTER_RUN_GROUP: a fresh key is minted with a warning, because a pinned id would collapse every tick onto one run. |
| LEDGENTER_PARENT_RUN_ID | Core | The parent run key; marks this process a subagent in the run tree. |
| LEDGENTER_RUN_GROUP | Core | The recurring-series key. When set, the process is a loop_tick: a fresh run key is minted under the series, every tick. |
| LEDGENTER_REPO_URL | Core | Explicit repo remote for hosted / no-cwd contexts; overrides the git-detected remote. |
| LEDGENTER_BRANCH | Core | Explicit branch; overrides git detection. |
| LEDGENTER_HEAD_SHA | Core | Explicit HEAD commit; overrides git detection. |
| LEDGENTER_REPO_AUTODETECT | Core | Set 0/false to disable git probing of the working directory entirely. |
| LEDGENTER_REPO_MAP | Core | Set 0/false to disable the machine-local repo→checkout map (the local_path overlay and its auto-learn). |
| LEDGENTER_REPO_MAP_PATH | Core | Overrides the map file location (default ~/.ledgenter/repos.json). |
| LEDGENTER_ENV_FILE | CLI | An optional key=value env file loaded at startup (default /etc/cronos/env). process.env always wins; the file only fills gaps. |
| LEDGENTER_CLI_LOG | CLI | Where soft-failed errors are appended as JSON lines for forensics (default ~/.ledgenter/cli.log). |
| LEDGENTER_SANDBOX | MCP server | Set 1/true to append a verbose, secret-scrubbed JSONL transcript of every tool call (the sandbox-review feed). Off by default; gate-probed off. |
| LEDGENTER_SANDBOX_DIR | MCP server | Base directory for transcripts (default supabase/sandbox/runs/ under the cwd); the file is <run_id>.jsonl. |
| LEDGENTER_SANDBOX_FILE | MCP server | Explicit transcript file path; overrides the directory rule. |
Note what is not here: no database URL, no service-role key, no tenant id. The agent machine holds exactly one secret — its own API key (chapter 5).
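Two rules from the table lend themselves to a small illustration: the env file only fills gaps, and the two required core variables must be present before anything runs. The sketch below is an assumption-laden stand-in, not the CLI's own loader. load_env_gaps and check_core_env are hypothetical names, the parser accepts only plain NAME=value lines, and it treats a variable that is set but empty as already set.

```shell
# Hypothetical sketch of "the file only fills gaps": export NAME=value pairs
# from an env file, but never overwrite a variable that is already set.
load_env_gaps() {
  env_file=$1
  [ -f "$env_file" ] || return 0              # the file is optional
  while IFS='=' read -r name value || [ -n "$name" ]; do
    case $name in
      ''|[0-9]*|*[!A-Za-z0-9_]*) continue ;;  # skip blanks, comments, bad names
    esac
    # ${NAME+set} is empty only when NAME is unset, so the environment wins.
    if eval "[ -z \"\${$name+set}\" ]"; then
      export "$name=$value"
    fi
  done < "$env_file"
}

# Hypothetical preflight: 78 mirrors the configuration exit code used here.
check_core_env() {
  for required in LEDGENTER_API_KEY LEDGENTER_API_BASE; do
    if eval "[ -z \"\${$required:-}\" ]"; then
      echo "config: $required is not set" >&2
      return 78
    fi
  done
  if [ -n "${LEDGENTER_RUN_ID:-}" ] && [ -n "${LEDGENTER_RUN_GROUP:-}" ]; then
    echo "warning: LEDGENTER_RUN_ID is pinned alongside LEDGENTER_RUN_GROUP" >&2
  fi
}

# Usage: load_env_gaps "$LEDGENTER_ENV_FILE" && check_core_env
```

The name check before eval is deliberate: it keeps a malformed line in the env file from being evaluated as shell code.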
CLI exit codes
The CLI maps the error envelope's code onto stable, sysexits-style process
exit codes, so cron scripts can branch without parsing JSON:
| Exit | Meaning |
|---|---|
| 0 | Success, or a transient error swallowed by --soft-fail. |
| 3 | ledgenter agent poll only: idle, no work available. Deliberately outside the sysexits range so chains can distinguish "nothing to do" from success and failure. |
| 65 | Validation: the arguments are malformed. The caller's bug. |
| 69 | Not found. |
| 75 | Transient: network, timeout, upstream, rate-limited, in-progress. Safe to retry. |
| 77 | Auth or permission denied. |
| 78 | Configuration: missing or invalid LEDGENTER_API_KEY / LEDGENTER_API_BASE. (The loop launcher reuses 78 for its own preflight failures.) |
| 1 | Anything unclassified. |
--soft-fail gates on error.retryable — the envelope's own
transience signal — not on a list of error codes. A retryable failure (a network
blip, an exchange 5xx, database contention) is swallowed to exit 0 and logged to
~/.ledgenter/cli.log; the next tick replays idempotently. A terminal failure stays
loud even under soft-fail.
Revoked keys fail loud. A revoked or invalid credential is terminal, not retryable — so an unattended agent with a dead key fails its schedule visibly (exit 77) rather than exiting 0 forever. Rotate a key and the next tick surfaces it immediately.
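Because the codes are stable, a wrapper script can branch on them directly. The mapping below is a sketch: classify_exit and its action labels are hypothetical, chosen here for illustration, while the numeric codes come from the exit-code table in this section.

```shell
# Hypothetical helper for cron scripts: turn a ledgenter exit status into
# a short action label, without parsing any JSON.
classify_exit() {
  case $1 in
    0)  echo "ok" ;;             # success, or a soft-failed transient error
    3)  echo "idle" ;;           # agent poll only: nothing to do
    65) echo "fix-arguments" ;;  # validation: the caller's bug
    69) echo "not-found" ;;
    75) echo "retry" ;;          # transient: safe to retry
    77) echo "fix-key" ;;        # auth is terminal, not transient
    78) echo "fix-config" ;;     # missing or invalid key / API base
    *)  echo "unclassified" ;;
  esac
}

# Usage:
#   status=0
#   ledgenter agent poll --quiet || status=$?
#   case $(classify_exit "$status") in
#     idle)    exit 0 ;;
#     fix-key) echo "builder-1: key revoked or expired" >&2; exit 77 ;;
#   esac
```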
Server-side maintenance
The building has janitors: pg_cron jobs inside the database that need no agent, no machine, and no operator attention. They are why a crashed agent leaves no permanent mess.
| Job | Schedule | What it does |
|---|---|---|
| cadre-run-reaper | Hourly | Marks active runs with no activity for 6 hours abandoned, then releases every task those runs were holding (assignee and lease cleared, in_progress back to todo) so the pool recovers without intervention. |
| cadre-wr-sweep | Every 5 min | Expires open or claimed handoffs whose due_at has passed and notifies the sender: re-issue, broaden the recipients, or escalate. |
| cadre-embedding-backfill | Every 2 min | Drains the pending-embedding queue for knowledge and decision notes; this is the reason an embedding outage can delay semantic search but never block a write. |
| cadre-activity-retention | Daily | Batched purge of aged activity rows. |
| cadre-idempotency-sweep | Daily | Purges expired idempotency keys. |
One more maintenance primitive exists for the test estate: reset_sandbox()
wipes a tenant's rows in foreign-key-safe order — and refuses, with an error, to run
against any tenant not explicitly flagged as a sandbox. Its maintenance contract is that
every new tenant-scoped table must join its delete list, or rows leak across sandbox
resets.
Day 2: staying honest
Operations is mostly watching gates that watch the system:
- The adversarial gates run on a schedule. A daily GitHub Actions workflow (plus on-demand dispatch) runs the integrity gate's micro contention scenario and the full concurrency-probe suite against the live sandbox tenant — registration races, shared-key siblings, pinned-run-id misuse, loop-tick cursors, token and path privacy (chapter 9). These need live keys and mutate shared sandbox state, so they cannot run per-PR; the scheduled job is the net that catches a regression that merged green. Runs are serialized — two interleaved gates would contaminate each other's ground truth.
- Reproduce locally with node harness/integrity-gate.mjs --micro and node harness/concurrency-gate.mjs, and run the 145-assertion pgTAP isolation suite with pnpm db:test before touching anything under supabase/.
- Key rotation is cheap by design. The blast radius of a key is one actor; the JWT it buys lives ~15 minutes. Revocation acts at the exchange: within one JWT lifetime the agent's next exchange fails, and the exit-code discipline above makes that failure loud on its schedule rather than silent in a log.
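The local reproduction commands from the list above can be wrapped so that they run in a fixed order and stop at the first failure. This is a minimal sketch: run_local_gates is a hypothetical name, and it assumes the function is called from the repository root, where the harness/ paths resolve.

```shell
# Hypothetical wrapper: run the local gates in order and stop at the
# first failure, returning that gate's exit status.
run_local_gates() {
  node harness/integrity-gate.mjs --micro || return $?
  node harness/concurrency-gate.mjs       || return $?
  pnpm db:test
}

# Usage, before touching anything under supabase/:
#   run_local_gates && echo "gates green"
```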
That is the whole operator surface: per-agent keys, a tick wrapper, a failure model built on leases and reapers, and scheduled adversaries. Nothing here requires a pager — the system is designed so that the worst routine failure is a task quietly returning to the pool.