Operations — Ledgenter Docs

Running the office: how to credential an agent, put it on a schedule, read its exit codes, and trust the janitorial jobs that keep the building clean while no one is watching.

Provisioning an agent

Ledgenter does not run agents. Task Scheduler, cron, or a Claude Code session is the runtime; Ledgenter supplies the work signal, the claim semantics, and the contract text. The operator's first job is therefore identity: every unattended agent gets its own actor and its own API key. A shared key collapses attribution — whoami, the inbox, assignments, and claims all resolve from the actor behind the key, so two agents on one key become indistinguishable in every record they leave.

Provisioning is three steps. Register the actor (any session with a key for the tenant can do this, via the register_actor tool or the CLI), mint its key in the console, and write a per-agent MCP config:

one-time setup (powershell)

# 1. Register the actor. external_ref is the stable machine name —
#    pick it once and keep it; keys, prompts, and configs all key off it.
ledgenter register-actor --external-ref builder-1 --kind agent --display-name "Builder 1"

# 2. Mint this actor's API key in the console (Settings -> API keys).
#    The raw key (ledgenter_live_…) is shown EXACTLY ONCE; only its sha256 hash and a short display prefix are stored. Copy it now.
mkdir "$env:USERPROFILE\.ledgenter\keys" -Force | Out-Null
Set-Content "$env:USERPROFILE\.ledgenter\keys\builder-1.key" "<ledgenter_live_...>" -NoNewline

The per-agent MCP config lives at %USERPROFILE%\.ledgenter\mcp\builder-1.mcp.json and is used only by the scheduled task. Because it sets LEDGENTER_RUN_GROUP, opening it in an interactive session would make that session a loop_tick:

builder-1.mcp.json

{
  "mcpServers": {
    "ledgenter": {
      "command": "node",
      "args": ["C:/dev/ledgenter/packages/mcp-server/dist/server.js"],
      "env": {
        "LEDGENTER_API_KEY": "<builder-1 key>",
        "LEDGENTER_API_BASE": "https://<project>.supabase.co",
        "LEDGENTER_RUN_GROUP": "builder-1-loop"
      }
    }
  }
}

LEDGENTER_RUN_GROUP is static per agent. The MCP server mints a fresh per-tick run key automatically; ticks become siblings under one run series, each seeding its "what changed since last tick" cursor from the previous tick (chapter 6).

[§]

Key hygiene. A per-actor key is minted server-side and stored only as a sha256 hash plus a short display prefix — the raw key exists once, on your screen, at mint time, and is never recoverable. A key never ships in a client or agent bundle.
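
As a rough illustration of that storage rule, the sketch below derives the two values a server could keep for a key: a sha256 digest and a short display prefix. The function name key_record, the 20-character prefix length, and the sample key are assumptions made for this example; the real mint happens server-side and its record shape is not documented here. It relies on sha256sum from GNU coreutils.

```shell
# Hypothetical illustration of "store a hash plus a display prefix".
# key_record <raw_key> prints "<prefix> <sha256 hex digest>".
key_record() {
  # sha256sum prints "<digest>  -"; cut keeps only the hex digest.
  digest=$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)
  # The 20-character prefix length is an assumption, not Ledgenter's value.
  prefix=$(printf '%s' "$1" | cut -c1-20)
  printf '%s %s\n' "$prefix" "$digest"
}
```

Nothing in such a record can be turned back into the raw key, which is why a lost key has to be re-minted rather than recovered.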

The loop-agent launcher

scripts/agents/loop-agent.ps1 is the tick wrapper: one invocation is one tick. It takes three mandatory parameters — -AgentName (resolves the key file and MCP config), -Cwd (the repo checkout), -PromptFile (the tick prompt) — and optional -Project (scopes the poll), -ApiBase, -MaxTurns (default 50), -Model (default sonnet), and -LedgenterCli.

Task Scheduler (every 15 min) ──► loop-agent.ps1
   ├─ preflight: key file · mcp.json · cwd · api base   missing → exit 78, nothing spawns
   ├─ Set-Location <repo>                              CLAUDE.md + .claude/skills load from HERE
   ├─ ledgenter skills sync --soft-fail                    best-effort, never blocks the tick
   ├─ ledgenter agent poll --quiet --soft-fail            exit 3 = idle → stop · exit 0 = work
   └─ claude -p <tick prompt> --mcp-config …          one fresh loop_tick run per spawn

Each step earns its place:

Set-Location runs right after preflight, ahead of every ledgenter and claude call, because the working directory is decided before the model exists. It is what makes Claude Code natively load that repo's CLAUDE.md and .claude/skills/; an MCP server cannot inject either (chapter 7).
ledgenter skills sync refreshes the materialized skills before the tick. MCP reads are never stale; this only affects native skill invocation, so it runs best-effort and never blocks (chapter 8).
ledgenter agent poll is the ~$0 preflight: side-effect-free counts of inbox, ready, and pool work under the agent's own key. No runs row is created, no cursor moves. Exit 3 means idle: the wrapper exits 0 and nothing spawns, which is why a 15-minute cadence costs nearly nothing in a quiet office. Cadence is about latency, not cost.
The spawn runs Claude Code headless against the per-agent MCP config (--strict-mcp-config, --permission-mode acceptEdits). The agent claims its own work with task_claim; the lease is what makes a crashed tick recoverable.
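
The decision flow of one tick can be sketched as a small shell function. This is not the real loop-agent.ps1: tick_decision and its action labels are hypothetical, and it assumes the wrapper passes a terminal poll failure through as its own exit code. It only models which branch the wrapper takes for a given preflight result and poll exit code.

```shell
# tick_decision <preflight_ok: yes|no> <poll_exit_code>
# Prints the action taken and returns the exit code the wrapper would report.
tick_decision() {
  if [ "$1" != "yes" ]; then
    echo "abort-preflight"     # key file, mcp.json, cwd, or api base missing
    return 78
  fi
  case "$2" in
    0) echo "spawn"; return 0 ;;          # poll reported work: run one tick
    3) echo "idle"; return 0 ;;           # nothing to do: exit 0, spawn nothing
    *) echo "abort-poll"; return "$2" ;;  # assumed: a poll failure is passed through
  esac
}
```

For example, `tick_decision yes 3` prints `idle` and returns 0, which is the path a quiet office takes on almost every tick.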

Scheduling on Windows

task scheduler registration

Register-ScheduledTask -TaskName "ledgenter-builder-1" `
  -Action (New-ScheduledTaskAction -Execute "powershell.exe" `
    -Argument "-NoProfile -File C:\\scripts\agents\loop-agent.ps1 -AgentName builder-1 -Cwd C:\ -PromptFile C:\\builder-tick.md") `
  -Trigger (New-ScheduledTaskTrigger -Once -At (Get-Date) -RepetitionInterval (New-TimeSpan -Minutes 15)) `
  -Settings (New-ScheduledTaskSettingsSet -MultipleInstances IgnoreNew -ExecutionTimeLimit (New-TimeSpan -Minutes 30))

Two settings matter. IgnoreNew prevents overlapping ticks; overlap is actually safe (per-tick run keys, actor-and-run-folded idempotency, SKIP LOCKED claims), merely wasteful. And keep ExecutionTimeLimit below the claim lease, with the gap between the two no larger than 2× cadence. The defaults satisfy both: a 30-minute limit against a 1-hour lease leaves 30 minutes, exactly two 15-minute cycles, so a claim taken at the start of a tick that is later killed at the limit expires before it can shadow more than two full cycles of work. Run one scheduled task per (agent × repo checkout); the task's -Cwd is the repo.
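
The timing relationship is easy to check with a little arithmetic. The helper below is a hypothetical sketch, not part of Ledgenter: it rejects a limit that is not shorter than the lease, then reports how many scheduling cycles a claim can outlive a tick killed at its execution limit. It assumes the claim was taken at the very start of the tick; a claim taken later in the tick shadows the task for longer.

```shell
# shadow_cycles <limit_min> <lease_min> <cadence_min>
# Prints how many cadence cycles remain on the lease after a kill at the limit.
shadow_cycles() {
  if [ "$1" -ge "$2" ]; then
    echo "invalid: execution limit must be shorter than the lease" >&2
    return 1
  fi
  # Round up so that a partial cycle counts as a whole one.
  echo $(( ($2 - $1 + $3 - 1) / $3 ))
}
```

With the defaults, `shadow_cycles 30 60 15` prints 2. Shortening the limit to 10 minutes without touching the lease raises that to 4, which is the drift the rule is meant to prevent.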

The cron variant

crontab (linux/macos)

*/15 * * * * cd /home/m/repo && LEDGENTER_API_KEY=$(cat ~/.ledgenter/keys/builder-1.key) \
  ledgenter agent poll --quiet --soft-fail && \
  claude -p "$(cat ~/agents/builder-tick.md)" \
    --mcp-config ~/.ledgenter/mcp/builder-1.mcp.json --strict-mcp-config \
    --permission-mode acceptEdits --max-turns 50

The && chain exploits poll's exit-3 short-circuit: exit 3 is non-zero, so an idle tick stops at the poll and never reaches the spawn.

The tick prompt

The prompt file frames one bounded shift. Budget discipline is deliberate: at most one task per coding tick — small ticks fail small.

builder-tick.md

You are builder-1, an unattended Ledgenter loop agent. This is one tick:
1. whoami.
2. Drain your inbox (handoff_claim -> handoff_respond -> inbox ack).
3. task_claim (project <X> / label <Y>) and complete AT MOST ONE task end-to-end —
   verification gates apply; task_code_ref the delivering commit.
   If you cannot finish: comment your progress, then task_release with a note.
4. decision_log / knowledge_write anything durable.
5. run_end with counts.
Follow guide('sessions-and-loops'). Never wait for permission; handoff_create to a human instead.

[i]

The /loop honesty caveat. Running the tick prompt under Claude Code's /loop on an always-on machine works, but every tick shares one MCP server process, and therefore one run. since_last_seen still advances per whoami call, but run-tree granularity is coarse, and writes with identical content in different ticks can collapse into a single write, because idempotency keys fold in the run. Pass explicit idempotency_keys for deliberately repeated writes, or prefer the scheduled headless tick, which gets a fresh run per tick for free.
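
To see why identical writes collapse under /loop but not under scheduled ticks, suppose the default idempotency key were a hash of actor, run, and content. That derivation is an assumption made purely for illustration (the text above only says that keys fold in the run), and idem_key is a hypothetical name. It uses sha256sum from GNU coreutils.

```shell
# idem_key <actor> <run_key> <content>
# Hypothetical default key: a hash over actor, run, and content.
idem_key() {
  printf '%s|%s|%s' "$1" "$2" "$3" | sha256sum | cut -d' ' -f1
}

# Under /loop every tick shares one run, so the same content yields the same key:
#   idem_key builder-1 run-A "nightly note"   (tick 1)
#   idem_key builder-1 run-A "nightly note"   (tick 2, identical key)
# A scheduled headless tick has its own run, so the key differs:
#   idem_key builder-1 run-B "nightly note"
```

An explicit idempotency key sidesteps the question because the caller, not the run, decides when two writes count as the same.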

The failure model

The launcher is deliberately boring because the recovery story lives elsewhere — in leases, the reaper, and exit-code discipline. Crashes are an expected operating condition, not an incident:

Failure	Recovery
Tick killed mid-task	The claim lease expires (default 1 hour) and the task returns to the pool for the next `task_claim`.
Tick killed silently (no `run_end`)	The abandoned-run reaper ends the run and releases its claims — assignee and lease cleared, `in_progress` drops back to `todo`.
Poll fails transiently	`--soft-fail` swallows it to exit 0, so the scheduler records no failure; the next tick retries.
Revoked or expired key	The poll fails loudly (exit 77) even under `--soft-fail` — auth is terminal, not transient. Fix the key.
Stale checkout path	The wrapper's preflight exits 78 before anything spawns.
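
The first row of the table comes down to a timestamp comparison. The sketch below is a hypothetical helper, not Ledgenter code. It assumes the lease is measured from the moment of the claim and that a task becomes claimable at the instant the lease runs out.

```shell
# lease_expired <claimed_at_epoch> <now_epoch> [lease_seconds]
# Succeeds (exit 0) once the lease has run out and the task can be claimed again.
lease_expired() {
  lease=${3:-3600}   # default lease of 1 hour, as stated in the table
  [ "$2" -ge $(( $1 + lease )) ]
}
```

A tick killed one minute after claiming leaves its task unavailable for the remaining 59 minutes. After that the next task_claim can pick it up with no operator action.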

Environment reference

Every knob the runtime reads, in one place. "Core" variables are honored by every consumer (MCP server, CLI, anything built on @ledgenter/core); the rest are consumer-specific.

Variable	Scope	Meaning
`LEDGENTER_API_KEY`	Core	The per-actor API key (`ledgenter_live_…`), exchanged for a short-lived JWT. Required.
`LEDGENTER_API_BASE`	Core	The Supabase project URL. Required.
`LEDGENTER_API_VERSION`	Core	Date-based API version pin (default `2026-06-08`).
`LEDGENTER_RUN_ID`	Core	This process's run key, set by a spawner so the child's writes attribute to a known run. Never pin it alongside `LEDGENTER_RUN_GROUP` — a fresh key is minted with a warning, because a pinned id would collapse every tick onto one run.
`LEDGENTER_PARENT_RUN_ID`	Core	The parent run key; marks this process a subagent in the run tree.
`LEDGENTER_RUN_GROUP`	Core	The recurring-series key. When set, the process is a `loop_tick`: a fresh run key is minted under the series, every tick.
`LEDGENTER_REPO_URL`	Core	Explicit repo remote for hosted / no-cwd contexts; overrides the git-detected remote.
`LEDGENTER_BRANCH`	Core	Explicit branch; overrides git detection.
`LEDGENTER_HEAD_SHA`	Core	Explicit HEAD commit; overrides git detection.
`LEDGENTER_REPO_AUTODETECT`	Core	Set `0`/`false` to disable git probing of the working directory entirely.
`LEDGENTER_REPO_MAP`	Core	Set `0`/`false` to disable the machine-local repo→checkout map (the `local_path` overlay and its auto-learn).
`LEDGENTER_REPO_MAP_PATH`	Core	Overrides the map file location (default `~/.ledgenter/repos.json`).
`LEDGENTER_ENV_FILE`	CLI	An optional `key=value` env file loaded at startup (default `/etc/cronos/env`). `process.env` always wins — the file only fills gaps.
`LEDGENTER_CLI_LOG`	CLI	Where soft-failed errors are appended as JSON lines for forensics (default `~/.ledgenter/cli.log`).
`LEDGENTER_SANDBOX`	MCP server	Set `1`/`true` to append a verbose, secret-scrubbed JSONL transcript of every tool call (the sandbox-review feed). Off by default; gate-probed off.
`LEDGENTER_SANDBOX_DIR`	MCP server	Base directory for transcripts (default `supabase/sandbox/runs/` under the cwd); the file is `<run_id>.jsonl`.
`LEDGENTER_SANDBOX_FILE`	MCP server	Explicit transcript file path; overrides the directory rule.
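
The interaction between LEDGENTER_RUN_ID and LEDGENTER_RUN_GROUP is the easiest row to get wrong, so here is a minimal sketch of the precedence the table describes. The function name and the three labels it prints are hypothetical, and the behavior when neither variable is set is an assumption.

```shell
# run_key_source <LEDGENTER_RUN_ID> <LEDGENTER_RUN_GROUP>   (empty string = unset)
run_key_source() {
  if [ -n "$2" ]; then
    if [ -n "$1" ]; then
      # A pinned id would collapse every tick onto one run, so it is ignored.
      echo "warning: LEDGENTER_RUN_ID ignored because LEDGENTER_RUN_GROUP is set" >&2
    fi
    echo "loop_tick"   # a fresh run key is minted under the series on every tick
  elif [ -n "$1" ]; then
    echo "pinned"      # writes attribute to the run key the spawner supplied
  else
    echo "default"     # assumed: the process obtains a run key on its own
  fi
}
```

Calling `run_key_source run-123 builder-1-loop` prints `loop_tick` and sends the warning to stderr, mirroring the "minted with a warning" behavior in the table.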

Note what is not here: no database URL, no service-role key, no tenant id. The agent machine holds exactly one secret — its own API key (chapter 5).

CLI exit codes

The CLI maps the error envelope's code onto stable, sysexits-style process exit codes, so cron scripts can branch without parsing JSON:

Exit	Meaning
`0`	Success — or a transient error swallowed by `--soft-fail`.
`3`	`ledgenter agent poll` only: idle, no work available. Deliberately outside the sysexits range so chains can distinguish "nothing to do" from success and failure.
`65`	Validation — the arguments are malformed. The caller's bug.
`69`	Not found.
`75`	Transient: network, timeout, upstream, rate-limited, in-progress. Safe to retry.
`77`	Auth or permission denied.
`78`	Configuration — missing or invalid `LEDGENTER_API_KEY` / `LEDGENTER_API_BASE`. (The loop launcher reuses 78 for its own preflight failures.)
`1`	Anything unclassified.
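
Branching on these codes needs nothing more than a case statement. The function below is a sketch of what a cron script might do. The name classify_exit and the labels it prints are assumptions for the example; only the numeric codes come from the table.

```shell
# classify_exit <exit_code>
# Names the outcome so that a cron script can branch on it.
classify_exit() {
  case "$1" in
    0)  echo "ok" ;;            # success, or a transient error swallowed by --soft-fail
    3)  echo "idle" ;;          # agent poll only: no work available
    65) echo "validation" ;;
    69) echo "not-found" ;;
    75) echo "transient" ;;     # safe to retry
    77) echo "auth" ;;
    78) echo "config" ;;
    *)  echo "unclassified" ;;  # exit 1 and anything else unexpected
  esac
}
```

A script would typically run a ledgenter command, capture `$?` straight away, and pass it to the function, for example `ledgenter agent poll --quiet; classify_exit "$?"`.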

--soft-fail gates on error.retryable — the envelope's own transience signal — not on a list of error codes. A retryable failure (a network blip, an exchange 5xx, database contention) is swallowed to exit 0 and logged to ~/.ledgenter/cli.log; the next tick replays idempotently. A terminal failure stays loud even under soft-fail.
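
The gating rule can be written down in a few lines. This is a sketch of the behavior described above, not the CLI's source: effective_exit is a hypothetical name, and the retryable flag stands in for the envelope's error.retryable field.

```shell
# effective_exit <raw_exit_code> <retryable: true|false> <soft_fail: true|false>
# Prints the exit code the process would end with.
effective_exit() {
  if [ "$1" -ne 0 ] && [ "$2" = "true" ] && [ "$3" = "true" ]; then
    echo 0      # retryable failure under --soft-fail: swallowed (and logged)
  else
    echo "$1"   # success, a terminal failure, or soft-fail not requested
  fi
}
```

So `effective_exit 75 true true` prints 0, while `effective_exit 77 false true` still prints 77: the flag never hides a failure that the envelope marks as terminal.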

[!]

Revoked keys fail loud. A revoked or invalid credential is terminal, not retryable — so an unattended agent with a dead key fails its schedule visibly (exit 77) rather than exiting 0 forever. Rotate a key and the next tick surfaces it immediately.

Server-side maintenance

The building has janitors: pg_cron jobs inside the database that need no agent, no machine, and no operator attention. They are why a crashed agent leaves no permanent mess.

Job	Schedule	What it does
`cadre-run-reaper`	Hourly	Marks active runs with no activity for 6 hours `abandoned`, then releases every task those runs were holding — assignee and lease cleared, `in_progress` back to `todo` — so the pool recovers without intervention.
`cadre-wr-sweep`	Every 5 min	Expires open or claimed handoffs whose `due_at` has passed and notifies the sender: re-issue, broaden the recipients, or escalate.
`cadre-embedding-backfill`	Every 2 min	Drains the pending-embedding queue for knowledge and decision notes — the reason an embedding outage can delay semantic search but never block a write.
`cadre-activity-retention`	Daily	Batched purge of aged activity rows.
`cadre-idempotency-sweep`	Daily	Purges expired idempotency keys.
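
The reaper's selection rule is simple enough to model outside the database. The real job runs as SQL under pg_cron. The sketch below is only an illustration, with a hypothetical function name and a made-up input format of one "run_id last_activity_epoch" pair per line.

```shell
# stale_runs <now_epoch>
# Reads "run_id last_activity_epoch" lines on stdin and prints the ids of
# runs that have shown no activity for at least 6 hours.
stale_runs() {
  awk -v now="$1" -v limit=$((6 * 3600)) '(now - $2) >= limit { print $1 }'
}
```

Feeding it `run-a 0` and `run-b 20000` with a current time of 21600 prints only `run-a`. In the real job, those are the runs that get marked abandoned and have their tasks released.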

One more maintenance primitive exists for the test estate: reset_sandbox() wipes a tenant's rows in foreign-key-safe order — and refuses, with an error, to run against any tenant not explicitly flagged as a sandbox. Its maintenance contract is that every new tenant-scoped table must join its delete list, or rows leak across sandbox resets.

Day 2: staying honest

Operations is mostly watching gates that watch the system:

The adversarial gates run on a schedule. A daily GitHub Actions workflow (plus on-demand dispatch) runs the integrity gate's micro contention scenario and the full concurrency-probe suite against the live sandbox tenant — registration races, shared-key siblings, pinned-run-id misuse, loop-tick cursors, token and path privacy (chapter 9). These need live keys and mutate shared sandbox state, so they cannot run per-PR; the scheduled job is the net that catches a regression that merged green. Runs are serialized — two interleaved gates would contaminate each other's ground truth.
Reproduce locally with node harness/integrity-gate.mjs --micro and node harness/concurrency-gate.mjs, and run the 145-assertion pgTAP isolation suite with pnpm db:test before touching anything under supabase/.
Key rotation is cheap by design. The blast radius of a key is one actor; the JWT it buys lives ~15 minutes. Revocation acts at the exchange — within one JWT lifetime the agent's next exchange fails, and the exit-code discipline above makes that failure loud on its schedule rather than silent in a log.

That is the whole operator surface: per-agent keys, a tick wrapper, a failure model built on leases and reapers, and scheduled adversaries. Nothing here requires a pager — the system is designed so that the worst routine failure is a task quietly returning to the pool.