HITL Coordinator
The HITL coordinator is the control plane for override request management. It tracks pending requests, issues signed tokens, and enforces single-use redemption via an SQLite-backed REST API.
Architecture
The coordinator runs as a standalone service (`kairos hitl serve`) with four core components:
| Component | Role |
|---|---|
| Service | Axum REST API server |
| Store | SQLite backend with WAL journaling |
| Signer | RSA-PSS token signing |
| Config | TOML configuration loader with validation |
Starting the Coordinator

    kairos hitl serve --config coordinator.toml

The coordinator validates its configuration at startup:
- Loads and parses the deployment policy
- Verifies the policy has an `hitl` section
- Checks `defaultTokenTtlMs <= maxTokenTtlMs`
- For each authority: loads private key, derives public key, verifies it matches the policy
- Checks for duplicate key IDs
If any check fails, the coordinator exits with an error.
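The two checks that need no key material can be sketched as follows. This is an illustrative outline only: `check_startup` and its arguments are hypothetical names, the caller is assumed to have already read the TTL values and the `[[authorities]]` entries, and the public-key comparison is left out because it needs a cryptography library.

```python
def check_startup(default_ttl_ms: int, max_ttl_ms: int, authorities: list) -> list:
    """Return a list of validation errors; an empty list means both checks pass."""
    errors = []

    # The default token TTL must not exceed the maximum.
    if default_ttl_ms > max_ttl_ms:
        errors.append("defaultTokenTtlMs exceeds maxTokenTtlMs")

    # Every authority must have a distinct keyId.
    seen = set()
    for authority in authorities:
        key_id = authority["keyId"]
        if key_id in seen:
            errors.append(f"duplicate keyId: {key_id}")
        seen.add(key_id)

    return errors


ok = check_startup(300000, 600000, [{"keyId": "operator-1"}, {"keyId": "operator-2"}])
bad = check_startup(900000, 600000, [{"keyId": "operator-1"}, {"keyId": "operator-1"}])
```

A service following the behavior described above would exit with an error whenever the returned list is non-empty.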
Configuration
TOML Schema

    bind = "0.0.0.0:8787"
    dbPath = "/data/hitl.sqlite"
    pendingRequestTtlMs = 3600000
    defaultTokenTtlMs = 300000

    [policy]
    path = "/path/to/policy.json"

    [[authorities]]
    keyId = "operator-1"
    operatorId = "alice"
    privateKeyPemPath = "/path/to/operator-1.pem"

    [[authorities]]
    keyId = "operator-2"
    operatorId = "bob"
    privateKeyPemPath = "/path/to/operator-2.pem"

| Field | Description |
|---|---|
| `bind` | Address and port to listen on |
| `dbPath` | Path to SQLite database file (created if absent) |
| `pendingRequestTtlMs` | How long pending requests remain active before expiring (milliseconds) |
| `defaultTokenTtlMs` | Default token TTL when not specified in approval (milliseconds) |
| `policy.path` | Path to the signed deployment policy JSON |
Authority Configuration
Each `[[authorities]]` entry maps a key ID to an operator identity and private key:
| Field | Description |
|---|---|
| `keyId` | Must match an authority entry in the deployment policy |
| `operatorId` | Must match the `operatorId` in the policy for this key |
| `privateKeyPemPath` | Path to RSA private key in PKCS#8 or PKCS#1 PEM format |
At startup, the coordinator derives the public key from each private key and verifies it matches the public key in the deployment policy. This prevents key mismatches that would produce valid-looking but unverifiable tokens.
SQLite Backend
The coordinator uses SQLite with WAL (Write-Ahead Logging) mode and a 5-second busy timeout. The schema is auto-migrated on startup.
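For readers unfamiliar with these two settings, the sketch below applies them through Python's standard `sqlite3` module. It illustrates the SQLite pragmas only; the coordinator itself is a Rust service, and `open_store` and the temporary database path are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_store(db_path: str) -> sqlite3.Connection:
    """Open a database file with WAL journaling and a 5-second busy timeout."""
    conn = sqlite3.connect(db_path)
    # WAL lets readers proceed while a writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    # Wait up to 5000 ms for a lock before failing with SQLITE_BUSY.
    conn.execute("PRAGMA busy_timeout=5000")
    return conn


# WAL needs a real file; an in-memory database cannot use it.
db_path = os.path.join(tempfile.mkdtemp(), "hitl.sqlite")
conn = open_store(db_path)
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
busy_timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
```

Reading the pragmas back, as the last two lines do, is a quick way to confirm that the settings took effect on a given filesystem.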
Tables
`override_requests` tracks the lifecycle of each override request:
| Column | Type | Description |
|---|---|---|
| `coordinator_request_id` | TEXT PK | UUID v4 identifier |
| `status` | TEXT | `PENDING`, `APPROVED`, `DENIED`, `EXPIRED`, `REDEEMED` |
| `evaluation_request` | TEXT | Original evaluation request JSON |
| `evaluation_response` | TEXT | Original evaluation response JSON |
| `request_hash` | TEXT | Canonical SHA-256 hash of the request |
| `action_hash` | TEXT | Action hash (if applicable) |
| `license_id` | TEXT | License ID from the submitter |
| `actor_id` | TEXT | Target actor ID (nullable, resolved from inner fields) |
| `intent_id` | TEXT | Adaptive retry grouping key (nullable) |
| `failure_fingerprint` | TEXT | Deterministic failure hash for dedupe (nullable) |
| `submitted_at` | TEXT | Submission timestamp (RFC 3339) |
| `request_expires_at` | TEXT | When this request expires |
| `sentinel_feed` | TEXT | Sentinel telemetry feed JSON (optional) |
| `sentinel_summary` | TEXT | Sentinel summary JSON (optional) |
`issued_tokens` tracks signed tokens for single-use enforcement:
| Column | Type | Description |
|---|---|---|
| `token_id` | TEXT PK | UUID v4 from the token payload |
| `coordinator_request_id` | TEXT FK | Links to the parent request |
| `payload` | TEXT | JSON payload string |
| `signature` | TEXT | Base64url signature |
| `issued_at` | TEXT | Token creation timestamp |
| `expires_at` | TEXT | Token expiry timestamp |
| `redeemed_at` | TEXT | Redemption timestamp (NULL = not redeemed) |
`audit_events` is an immutable audit trail:
| Column | Type | Description |
|---|---|---|
| `id` | TEXT PK | UUID v4 |
| `coordinator_request_id` | TEXT FK | Links to the parent request |
| `event_type` | TEXT | `SUBMITTED`, `APPROVED`, `DENIED`, `EXPIRED`, `REDEEMED` |
| `actor_id` | TEXT | Who performed the action |
| `timestamp` | TEXT | Event timestamp |
| `note` | TEXT | Optional operator note |
Status Lifecycle

    PENDING ──┬──► APPROVED ──► REDEEMED
              ├──► DENIED
              └──► EXPIRED (lazy, on read)

Expiration is lazy: when listing or fetching requests, the coordinator checks for pending requests past their `request_expires_at` and transitions them to `EXPIRED` with an audit event.
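Lazy expiration of this kind can be sketched as a sweep that runs before each read. The example below is a hedged illustration, not the coordinator's code: it uses Python's `sqlite3` with a reduced schema containing only the columns it touches, `expire_stale` is a hypothetical name, and timestamps are assumed to be UTC strings in a single format so that text comparison orders them correctly.

```python
import sqlite3
import uuid
from datetime import datetime, timezone


def expire_stale(conn: sqlite3.Connection, now: datetime) -> int:
    """Move PENDING rows past request_expires_at to EXPIRED; return how many moved."""
    now_text = now.isoformat()
    # Text comparison is only safe because every timestamp here is UTC in one format.
    stale = conn.execute(
        "SELECT coordinator_request_id FROM override_requests "
        "WHERE status = 'PENDING' AND request_expires_at <= ?",
        (now_text,),
    ).fetchall()
    for (request_id,) in stale:
        conn.execute(
            "UPDATE override_requests SET status = 'EXPIRED' "
            "WHERE coordinator_request_id = ? AND status = 'PENDING'",
            (request_id,),
        )
        # Each transition is recorded in the audit trail.
        conn.execute(
            "INSERT INTO audit_events (id, coordinator_request_id, event_type, timestamp) "
            "VALUES (?, ?, 'EXPIRED', ?)",
            (str(uuid.uuid4()), request_id, now_text),
        )
    conn.commit()
    return len(stale)


# Reduced schema: only the columns this sketch touches.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE override_requests (
        coordinator_request_id TEXT PRIMARY KEY,
        status TEXT NOT NULL,
        request_expires_at TEXT NOT NULL
    );
    CREATE TABLE audit_events (
        id TEXT PRIMARY KEY,
        coordinator_request_id TEXT NOT NULL,
        event_type TEXT NOT NULL,
        timestamp TEXT NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO override_requests VALUES (?, 'PENDING', ?)",
    [
        ("req-old", datetime(2024, 1, 1, tzinfo=timezone.utc).isoformat()),
        ("req-new", datetime(2024, 1, 3, tzinfo=timezone.utc).isoformat()),
    ],
)
expired_count = expire_stale(conn, datetime(2024, 1, 2, tzinfo=timezone.utc))
statuses = dict(conn.execute("SELECT coordinator_request_id, status FROM override_requests"))
```

One consequence of this design is that a request can sit in the database as `PENDING` after its expiry time until something next reads it.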
REST API
Endpoints
| Method | Path | Description |
|---|---|---|
| `POST` | `/v1/override-requests` | Submit a new override request |
| `GET` | `/v1/override-requests` | List requests (optional `?status=` filter) |
| `GET` | `/v1/override-requests/:id` | Get a single request with details |
| `POST` | `/v1/override-requests/:id/approve` | Approve and issue a signed token |
| `POST` | `/v1/override-requests/:id/deny` | Deny the request |
| `POST` | `/v1/override-tokens/redeem` | Redeem a token (single-use check) |
| `GET` | `/healthz` | Health check |
Submit Request

    curl -X POST http://127.0.0.1:8787/v1/override-requests \
      -H "Content-Type: application/json" \
      -d '{
        "evaluationRequest": { ... },
        "evaluationResponse": { ... },
        "licenseId": "lic_test_001",
        "actorId": "agent-1",
        "sentinelFeed": { ... },
        "sentinelSummary": { ... },
        "source": "tui-v2"
      }'

Validation:
- Response decision must be `REJECT_STATE` or `REJECT_ACTION`
- Escalation type must be `HUMAN_ESCALATION`
- When adaptive evaluation detail is present, `escalationRecommended` must also be `HUMAN_ESCALATION` (rejects inconsistent payloads where top-level says `HUMAN_ESCALATION` but adaptive says `REFORMULATE`)
- Request must not already contain an `overrideToken`
- `licenseId` is required
Actor ID resolution: The coordinator resolves actorId from three sources in priority order: submission.actorId → evaluationRequest.actorId → evaluation.evaluatedActorId. The resolved value is persisted and used for all dedupe and cooldown lookups.
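The priority order can be expressed as a first-match lookup. In this minimal sketch, `resolve_actor_id` is a hypothetical helper that takes the three candidate values as separate arguments, so it makes no claim about where each field sits in the payload. Treating an empty string as missing is an assumption.

```python
from typing import Optional


def resolve_actor_id(
    submission_actor_id: Optional[str],
    request_actor_id: Optional[str],
    evaluated_actor_id: Optional[str],
) -> Optional[str]:
    """Return the first usable actor ID, checking the sources in priority order."""
    for candidate in (submission_actor_id, request_actor_id, evaluated_actor_id):
        # Assumption: None and "" both count as "not provided".
        if candidate:
            return candidate
    return None


from_submission = resolve_actor_id("agent-1", "agent-2", "agent-3")
from_request = resolve_actor_id(None, "agent-2", "agent-3")
unresolved = resolve_actor_id(None, None, None)
```

The unresolved case returns `None`, which matches the nullable `actor_id` column in `override_requests`.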
Adaptive gating (when adaptiveEscalation.enabled = true in the deployment policy):
| Gate | Condition | Response |
|---|---|---|
| Dedupe | Pending request exists for same (actor, intent, fingerprint) | 409 with {"coordinatorRequestId": "...", "deduplicated": true} |
| Deny cooldown | Prior denial within cooldownAfterDenyMs window | 409 Conflict |
| Material change | Same failureFingerprint as denied request when requireMaterialChangeAfterDeny is true | 409 Conflict |
Dedupe uses null-safe COALESCE matching, so legacy submissions without intentId or failureFingerprint still participate — their NULL values coalesce to empty strings for comparison.
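A null-safe match of this shape might look like the query below, shown with Python's `sqlite3` against a reduced table. The SQL text and the helper name `find_pending_duplicate` are illustrative assumptions, not the coordinator's actual query.

```python
import sqlite3
from typing import Optional

# COALESCE turns NULL into '' on both sides, so NULL compares equal to NULL.
DEDUPE_SQL = """
SELECT coordinator_request_id FROM override_requests
WHERE status = 'PENDING'
  AND COALESCE(actor_id, '') = COALESCE(?, '')
  AND COALESCE(intent_id, '') = COALESCE(?, '')
  AND COALESCE(failure_fingerprint, '') = COALESCE(?, '')
LIMIT 1
"""


def find_pending_duplicate(
    conn: sqlite3.Connection,
    actor_id: Optional[str],
    intent_id: Optional[str],
    failure_fingerprint: Optional[str],
) -> Optional[str]:
    """Return the ID of a pending request with the same dedupe key, if any."""
    row = conn.execute(DEDUPE_SQL, (actor_id, intent_id, failure_fingerprint)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE override_requests ("
    "coordinator_request_id TEXT PRIMARY KEY, status TEXT, "
    "actor_id TEXT, intent_id TEXT, failure_fingerprint TEXT)"
)
# A legacy submission: no intent ID and no failure fingerprint.
conn.execute(
    "INSERT INTO override_requests VALUES ('req-legacy', 'PENDING', 'agent-1', NULL, NULL)"
)
legacy_match = find_pending_duplicate(conn, "agent-1", None, None)
no_match = find_pending_duplicate(conn, "agent-1", "intent-7", None)
```

A plain `intent_id = ?` comparison would never match the legacy row, because `NULL = NULL` is not true in SQL. With the `COALESCE` form, a NULL and an empty string are treated as the same value.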
Response:

    {
      "coordinatorRequestId": "550e8400-e29b-41d4-a716-446655440000"
    }

List Requests

    # All requests
    curl http://127.0.0.1:8787/v1/override-requests

    # Filter by status (quoted so the shell does not treat ? as a glob)
    curl "http://127.0.0.1:8787/v1/override-requests?status=PENDING"

Status values: `PENDING`, `APPROVED`, `DENIED`, `EXPIRED`, `REDEEMED`.
Approve Request

    curl -X POST http://127.0.0.1:8787/v1/override-requests/:id/approve \
      -H "Content-Type: application/json" \
      -d '{
        "keyId": "operator-1",
        "operatorNote": "Authorized after manual review",
        "tokenTtlMs": 300000
      }'

| Field | Required | Description |
|---|---|---|
| `keyId` | Yes | Authority key ID for signing |
| `operatorNote` | No | Free-text justification |
| `tokenTtlMs` | No | Token TTL in ms (defaults to `defaultTokenTtlMs`, capped at `maxTokenTtlMs`) |
The coordinator signs the token and returns a ready-to-use OverrideTokenEnvelope.
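The default-and-cap rule for `tokenTtlMs` amounts to a two-step calculation. The function name below is hypothetical, and the sketch assumes an over-limit request is silently capped, as the field description suggests, rather than rejected.

```python
from typing import Optional


def effective_token_ttl_ms(
    requested_ttl_ms: Optional[int], default_ttl_ms: int, max_ttl_ms: int
) -> int:
    """Use the requested TTL if given, else the default, and never exceed the maximum."""
    ttl = default_ttl_ms if requested_ttl_ms is None else requested_ttl_ms
    return min(ttl, max_ttl_ms)


omitted = effective_token_ttl_ms(None, 300000, 600000)
within_limit = effective_token_ttl_ms(120000, 300000, 600000)
over_limit = effective_token_ttl_ms(900000, 300000, 600000)
```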
Deny Request

    curl -X POST http://127.0.0.1:8787/v1/override-requests/:id/deny \
      -H "Content-Type: application/json" \
      -d '{
        "keyId": "operator-1",
        "operatorNote": "Insufficient justification"
      }'

Redeem Token

    curl -X POST http://127.0.0.1:8787/v1/override-tokens/redeem \
      -H "Content-Type: application/json" \
      -d '{
        "tokenId": "550e8400-e29b-41d4-a716-446655440000",
        "requestHash": "1c712b22...",
        "policyVersion": 1,
        "licenseId": "lic_test_001",
        "actorId": "agent-1"
      }'

Response statuses:
| Status | Meaning |
|---|---|
| `ACCEPTED` | Token redeemed successfully |
| `REPLAY_DETECTED` | Token was already redeemed |
| `UNKNOWN_TOKEN` | Token not issued by this coordinator |
| `BINDING_MISMATCH` | Request/policy/license/actor mismatch |
| `EXPIRED` | Token past expiry time |
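One way to get single-use behavior from the `issued_tokens` table is a conditional update on `redeemed_at`. The sketch below illustrates that technique with Python's `sqlite3` and a reduced table. The order of the checks is an assumption, `redeem` is a hypothetical name, and the `BINDING_MISMATCH` comparison is omitted because it depends on the token payload.

```python
import sqlite3


def redeem(conn: sqlite3.Connection, token_id: str, now_text: str) -> str:
    """Return a redemption status; at most one call per token yields ACCEPTED."""
    row = conn.execute(
        "SELECT expires_at, redeemed_at FROM issued_tokens WHERE token_id = ?",
        (token_id,),
    ).fetchone()
    if row is None:
        return "UNKNOWN_TOKEN"
    expires_at, redeemed_at = row
    if redeemed_at is not None:
        return "REPLAY_DETECTED"
    # Timestamps are assumed to be UTC strings in one format, so text order is time order.
    if expires_at <= now_text:
        return "EXPIRED"
    # The WHERE clause guards against a concurrent redeemer: only one UPDATE can match.
    cursor = conn.execute(
        "UPDATE issued_tokens SET redeemed_at = ? "
        "WHERE token_id = ? AND redeemed_at IS NULL",
        (now_text, token_id),
    )
    conn.commit()
    return "ACCEPTED" if cursor.rowcount == 1 else "REPLAY_DETECTED"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issued_tokens (token_id TEXT PRIMARY KEY, expires_at TEXT, redeemed_at TEXT)"
)
conn.execute("INSERT INTO issued_tokens VALUES ('tok-live', '2024-01-02T00:00:00+00:00', NULL)")
conn.execute("INSERT INTO issued_tokens VALUES ('tok-old', '2024-01-01T00:00:00+00:00', NULL)")

now = "2024-01-01T12:00:00+00:00"
first = redeem(conn, "tok-live", now)
second = redeem(conn, "tok-live", now)
stale = redeem(conn, "tok-old", now)
missing = redeem(conn, "tok-none", now)
```

Checking `rowcount` after the update, rather than trusting the earlier read, is what keeps two simultaneous redemptions from both succeeding.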
Error Responses
| HTTP Status | Cause |
|---|---|
| 400 | Validation failure (non-overrideable decision, missing fields, adaptive recommendation mismatch) |
| 404 | Request ID not found |
| 409 | Invalid status transition, deduplicated submission, deny cooldown active, or material change required |
| 500 | Database or signing error |
Docker Deployment
Dockerfile

    FROM rust:1.94.0-bookworm AS builder
    WORKDIR /build
    COPY . .
    RUN cargo build -p kairos-cli --profile release-native

    FROM debian:bookworm-slim
    RUN apt-get update && apt-get install -y --no-install-recommends \
        libsqlite3-0 ca-certificates curl && rm -rf /var/lib/apt/lists/*
    COPY --from=builder /build/target/release-native/kairos /usr/local/bin/kairos
    VOLUME ["/testbed", "/data"]
    ENTRYPOINT ["kairos"]
    CMD ["hitl", "serve", "--config", "/testbed/fixtures/coordinator/coordinator.toml"]

Docker Compose

    services:
      coordinator:
        build:
          context: ${KAIROS_ENGINE_DIR:-../../kairos-engine}
          dockerfile: ${PWD}/docker/Dockerfile.coordinator
        ports:
          - "8787:8787"
        volumes:
          - ..:/testbed:ro
        tmpfs:
          - /data
        healthcheck:
          test: ["CMD", "curl", "-sf", "http://localhost:8787/healthz"]
          interval: 5s
          timeout: 3s
          retries: 5
        environment:
          - RUST_LOG=info

Key deployment details:
| Aspect | Configuration |
|---|---|
| Testbed mount | Read-only at /testbed — coordinator reads config, policy, and keys |
| Database | tmpfs at /data — ephemeral, reset on container restart |
| Health check | HTTP GET /healthz every 5 seconds |
| Logging | Controlled via RUST_LOG environment variable |
Starting

    # From kairos-testbed directory
    KAIROS_ENGINE_DIR=../kairos-engine docker compose -f docker/docker-compose.yml up coordinator

Or using the convenience script:

    ./scripts/run-coordinator.sh

Runtime Integration
To connect the Substrate runtime to a coordinator for token redemption:

    kairos evaluate \
      --config artifact.json \
      --policy policy.json \
      --license license.key \
      --coordinator-url http://127.0.0.1:8787 \
      --request request-with-token.json

The runtime uses a 5-second timeout for coordinator calls. If the coordinator is unreachable, the override fails with `CoordinatorUnavailable` and the original rejection is preserved (fail-closed).
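The fail-closed behavior can be illustrated with a small client-side sketch. Everything here beyond the endpoint path, the 5-second timeout, and the `CoordinatorUnavailable` name is an assumption: the response is taken to be JSON with a `status` field, `OVERRIDDEN` is a placeholder for whatever the runtime reports on success, and `post_redeem` and `decide` are hypothetical helpers.

```python
import json
import urllib.request

COORDINATOR_TIMEOUT_S = 5


def post_redeem(base_url: str, body: dict) -> dict:
    """POST a redemption request, giving up after COORDINATOR_TIMEOUT_S seconds."""
    request = urllib.request.Request(
        f"{base_url}/v1/override-tokens/redeem",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=COORDINATOR_TIMEOUT_S) as response:
        return json.load(response)


def decide(original_decision: str, redeem_call) -> tuple:
    """Return (decision, failure_reason); any transport failure keeps the rejection."""
    try:
        result = redeem_call()
    except OSError:
        # Connection errors and timeouts are OSError subclasses: fail closed.
        return original_decision, "CoordinatorUnavailable"
    if result.get("status") == "ACCEPTED":
        return "OVERRIDDEN", None
    # Any other status (replay, expiry, mismatch) also leaves the rejection in place.
    return original_decision, result.get("status")


def unreachable() -> dict:
    """Stand-in for a coordinator that refuses the connection."""
    raise ConnectionRefusedError()


closed = decide("REJECT_ACTION", unreachable)
accepted = decide("REJECT_ACTION", lambda: {"status": "ACCEPTED"})
replayed = decide("REJECT_ACTION", lambda: {"status": "REPLAY_DETECTED"})
```

In real use, `redeem_call` would be something like `lambda: post_redeem("http://127.0.0.1:8787", body)`. Passing it in as a callable keeps the decision logic testable without a running coordinator.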