
HITL Coordinator

The HITL (human-in-the-loop) coordinator is the control plane for override requests. It tracks pending requests, issues signed override tokens, and enforces single-use redemption, all behind an SQLite-backed REST API.

The coordinator runs as a standalone service (`kairos hitl serve`) with four core components:

| Component | Role |
| --- | --- |
| Service | Axum REST API server |
| Store | SQLite backend with WAL journaling |
| Signer | RSA-PSS token signing |
| Config | TOML configuration loader with validation |

```bash
kairos hitl serve --config coordinator.toml
```

The coordinator validates its configuration at startup:

  1. Loads and parses the deployment policy.
  2. Verifies that the policy has a `hitl` section.
  3. Checks that `defaultTokenTtlMs` <= `maxTokenTtlMs`.
  4. For each authority, loads the private key, derives the public key, and verifies that it matches the public key in the policy.
  5. Checks for duplicate key IDs.

If any check fails, the coordinator exits with an error.
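The policy-independent checks (2, 3 and 5) can be sketched as a single fail-fast function. This is a minimal illustration, not the coordinator's code: `StartupConfig`, `StartupError` and `validate_startup` are hypothetical names, and the sketch assumes the policy and `[[authorities]]` entries have already been parsed into plain fields. Step 4 (key matching) is left out here.

```rust
use std::collections::HashSet;

/// Hypothetical subset of the loaded configuration: only the fields the
/// checks below need. Field names mirror the TOML/policy keys.
struct StartupConfig {
    has_hitl_section: bool,
    default_token_ttl_ms: u64,
    max_token_ttl_ms: u64,
    authority_key_ids: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum StartupError {
    MissingHitlSection,
    DefaultTtlExceedsMax { default_ms: u64, max_ms: u64 },
    DuplicateKeyId(String),
}

/// Runs checks 2, 3 and 5 in order and stops at the first failure,
/// so the caller can exit with a single error.
fn validate_startup(cfg: &StartupConfig) -> Result<(), StartupError> {
    if !cfg.has_hitl_section {
        return Err(StartupError::MissingHitlSection);
    }
    if cfg.default_token_ttl_ms > cfg.max_token_ttl_ms {
        return Err(StartupError::DefaultTtlExceedsMax {
            default_ms: cfg.default_token_ttl_ms,
            max_ms: cfg.max_token_ttl_ms,
        });
    }
    let mut seen = HashSet::new();
    for key_id in &cfg.authority_key_ids {
        // HashSet::insert returns false when the value is already present.
        if !seen.insert(key_id.as_str()) {
            return Err(StartupError::DuplicateKeyId(key_id.clone()));
        }
    }
    Ok(())
}
```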

Example `coordinator.toml`:

```toml
bind = "0.0.0.0:8787"
dbPath = "/data/hitl.sqlite"
pendingRequestTtlMs = 3600000
defaultTokenTtlMs = 300000

[policy]
path = "/path/to/policy.json"

[[authorities]]
keyId = "operator-1"
operatorId = "alice"
privateKeyPemPath = "/path/to/operator-1.pem"

[[authorities]]
keyId = "operator-2"
operatorId = "bob"
privateKeyPemPath = "/path/to/operator-2.pem"
```
| Field | Description |
| --- | --- |
| `bind` | Address and port to listen on |
| `dbPath` | Path to the SQLite database file (created if absent) |
| `pendingRequestTtlMs` | How long pending requests remain active before expiring (milliseconds) |
| `defaultTokenTtlMs` | Default token TTL when the approval does not specify one (milliseconds) |
| `policy.path` | Path to the signed deployment policy JSON |

Each [[authorities]] entry maps a key ID to an operator identity and private key:

| Field | Description |
| --- | --- |
| `keyId` | Must match an authority entry in the deployment policy |
| `operatorId` | Must match the `operatorId` in the policy for this key |
| `privateKeyPemPath` | Path to an RSA private key in PKCS#8 or PKCS#1 PEM format |

At startup, the coordinator derives the public key from each private key and verifies it matches the public key in the deployment policy. This prevents key mismatches that would produce valid-looking but unverifiable tokens.
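The per-authority cross-check can be sketched as a lookup by `keyId` followed by two comparisons. This is a hedged illustration: `PolicyAuthority`, `LoadedAuthority` and `check_authority` are hypothetical, and the actual RSA key derivation is assumed to have happened already, leaving a PEM string per side. Comparing PEM text is a simplification; a real implementation would more likely compare decoded key material.

```rust
use std::collections::HashMap;

/// Hypothetical view of one authority entry in the deployment policy.
struct PolicyAuthority {
    operator_id: String,
    public_key_pem: String,
}

/// Hypothetical view of one [[authorities]] entry after the private key
/// has been loaded and its public key derived.
struct LoadedAuthority {
    key_id: String,
    operator_id: String,
    derived_public_key_pem: String,
}

#[derive(Debug, PartialEq)]
enum AuthorityError {
    UnknownKeyId(String),
    OperatorMismatch(String),
    PublicKeyMismatch(String),
}

fn check_authority(
    policy: &HashMap<String, PolicyAuthority>,
    loaded: &LoadedAuthority,
) -> Result<(), AuthorityError> {
    // keyId must exist in the policy.
    let entry = policy
        .get(&loaded.key_id)
        .ok_or_else(|| AuthorityError::UnknownKeyId(loaded.key_id.clone()))?;
    // operatorId must match the policy entry for this key.
    if entry.operator_id != loaded.operator_id {
        return Err(AuthorityError::OperatorMismatch(loaded.key_id.clone()));
    }
    // The derived public key must match the published one; otherwise tokens
    // signed with this key would never verify against the policy.
    if entry.public_key_pem.trim() != loaded.derived_public_key_pem.trim() {
        return Err(AuthorityError::PublicKeyMismatch(loaded.key_id.clone()));
    }
    Ok(())
}
```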

The coordinator uses SQLite with WAL (Write-Ahead Logging) mode and a 5-second busy timeout. The schema is auto-migrated on startup.

`override_requests` tracks the lifecycle of each override request:

| Column | Type | Description |
| --- | --- | --- |
| `coordinator_request_id` | TEXT PK | UUID v4 identifier |
| `status` | TEXT | PENDING, APPROVED, DENIED, EXPIRED, REDEEMED |
| `evaluation_request` | TEXT | Original evaluation request JSON |
| `evaluation_response` | TEXT | Original evaluation response JSON |
| `request_hash` | TEXT | Canonical SHA-256 hash of the request |
| `action_hash` | TEXT | Action hash (if applicable) |
| `license_id` | TEXT | License ID from the submitter |
| `actor_id` | TEXT | Target actor ID (nullable, resolved from inner fields) |
| `intent_id` | TEXT | Adaptive retry grouping key (nullable) |
| `failure_fingerprint` | TEXT | Deterministic failure hash for dedupe (nullable) |
| `submitted_at` | TEXT | Submission timestamp (RFC 3339) |
| `request_expires_at` | TEXT | When this request expires |
| `sentinel_feed` | TEXT | Sentinel telemetry feed JSON (optional) |
| `sentinel_summary` | TEXT | Sentinel summary JSON (optional) |

`issued_tokens` tracks signed tokens for single-use enforcement:

| Column | Type | Description |
| --- | --- | --- |
| `token_id` | TEXT PK | UUID v4 from the token payload |
| `coordinator_request_id` | TEXT FK | Links to the parent request |
| `payload` | TEXT | JSON payload string |
| `signature` | TEXT | Base64url signature |
| `issued_at` | TEXT | Token creation timestamp |
| `expires_at` | TEXT | Token expiry timestamp |
| `redeemed_at` | TEXT | Redemption timestamp (NULL = not redeemed) |

`audit_events` is the immutable audit trail:

| Column | Type | Description |
| --- | --- | --- |
| `id` | TEXT PK | UUID v4 |
| `coordinator_request_id` | TEXT FK | Links to the parent request |
| `event_type` | TEXT | SUBMITTED, APPROVED, DENIED, EXPIRED, REDEEMED |
| `actor_id` | TEXT | Who performed the action |
| `timestamp` | TEXT | Event timestamp |
| `note` | TEXT | Optional operator note |
Request lifecycle:

```
PENDING ──┬──► APPROVED ──► REDEEMED
          ├──► DENIED
          └──► EXPIRED (lazy, on read)
```

Expiration is lazy: when listing or fetching requests, the coordinator checks for pending requests past their request_expires_at and transitions them to EXPIRED with an audit event.
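The lifecycle and the lazy-expiry rule fit in two small functions. This is a sketch only: the `Status` enum and function names are hypothetical, timestamps are simplified to epoch milliseconds instead of RFC 3339 strings, and the persistence step (writing the new status and the EXPIRED audit event) is omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Pending,
    Approved,
    Denied,
    Expired,
    Redeemed,
}

/// Allowed transitions from the lifecycle diagram; anything else would be
/// rejected as an invalid status transition.
fn can_transition(from: Status, to: Status) -> bool {
    matches!(
        (from, to),
        (Status::Pending, Status::Approved)
            | (Status::Pending, Status::Denied)
            | (Status::Pending, Status::Expired)
            | (Status::Approved, Status::Redeemed)
    )
}

/// Lazy expiry: evaluated when a request is listed or fetched, not by a
/// background job. Only PENDING requests past their deadline become EXPIRED.
fn effective_status(stored: Status, request_expires_at_ms: u64, now_ms: u64) -> Status {
    if stored == Status::Pending && now_ms > request_expires_at_ms {
        Status::Expired
    } else {
        stored
    }
}
```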

| Method | Path | Description |
| --- | --- | --- |
| POST | `/v1/override-requests` | Submit a new override request |
| GET | `/v1/override-requests` | List requests (optional `?status=` filter) |
| GET | `/v1/override-requests/:id` | Get a single request with details |
| POST | `/v1/override-requests/:id/approve` | Approve and issue a signed token |
| POST | `/v1/override-requests/:id/deny` | Deny the request |
| POST | `/v1/override-tokens/redeem` | Redeem a token (single-use check) |
| GET | `/healthz` | Health check |
```bash
curl -X POST http://127.0.0.1:8787/v1/override-requests \
  -H "Content-Type: application/json" \
  -d '{
    "evaluationRequest": { ... },
    "evaluationResponse": { ... },
    "licenseId": "lic_test_001",
    "actorId": "agent-1",
    "sentinelFeed": { ... },
    "sentinelSummary": { ... },
    "source": "tui-v2"
  }'
```

Validation:

  • Response decision must be REJECT_STATE or REJECT_ACTION
  • Escalation type must be HUMAN_ESCALATION
  • When adaptive evaluation detail is present, its escalationRecommended must also be HUMAN_ESCALATION. This rejects inconsistent payloads where the top-level escalation type is HUMAN_ESCALATION but the adaptive detail recommends REFORMULATE.
  • Request must not already contain an overrideToken
  • licenseId is required
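The submission rules above can be expressed as one ordered validation function where any failure maps to a 400. This is a sketch under stated assumptions: the `Submission` struct is a hypothetical flattened view of the payload, and decision and escalation values are kept as strings for brevity where real code would likely use enums.

```rust
/// Hypothetical flattened view of the fields the validation rules inspect.
#[derive(Clone, Copy)]
struct Submission<'a> {
    decision: &'a str,
    escalation_type: &'a str,
    /// `escalationRecommended` from the adaptive evaluation detail, if present.
    adaptive_recommendation: Option<&'a str>,
    has_override_token: bool,
    license_id: Option<&'a str>,
}

/// Returns Err(reason) for the first rule that fails.
fn validate_submission<'a>(s: &Submission<'a>) -> Result<(), String> {
    if !matches!(s.decision, "REJECT_STATE" | "REJECT_ACTION") {
        return Err(format!("decision {} is not overrideable", s.decision));
    }
    if s.escalation_type != "HUMAN_ESCALATION" {
        return Err("escalation type must be HUMAN_ESCALATION".to_string());
    }
    // Adaptive detail, when present, must agree with the top-level escalation.
    if let Some(rec) = s.adaptive_recommendation {
        if rec != "HUMAN_ESCALATION" {
            return Err(format!("adaptive recommendation {} contradicts HUMAN_ESCALATION", rec));
        }
    }
    if s.has_override_token {
        return Err("request already contains an overrideToken".to_string());
    }
    match s.license_id {
        Some(id) if !id.is_empty() => Ok(()),
        _ => Err("licenseId is required".to_string()),
    }
}
```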

Actor ID resolution: The coordinator resolves `actorId` from three sources in priority order: `submission.actorId` → `evaluationRequest.actorId` → `evaluation.evaluatedActorId`. The resolved value is persisted and used for all dedupe and cooldown lookups.
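The priority order amounts to "first source that has a value". A minimal sketch, assuming each source is an optional string and (as an unconfirmed assumption) that an empty string counts as missing; `present` and `resolve_actor_id` are hypothetical names.

```rust
/// Treats an empty string like a missing value (an assumption of this sketch).
fn present(id: Option<&str>) -> Option<&str> {
    id.filter(|s| !s.is_empty())
}

/// submission.actorId, then evaluationRequest.actorId, then
/// evaluation.evaluatedActorId; the first present value wins.
fn resolve_actor_id<'a>(
    submission_actor: Option<&'a str>,
    request_actor: Option<&'a str>,
    evaluated_actor: Option<&'a str>,
) -> Option<&'a str> {
    present(submission_actor)
        .or_else(|| present(request_actor))
        .or_else(|| present(evaluated_actor))
}
```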

Adaptive gating (when adaptiveEscalation.enabled = true in the deployment policy):

| Gate | Condition | Response |
| --- | --- | --- |
| Dedupe | A pending request exists for the same (actor, intent, fingerprint) | `409` with `{"coordinatorRequestId": "...", "deduplicated": true}` |
| Deny cooldown | A prior denial falls within the `cooldownAfterDenyMs` window | `409 Conflict` |
| Material change | Same `failureFingerprint` as the denied request when `requireMaterialChangeAfterDeny` is true | `409 Conflict` |

Dedupe uses null-safe `COALESCE` matching, so legacy submissions without `intentId` or `failureFingerprint` still participate: their NULL values coalesce to empty strings for comparison.
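The three gates can be sketched in memory to show how `COALESCE` changes matching. Everything here is hypothetical: `PendingRow`, `DenyGates`, `find_duplicate` and `check_deny_gates` are illustrative names, the caller is assumed to have already looked up the most recent relevant denial, checking cooldown before material change is an assumed order, and reusing the `COALESCE` rule for the fingerprint comparison is also an assumption. Every non-`Allowed` outcome would surface as a 409.

```rust
/// Hypothetical in-memory view of a PENDING row in override_requests.
struct PendingRow {
    coordinator_request_id: String,
    actor_id: String,
    intent_id: Option<String>,
    failure_fingerprint: Option<String>,
}

/// Mirrors SQL `COALESCE(column, '')`: a missing value becomes "", so two
/// NULLs compare equal instead of never matching (plain NULL = NULL is not true).
fn coalesce(v: Option<&str>) -> &str {
    v.unwrap_or("")
}

/// Dedupe gate: the ID of a pending request with the same
/// (actor, intent, fingerprint) key, if one exists.
fn find_duplicate<'a>(
    pending: &'a [PendingRow],
    actor: &str,
    intent: Option<&str>,
    fingerprint: Option<&str>,
) -> Option<&'a str> {
    let wanted = (actor, coalesce(intent), coalesce(fingerprint));
    pending
        .iter()
        .find(|row| {
            (
                row.actor_id.as_str(),
                coalesce(row.intent_id.as_deref()),
                coalesce(row.failure_fingerprint.as_deref()),
            ) == wanted
        })
        .map(|row| row.coordinator_request_id.as_str())
}

/// Hypothetical subset of the adaptiveEscalation policy settings.
struct DenyGates {
    cooldown_after_deny_ms: u64,
    require_material_change_after_deny: bool,
}

#[derive(Debug, PartialEq)]
enum GateOutcome {
    Allowed,
    DenyCooldown,
    MaterialChangeRequired,
}

/// `last_denial` is (denied_at_ms, failure_fingerprint) of the most recent
/// relevant denial, if any.
fn check_deny_gates(
    gates: &DenyGates,
    last_denial: Option<(u64, Option<&str>)>,
    fingerprint: Option<&str>,
    now_ms: u64,
) -> GateOutcome {
    let (denied_at_ms, denied_fingerprint) = match last_denial {
        Some(d) => d,
        None => return GateOutcome::Allowed,
    };
    if now_ms.saturating_sub(denied_at_ms) < gates.cooldown_after_deny_ms {
        return GateOutcome::DenyCooldown;
    }
    if gates.require_material_change_after_deny
        && coalesce(denied_fingerprint) == coalesce(fingerprint)
    {
        return GateOutcome::MaterialChangeRequired;
    }
    GateOutcome::Allowed
}
```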

Response:

```json
{
  "coordinatorRequestId": "550e8400-e29b-41d4-a716-446655440000"
}
```
To list requests:

```bash
# All requests
curl http://127.0.0.1:8787/v1/override-requests

# Filter by status (quote the URL so the shell does not treat ? as a glob)
curl "http://127.0.0.1:8787/v1/override-requests?status=PENDING"
```

Status values: PENDING, APPROVED, DENIED, EXPIRED, REDEEMED.

To approve a request:

```bash
curl -X POST http://127.0.0.1:8787/v1/override-requests/:id/approve \
  -H "Content-Type: application/json" \
  -d '{
    "keyId": "operator-1",
    "operatorNote": "Authorized after manual review",
    "tokenTtlMs": 300000
  }'
```
| Field | Required | Description |
| --- | --- | --- |
| `keyId` | Yes | Authority key ID for signing |
| `operatorNote` | No | Free-text justification |
| `tokenTtlMs` | No | Token TTL in ms (defaults to `defaultTokenTtlMs`, capped at `maxTokenTtlMs`) |

The coordinator signs the token and returns a ready-to-use OverrideTokenEnvelope.
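The TTL rule from the approval table reduces to "requested or default, then cap". A minimal sketch with hypothetical function names, using epoch milliseconds for the stored expiry rather than the RFC 3339 text the database holds; it caps silently, as the table describes, rather than rejecting oversized requests.

```rust
/// The approver's tokenTtlMs if given, otherwise defaultTokenTtlMs,
/// never more than maxTokenTtlMs.
fn resolve_token_ttl_ms(requested_ms: Option<u64>, default_ms: u64, max_ms: u64) -> u64 {
    requested_ms.unwrap_or(default_ms).min(max_ms)
}

/// Expiry for issued_tokens.expires_at, here as epoch milliseconds.
fn token_expires_at_ms(issued_at_ms: u64, ttl_ms: u64) -> u64 {
    issued_at_ms.saturating_add(ttl_ms)
}
```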

To deny a request:

```bash
curl -X POST http://127.0.0.1:8787/v1/override-requests/:id/deny \
  -H "Content-Type: application/json" \
  -d '{
    "keyId": "operator-1",
    "operatorNote": "Insufficient justification"
  }'
```
To redeem a token:

```bash
curl -X POST http://127.0.0.1:8787/v1/override-tokens/redeem \
  -H "Content-Type: application/json" \
  -d '{
    "tokenId": "550e8400-e29b-41d4-a716-446655440000",
    "requestHash": "1c712b22...",
    "policyVersion": 1,
    "licenseId": "lic_test_001",
    "actorId": "agent-1"
  }'
```

Response statuses:

| Status | Meaning |
| --- | --- |
| ACCEPTED | Token redeemed successfully |
| REPLAY_DETECTED | Token was already redeemed |
| UNKNOWN_TOKEN | Token was not issued by this coordinator |
| BINDING_MISMATCH | Request hash, policy version, license, or actor does not match |
| EXPIRED | Token is past its expiry time |
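A single redemption can be sketched as a lookup, a series of checks, and a final "mark redeemed" step. This is illustrative only: `IssuedToken`, `RedeemRequest` and `redeem` are hypothetical, and the order of the checks is an assumption. Here the `&mut HashMap` gives exclusive access. Against SQLite, the check-and-mark would presumably need to be atomic (for example a conditional `UPDATE ... WHERE redeemed_at IS NULL`) so that two concurrent redemptions cannot both succeed.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum RedeemStatus {
    Accepted,
    ReplayDetected,
    UnknownToken,
    BindingMismatch,
    Expired,
}

/// Hypothetical issued_tokens row plus the bindings carried in its payload.
struct IssuedToken {
    request_hash: String,
    policy_version: u32,
    license_id: String,
    actor_id: String,
    expires_at_ms: u64,
    redeemed_at_ms: Option<u64>,
}

#[derive(Clone, Copy)]
struct RedeemRequest<'a> {
    token_id: &'a str,
    request_hash: &'a str,
    policy_version: u32,
    license_id: &'a str,
    actor_id: &'a str,
}

fn redeem<'a>(
    tokens: &mut HashMap<String, IssuedToken>,
    req: &RedeemRequest<'a>,
    now_ms: u64,
) -> RedeemStatus {
    let token = match tokens.get_mut(req.token_id) {
        Some(t) => t,
        None => return RedeemStatus::UnknownToken,
    };
    if token.redeemed_at_ms.is_some() {
        return RedeemStatus::ReplayDetected;
    }
    // Every binding must match what was signed into the token.
    if token.request_hash != req.request_hash
        || token.policy_version != req.policy_version
        || token.license_id != req.license_id
        || token.actor_id != req.actor_id
    {
        return RedeemStatus::BindingMismatch;
    }
    if now_ms > token.expires_at_ms {
        return RedeemStatus::Expired;
    }
    // Single use: record the redemption so any later attempt is a replay.
    token.redeemed_at_ms = Some(now_ms);
    RedeemStatus::Accepted
}
```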
HTTP error codes:

| HTTP Status | Cause |
| --- | --- |
| 400 | Validation failure (non-overrideable decision, missing fields, adaptive recommendation mismatch) |
| 404 | Request ID not found |
| 409 | Invalid status transition, deduplicated submission, deny cooldown active, or material change required |
| 500 | Database or signing error |
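One way to keep this table and the code in sync is a single error enum with an exhaustive mapping to status codes. The sketch below is hypothetical (`CoordinatorError` and `http_status` are not taken from the coordinator source); in an Axum service the same `match` would typically live in an `IntoResponse` implementation.

```rust
/// Hypothetical error type covering the causes in the table above.
#[derive(Debug)]
enum CoordinatorError {
    Validation(String),
    NotFound,
    InvalidTransition,
    Deduplicated { coordinator_request_id: String },
    DenyCooldown,
    MaterialChangeRequired,
    Database(String),
    Signing(String),
}

/// Exhaustive match: adding a new variant forces a decision about its status.
fn http_status(err: &CoordinatorError) -> u16 {
    match err {
        CoordinatorError::Validation(_) => 400,
        CoordinatorError::NotFound => 404,
        CoordinatorError::InvalidTransition
        | CoordinatorError::Deduplicated { .. }
        | CoordinatorError::DenyCooldown
        | CoordinatorError::MaterialChangeRequired => 409,
        CoordinatorError::Database(_) | CoordinatorError::Signing(_) => 500,
    }
}
```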
Container image (`docker/Dockerfile.coordinator`):

```dockerfile
FROM rust:1.94.0-bookworm AS builder
WORKDIR /build
COPY . .
RUN cargo build -p kairos-cli --profile release-native

FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
    libsqlite3-0 ca-certificates curl && rm -rf /var/lib/apt/lists/*
COPY --from=builder /build/target/release-native/kairos /usr/local/bin/kairos
VOLUME ["/testbed", "/data"]
ENTRYPOINT ["kairos"]
CMD ["hitl", "serve", "--config", "/testbed/fixtures/coordinator/coordinator.toml"]
```
Compose service (`docker/docker-compose.yml`):

```yaml
services:
  coordinator:
    build:
      context: ${KAIROS_ENGINE_DIR:-../../kairos-engine}
      dockerfile: ${PWD}/docker/Dockerfile.coordinator
    ports:
      - "8787:8787"
    volumes:
      - ..:/testbed:ro
    tmpfs:
      - /data
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:8787/healthz"]
      interval: 5s
      timeout: 3s
      retries: 5
    environment:
      - RUST_LOG=info
```

Key deployment details:

| Aspect | Configuration |
| --- | --- |
| Testbed mount | Read-only at `/testbed`; the coordinator reads its config, policy, and keys from here |
| Database | tmpfs at `/data`; ephemeral, reset on container restart |
| Health check | HTTP GET `/healthz` every 5 seconds |
| Logging | Controlled via the `RUST_LOG` environment variable |
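```bash
# From the kairos-testbed directory. Compose resolves a relative build context
# against docker/ (the compose file's directory), so pass an absolute path.
KAIROS_ENGINE_DIR="$(cd ../kairos-engine && pwd)" docker compose -f docker/docker-compose.yml up coordinator
```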

Or using the convenience script:

```bash
./scripts/run-coordinator.sh
```

To connect the Substrate runtime to a coordinator for token redemption:

```bash
kairos evaluate \
  --config artifact.json \
  --policy policy.json \
  --license license.key \
  --coordinator-url http://127.0.0.1:8787 \
  --request request-with-token.json
```

The runtime uses a 5-second timeout for coordinator calls. If the coordinator is unreachable, the override fails with CoordinatorUnavailable and the original rejection is preserved (fail-closed).
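The fail-closed rule means only an explicit, in-time ACCEPTED lifts the original rejection. The sketch models that decision without any networking: `RedeemCall`, `Outcome`, `classify_call` and `apply_override` are hypothetical names. Treating non-ACCEPTED redemption statuses as "keep the rejection" is an assumption consistent with the fail-closed description, not a documented rule.

```rust
use std::time::Duration;

/// The documented timeout for runtime-to-coordinator calls.
const COORDINATOR_TIMEOUT: Duration = Duration::from_secs(5);

/// Hypothetical outcome of the HTTP call to /v1/override-tokens/redeem.
enum RedeemCall {
    /// The coordinator answered in time with a redemption status.
    Answered(&'static str),
    /// Connection failure, or no answer within the timeout.
    Unreachable,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    OverrideApplied,
    OriginalRejectionKept { reason: String },
}

/// An answer that arrives after the timeout counts as no answer.
fn classify_call(elapsed: Duration, answer: Option<&'static str>) -> RedeemCall {
    match answer {
        Some(status) if elapsed <= COORDINATOR_TIMEOUT => RedeemCall::Answered(status),
        _ => RedeemCall::Unreachable,
    }
}

/// Fail-closed: every path except an explicit ACCEPTED keeps the rejection.
fn apply_override(call: RedeemCall) -> Outcome {
    match call {
        RedeemCall::Answered("ACCEPTED") => Outcome::OverrideApplied,
        RedeemCall::Answered(status) => Outcome::OriginalRejectionKept { reason: status.to_string() },
        RedeemCall::Unreachable => Outcome::OriginalRejectionKept {
            reason: "CoordinatorUnavailable".to_string(),
        },
    }
}
```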