Testbed

The kairos-testbed repository provides a self-contained environment for testing the full Substrate stack: Sentinel assessment, human-in-the-loop (HITL) override workflows, and coordinator operations.

Repository Structure

kairos-testbed/
├── docker/              # Coordinator Dockerfile and compose stack
├── fixtures/            # Scenarios, policies, keys, coordinator configs
├── models/              # SLM model bundle and management scripts
├── scripts/             # Setup, collection, and smoke test scripts
└── data/                # Generated collection output

Setup

One-shot setup generates all machine-specific artifacts:

./scripts/setup.sh

Step 1: Generate Test License

./scripts/generate-test-license.sh

Generates a machine-bound license at fixtures/licenses/license.key using the kairos-license crate’s test fixture generator.

Step 2: Sign HITL Policy

./scripts/generate-test-policy.sh

Takes the policy template (fixtures/policies/test-policy-hitl.template.json) and produces a signed policy at fixtures/policies/test-policy-hitl.json.

Step 3: Download Model Bundle

./models/download-model.sh

Downloads SmolLM2-1.7B-Instruct (GGUF Q4_K_M quantization) from Hugging Face:

Model weights: models/smollm2-1.7b-instruct-q4km/model.gguf
Tokenizer: models/smollm2-1.7b-instruct-q4km/tokenizer.json

Step 4: Verify Bundle

./models/verify-bundle.sh

Checks:

All required files exist (manifest.json, generation.json, tokenizer.json, model.gguf)
JSON files parse correctly
Prompt version matches sentinel.phase_b.v1
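
The three checks can be approximated outside the script. The following is a minimal sketch, not the contents of verify-bundle.sh: it assumes the prompt version is stored in manifest.json under a key named prompt_version, which is a hypothetical name chosen for illustration.

```python
import json
from pathlib import Path

REQUIRED_FILES = ("manifest.json", "generation.json", "tokenizer.json", "model.gguf")
EXPECTED_PROMPT_VERSION = "sentinel.phase_b.v1"


def verify_bundle(bundle_dir, prompt_version_key="prompt_version"):
    """Return a list of problems; an empty list means the bundle passed.

    The key name and its location in manifest.json are assumptions.
    """
    bundle = Path(bundle_dir)
    problems = []

    # Check 1: all required files exist.
    for name in REQUIRED_FILES:
        if not (bundle / name).is_file():
            problems.append(f"missing file: {name}")

    # Check 2: every JSON file that is present parses.
    parsed = {}
    for name in REQUIRED_FILES:
        path = bundle / name
        if name.endswith(".json") and path.is_file():
            try:
                parsed[name] = json.loads(path.read_text(encoding="utf-8"))
            except ValueError as exc:  # covers JSON and text-decoding errors
                problems.append(f"invalid JSON in {name}: {exc}")

    # Check 3: prompt version matches (assumed to live in manifest.json).
    manifest = parsed.get("manifest.json")
    if isinstance(manifest, dict):
        found = manifest.get(prompt_version_key)
        if found != EXPECTED_PROMPT_VERSION:
            problems.append(
                f"prompt version is {found!r}, expected {EXPECTED_PROMPT_VERSION!r}"
            )

    return problems
```

Calling verify_bundle("models/smollm2-1.7b-instruct-q4km") returns an empty list when every check passes, so the result can be used directly as a pass/fail signal.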

Fixtures

Calibration Artifact

fixtures/artifacts/test-artifact.json — defines scaling functions for the AI safety domain:

Proxy	Maps to	Scaling	Input range
`capabilityIndex`	$\lambda$	`log`	[1, 1000]
`alignmentScore`	$\gamma$	`linear`	[0, 100]
`autonomyLevel`	$\lambda$	`sigmoid`	secondary
`humanOversightFreq`	$\gamma$	`log`	secondary
`guardrailCoverage`	$\gamma$	`linear`	secondary
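
To make the Scaling column concrete, here is one common way to define log, linear, and sigmoid scaling onto a unit interval. This is a sketch under assumptions: the artifact may use different formulas, and the clamping behaviour plus the midpoint and steepness parameters are illustrative choices, not values taken from test-artifact.json.

```python
import math


def linear_scale(x, lo, hi):
    """Map x in [lo, hi] linearly onto [0, 1], clamping out-of-range input."""
    x = min(max(x, lo), hi)
    return (x - lo) / (hi - lo)


def log_scale(x, lo, hi):
    """Map x in [lo, hi] (with lo > 0) logarithmically onto [0, 1], clamping input."""
    x = min(max(x, lo), hi)
    return math.log(x / lo) / math.log(hi / lo)


def sigmoid_scale(x, midpoint=0.0, steepness=1.0):
    """Logistic squash onto (0, 1); midpoint and steepness are hypothetical knobs."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))
```

With the input ranges from the table, log_scale(value, 1, 1000) would serve capabilityIndex and linear_scale(value, 0, 100) would serve alignmentScore. Under log scaling, each tenfold increase in capabilityIndex adds the same increment to the scaled value.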

Deployment Policy

fixtures/policies/test-policy-hitl.template.json — policy with HITL authorities:

Gamma floor minimum: 0.15
Permitted modes: state_gate, state_plus_action_gate
Fail behavior: fail_closed
Two operator authorities (operator-1/alice, operator-2/bob)
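
The interaction between the gamma floor, the permitted modes, and fail_closed can be sketched as a small decision function. This is illustrative only: it reads 0.15 as the threshold below which an assessment is denied, which is an assumption about what "gamma floor minimum" means, and the function name, the "allow"/"deny" results, and the use of None for a missing assessment are all hypothetical.

```python
PERMITTED_MODES = {"state_gate", "state_plus_action_gate"}
GAMMA_FLOOR = 0.15  # assumed to act as the deny threshold


def evaluate(mode, gamma, fail_behavior="fail_closed"):
    """Return "allow" or "deny" for one assessment.

    gamma=None models an assessment that could not be produced.
    """
    if mode not in PERMITTED_MODES:
        return "deny"
    if gamma is None:
        # fail_closed: a missing assessment blocks instead of permitting.
        return "deny" if fail_behavior == "fail_closed" else "allow"
    return "allow" if gamma >= GAMMA_FLOOR else "deny"
```

The point of the sketch is the middle branch: under fail_closed, the absence of an answer is treated the same as a floor breach.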

Operator Keys

fixtures/keys/ — RSA key pairs for test operators:

File	Purpose
`operator-1.pem`	Private key for operator-1 (alice)
`operator-1-pub.pem`	Public key for operator-1
`operator-2.pem`	Private key for operator-2 (bob)
`operator-2-pub.pem`	Public key for operator-2

Coordinator Configs

File	Description
`fixtures/coordinator/coordinator.toml`	Docker-based paths (`/testbed/...`, `/data/...`)
`fixtures/coordinator/coordinator-native.toml`	Native paths (local filesystem)

Test Scenarios

fixtures/scenarios/testbed-scenarios.json — six curated scenarios:

Scenario	Seed	Purpose
`sentinel-nominal`	1207	Wide headroom, no warnings
`sentinel-elevated-drift`	1030	Near-floor drift probing
`sentinel-critical-breach`	1014	Floor-breach escalation
`sentinel-multi-actor`	2102	Three concurrent actors
`hitl-approve-flow`	1031	Deterministic approval path
`hitl-concurrent-requests`	2401	Four actors, overlapping escalations

Docker Coordinator

Starting

# From kairos-testbed directory
KAIROS_ENGINE_DIR=../kairos-engine docker compose -f docker/docker-compose.yml up coordinator

Or using the convenience script:

./scripts/run-coordinator.sh

Architecture

Aspect	Configuration
Base image	`rust:1.94.0-bookworm` (builder), `debian:bookworm-slim` (runtime)
Port	8787
Testbed mount	Read-only at `/testbed`
Database	`tmpfs` at `/data` — ephemeral, reset on restart
Health check	`GET /healthz` every 5 seconds
Logging	`RUST_LOG=info`

Because the database lives on tmpfs, all override requests, tokens, and audit events are lost when the container restarts. This is intentional for testing.
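
Scripts that depend on the coordinator can wait for the same health endpoint before sending requests. The sketch below polls GET /healthz and assumes a healthy coordinator answers with HTTP 200, which the source does not state. The function names and the retry parameters are illustrative.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if GET url answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not healthy yet.
        return False


def wait_until_healthy(url, attempts=30, interval=1.0,
                       probe=http_probe, sleep=time.sleep):
    """Poll until the probe succeeds; return False once attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

For the default port, the call would be wait_until_healthy("http://127.0.0.1:8787/healthz"). The probe and sleep parameters exist so the retry loop can be exercised without a running coordinator.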

Data Collection

Sentinel Collection

./scripts/collect-sentinel-data.sh [output-path]

Runs all testbed scenarios and collects Sentinel assessments from both backends (template and SLM). Output is NDJSON, one JSON object per line; a single record is shown here pretty-printed for readability:

{
  "scenario_id": "sentinel-elevated-drift",
  "actor_id": "default",
  "tick": 36,
  "risk_level": "Elevated",
  "template": {
    "narrative": "Human escalation was triggered...",
    "latency_ms": 0,
    "backend": "template"
  },
  "slm": {
    "narrative": "Gamma is below the configured floor...",
    "latency_ms": 12086,
    "backend": "slm"
  }
}
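
A collection file can be read back with a few lines of Python. This sketch relies only on the fields shown in the record above (template, slm, latency_ms); the helper names are illustrative, and it compares the two backends by mean latency as one example of what the data supports.

```python
import json
from statistics import mean


def read_ndjson(lines):
    """Yield one dict per non-empty line of NDJSON input."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def mean_latency_by_backend(records):
    """Average latency_ms per backend across all records."""
    samples = {"template": [], "slm": []}
    for record in records:
        for backend in samples:
            if backend in record:
                samples[backend].append(record[backend]["latency_ms"])
    # Backends with no samples are left out of the result.
    return {name: mean(values) for name, values in samples.items() if values}
```

Because read_ndjson accepts any iterable of lines, an open file handle for the script's output path can be passed to it directly.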

Environment variables:

Variable	Default	Description
`KAIROS_ENGINE_DIR`	`../kairos-engine`	Path to kairos-engine
`KAIROS_SENTINEL_MODEL_DIR`	`models/smollm2-1.7b-instruct-q4km`	Path to model bundle
`KAIROS_SENTINEL_TIMEOUT_MS`	`20000`	Inference timeout (ms)
`KAIROS_SENTINEL_MAX_OUTPUT_TOKENS`	`64`	Maximum output tokens
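
The defaults in this table can be mirrored in tooling that wraps the collection script. The sketch below resolves each variable against its documented default and converts the two numeric settings to integers. It is an illustration, not the script's own logic; in particular, it treats an empty value as set, where the script may treat it as unset.

```python
import os

DEFAULTS = {
    "KAIROS_ENGINE_DIR": "../kairos-engine",
    "KAIROS_SENTINEL_MODEL_DIR": "models/smollm2-1.7b-instruct-q4km",
    "KAIROS_SENTINEL_TIMEOUT_MS": "20000",
    "KAIROS_SENTINEL_MAX_OUTPUT_TOKENS": "64",
}
INTEGER_KEYS = {"KAIROS_SENTINEL_TIMEOUT_MS", "KAIROS_SENTINEL_MAX_OUTPUT_TOKENS"}


def resolve_settings(environ=None):
    """Return the effective settings, falling back to the documented defaults."""
    environ = os.environ if environ is None else environ
    settings = {}
    for key, default in DEFAULTS.items():
        raw = environ.get(key, default)
        # Numeric settings are parsed so a bad value fails early with ValueError.
        settings[key] = int(raw) if key in INTEGER_KEYS else raw
    return settings
```

Passing a plain dict instead of os.environ makes it easy to preview the effect of an override, for example resolve_settings({"KAIROS_SENTINEL_TIMEOUT_MS": "60000"}).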

Sentinel Benchmarks

./scripts/benchmark-sentinel.sh

Runs the Sentinel crate’s criterion benchmarks with the SLM feature enabled.

HITL Smoke Test

./scripts/smoke-hitl.sh

Validates the full HITL override flow end-to-end:

Health check — verify coordinator is running
Submit — post an override request
Verify — check request status is PENDING
Approve — sign and issue a token via operator-1
Redeem — submit token for single-use redemption → ACCEPTED
Replay guard — submit same token again → REPLAY_DETECTED
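
The status transitions the smoke test expects can be modelled in memory. The class below is a toy stand-in for the coordinator, written to show why the second redemption of a token yields REPLAY_DETECTED. It performs no signing, and the class name, method names, and the APPROVED and UNKNOWN_TOKEN statuses are hypothetical; only PENDING, ACCEPTED, and REPLAY_DETECTED come from the steps above.

```python
import secrets


class OverrideLedger:
    """In-memory model of request and token state (no signing, no persistence)."""

    def __init__(self):
        self.requests = {}     # request_id -> status
        self.tokens = {}       # token -> (request_id, operator_id)
        self.redeemed = set()  # tokens that have already been used

    def submit(self, request_id):
        self.requests[request_id] = "PENDING"
        return self.requests[request_id]

    def approve(self, request_id, operator_id):
        if self.requests.get(request_id) != "PENDING":
            raise ValueError("request is not pending")
        token = secrets.token_urlsafe(16)
        self.tokens[token] = (request_id, operator_id)
        self.requests[request_id] = "APPROVED"  # hypothetical status name
        return token

    def redeem(self, token):
        if token not in self.tokens:
            return "UNKNOWN_TOKEN"  # hypothetical status name
        if token in self.redeemed:
            return "REPLAY_DETECTED"
        # First use: remember the token so any later use is rejected.
        self.redeemed.add(token)
        return "ACCEPTED"
```

Walking the ledger through submit, approve, and two redeem calls reproduces the sequence the smoke test checks: PENDING, then ACCEPTED, then REPLAY_DETECTED.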

TUI Demo

./scripts/demo-observe-hitl.sh \
  --scenario sentinel-critical-breach \
  --coordinator-url http://127.0.0.1:8787 \
  --operator-key-id operator-1 \
  --seed-requests 3

This script:

Starts the coordinator (or uses an existing one)
Seeds the coordinator with pending requests from a Sentinel collection
Launches kairos observe with the --hitl and --sentinel flags
Provides a fully interactive operator dashboard for testing