Testbed
The kairos-testbed repository provides a self-contained environment for testing the full Substrate stack — Sentinel assessment, HITL override workflows, and coordinator operations.
Repository Structure

    kairos-testbed/
    ├── docker/      # Coordinator Dockerfile and compose stack
    ├── fixtures/    # Scenarios, policies, keys, coordinator configs
    ├── models/      # SLM model bundle and management scripts
    ├── scripts/     # Setup, collection, and smoke test scripts
    └── data/        # Generated collection output

One-shot setup generates all machine-specific artifacts:

    ./scripts/setup.sh

Step 1: Generate Test License

    ./scripts/generate-test-license.sh

Generates a machine-bound license at fixtures/licenses/license.key using the kairos-license crate’s test fixture generator.
Step 2: Sign HITL Policy

    ./scripts/generate-test-policy.sh

Takes the policy template (fixtures/policies/test-policy-hitl.template.json) and produces a signed policy at fixtures/policies/test-policy-hitl.json.
Step 3: Download Model Bundle

    ./models/download-model.sh

Downloads SmolLM2-1.7B-Instruct (GGUF Q4_K_M quantization) from HuggingFace:
- Model weights: models/smollm2-1.7b-instruct-q4km/model.gguf
- Tokenizer: models/smollm2-1.7b-instruct-q4km/tokenizer.json
Step 4: Verify Bundle

    ./models/verify-bundle.sh

Checks:
- All required files exist (manifest.json, generation.json, tokenizer.json, model.gguf)
- JSON files parse correctly
- Prompt version matches sentinel.phase_b.v1
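
The three checks above can be sketched in a few lines of Python. This is an illustration of the checks as described, not the contents of verify-bundle.sh. The source does not say which file or field carries the prompt version, so `version_file` and `version_key` are hypothetical defaults.

```python
import json
from pathlib import Path

REQUIRED_FILES = ("manifest.json", "generation.json", "tokenizer.json", "model.gguf")
EXPECTED_PROMPT_VERSION = "sentinel.phase_b.v1"


def verify_bundle(bundle_dir, version_file="manifest.json", version_key="prompt_version"):
    """Return a list of problems found in the bundle; an empty list means it passed.

    version_file and version_key are assumptions made for this sketch.
    """
    bundle = Path(bundle_dir)
    problems = []

    # Check 1: every required file exists.
    for name in REQUIRED_FILES:
        if not (bundle / name).is_file():
            problems.append(f"missing file: {name}")

    # Check 2: every JSON file that exists parses.
    parsed = {}
    for name in REQUIRED_FILES:
        path = bundle / name
        if name.endswith(".json") and path.is_file():
            try:
                parsed[name] = json.loads(path.read_text(encoding="utf-8"))
            except json.JSONDecodeError as err:
                problems.append(f"invalid JSON in {name}: {err.msg}")

    # Check 3: the prompt version matches the expected value.
    if version_file in parsed:
        doc = parsed[version_file]
        found = doc.get(version_key) if isinstance(doc, dict) else None
        if found != EXPECTED_PROMPT_VERSION:
            problems.append(f"prompt version mismatch: found {found!r}")

    return problems
```

Calling `verify_bundle("models/smollm2-1.7b-instruct-q4km")` returns an empty list when all checks pass, and one message per failed check otherwise.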
Fixtures
Calibration Artifact
fixtures/artifacts/test-artifact.json defines scaling functions for the AI safety domain:
| Proxy | Maps to | Scaling | Input range |
|---|---|---|---|
| capabilityIndex | log | [1, 1000] | |
| alignmentScore | linear | [0, 100] | |
| autonomyLevel | sigmoid | secondary | |
| humanOversightFreq | log | secondary | |
| guardrailCoverage | linear | secondary | |
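
The table names three scaling types: log, linear, and sigmoid. The artifact's actual formulas are not shown here, so the following is a minimal sketch using common textbook forms that normalize a raw proxy value to the unit interval. The clamping behavior and the sigmoid's `midpoint` and `steepness` parameters are assumptions.

```python
import math


def scale_linear(x, lo, hi):
    """Map x from [lo, hi] onto [0, 1] proportionally, clamping out-of-range input."""
    x = min(max(x, lo), hi)
    return (x - lo) / (hi - lo)


def scale_log(x, lo, hi):
    """Map x from [lo, hi] onto [0, 1] along a logarithmic axis (requires lo > 0)."""
    x = min(max(x, lo), hi)
    return math.log(x / lo) / math.log(hi / lo)


def scale_sigmoid(x, midpoint, steepness):
    """Squash x into (0, 1) with a logistic curve centred on midpoint."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))
```

With the ranges that appear in the table, `scale_log(10, 1, 1000)` lands about a third of the way up the scale, since 10 is one of three decades between 1 and 1000, while `scale_linear(50, 0, 100)` lands exactly halfway.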
Deployment Policy
fixtures/policies/test-policy-hitl.template.json is a policy template with HITL authorities:
- Gamma floor minimum: 0.15
- Permitted modes: state_gate, state_plus_action_gate
- Fail behavior: fail_closed
- Two operator authorities (operator-1/alice, operator-2/bob)
Operator Keys
fixtures/keys/ holds RSA key pairs for the test operators:
| File | Purpose |
|---|---|
| operator-1.pem | Private key for operator-1 (alice) |
| operator-1-pub.pem | Public key for operator-1 |
| operator-2.pem | Private key for operator-2 (bob) |
| operator-2-pub.pem | Public key for operator-2 |
Coordinator Configs
| File | Description |
|---|---|
| fixtures/coordinator/coordinator.toml | Docker-based paths (/testbed/..., /data/...) |
| fixtures/coordinator/coordinator-native.toml | Native paths (local filesystem) |
Test Scenarios
fixtures/scenarios/testbed-scenarios.json contains six curated scenarios:
| Scenario | Seed | Purpose |
|---|---|---|
| sentinel-nominal | 1207 | Wide headroom, no warnings |
| sentinel-elevated-drift | 1030 | Near-floor drift probing |
| sentinel-critical-breach | 1014 | Floor-breach escalation |
| sentinel-multi-actor | 2102 | Three concurrent actors |
| hitl-approve-flow | 1031 | Deterministic approval path |
| hitl-concurrent-requests | 2401 | Four actors, overlapping escalations |
Docker Coordinator
Starting

    # From the kairos-testbed directory
    KAIROS_ENGINE_DIR=../kairos-engine docker compose -f docker/docker-compose.yml up coordinator

Or using the convenience script:

    ./scripts/run-coordinator.sh

Architecture
| Aspect | Configuration |
|---|---|
| Base image | rust:1.94.0-bookworm (builder), debian:bookworm-slim (runtime) |
| Port | 8787 |
| Testbed mount | Read-only at /testbed |
| Database | tmpfs at /data — ephemeral, reset on restart |
| Health check | GET /healthz every 5 seconds |
| Logging | RUST_LOG=info |
The ephemeral database (tmpfs) means all override requests, tokens, and audit events are lost on container restart. This is intentional for testing.
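
Because the coordinator starts with an empty database each time, a test script typically needs to wait for it to become healthy before submitting requests. The following is a minimal client-side sketch of such a wait loop. It assumes that a healthy coordinator answers GET /healthz with HTTP 200, which the source does not state. The function names are hypothetical, and the probe is injectable so the loop can be exercised without a running container.

```python
import time
import urllib.request


def probe_healthz(base_url, timeout_s=2.0):
    """Return True if GET <base_url>/healthz answers with HTTP 200 (assumed healthy status)."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, timeouts, and refused connections are all OSError subclasses.
        return False


def wait_until_healthy(base_url, attempts=12, interval_s=5.0,
                       probe=probe_healthz, sleep=time.sleep):
    """Poll the health endpoint; return the 1-based attempt that succeeded, or None."""
    for attempt in range(1, attempts + 1):
        if probe(base_url):
            return attempt
        if attempt < attempts:
            sleep(interval_s)  # matches the 5-second health check interval by default
    return None
```

A caller would use `wait_until_healthy("http://127.0.0.1:8787")` and treat a `None` result as a failed start.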
Data Collection
Sentinel Collection

    ./scripts/collect-sentinel-data.sh [output-path]

Runs all testbed scenarios and collects Sentinel assessments from both backends (template and SLM). Output is in NDJSON format, one JSON record per line. A single record, expanded here for readability:

    {
      "scenario_id": "sentinel-elevated-drift",
      "actor_id": "default",
      "tick": 36,
      "risk_level": "Elevated",
      "template": {
        "narrative": "Human escalation was triggered...",
        "latency_ms": 0,
        "backend": "template"
      },
      "slm": {
        "narrative": "Gamma is below the configured floor...",
        "latency_ms": 12086,
        "backend": "slm"
      }
    }

Environment variables:
| Variable | Default | Description |
|---|---|---|
| KAIROS_ENGINE_DIR | ../kairos-engine | Path to kairos-engine |
| KAIROS_SENTINEL_MODEL_DIR | models/smollm2-1.7b-instruct-q4km | Path to model bundle |
| KAIROS_SENTINEL_TIMEOUT_MS | 20000 | Inference timeout (ms) |
| KAIROS_SENTINEL_MAX_OUTPUT_TOKENS | 64 | Maximum output tokens |
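
The collected NDJSON can be summarized with a short script. The sketch below assumes every record has the fields shown in the sample record and compares per-backend latency. The `summarize` function and the "more than half the timeout" threshold are illustrative choices, not part of the testbed.

```python
import json
import os

# Falls back to the documented default for KAIROS_SENTINEL_TIMEOUT_MS.
TIMEOUT_MS = int(os.environ.get("KAIROS_SENTINEL_TIMEOUT_MS", "20000"))


def summarize(lines, timeout_ms=TIMEOUT_MS):
    """Aggregate per-backend latency from NDJSON lines, one record per line."""
    totals = {"template": 0, "slm": 0}
    count = 0
    near_timeout = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        count += 1
        for backend in totals:
            totals[backend] += record[backend]["latency_ms"]
        # Flag SLM calls that used more than half of the inference timeout.
        if record["slm"]["latency_ms"] > timeout_ms / 2:
            near_timeout.append((record["scenario_id"], record["tick"]))
    means = {b: (t / count if count else 0.0) for b, t in totals.items()}
    return {"records": count, "mean_latency_ms": means, "near_timeout": near_timeout}
```

Since a file object iterates line by line, the function can be called as `summarize(open(path))`, where `path` is whatever output path was passed to the collection script.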
Sentinel Benchmarks

    ./scripts/benchmark-sentinel.sh

Runs the Sentinel crate’s criterion benchmarks with the SLM feature enabled.
HITL Smoke Test

    ./scripts/smoke-hitl.sh

Validates the full HITL override flow end-to-end:
- Health check: verify the coordinator is running
- Submit: post an override request
- Verify: check that the request status is PENDING
- Approve: sign and issue a token via operator-1
- Redeem: submit the token for single-use redemption → ACCEPTED
- Replay guard: submit the same token again → REPLAY_DETECTED
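
The single-use redemption and replay guard exercised by the last two steps can be modeled in memory. This is a sketch of the idea only, not the coordinator's implementation: the `TokenStore` class and the `APPROVED` and `UNKNOWN_TOKEN` statuses are hypothetical, and the real approval step signs with the operator's key rather than issuing a random token.

```python
import secrets


class TokenStore:
    """In-memory model of single-use override tokens (illustrative only)."""

    def __init__(self):
        self._requests = {}     # request_id -> status
        self._tokens = {}       # token -> (request_id, operator_id)
        self._redeemed = set()  # tokens that have already been accepted once

    def submit(self):
        """Register a new override request in the PENDING state."""
        request_id = secrets.token_hex(8)
        self._requests[request_id] = "PENDING"
        return request_id

    def status(self, request_id):
        return self._requests[request_id]

    def approve(self, request_id, operator_id):
        """Issue a token for a pending request; a random token stands in for a signature."""
        if self._requests.get(request_id) != "PENDING":
            raise ValueError("only pending requests can be approved")
        token = secrets.token_urlsafe(16)
        self._tokens[token] = (request_id, operator_id)
        self._requests[request_id] = "APPROVED"  # hypothetical status name
        return token

    def redeem(self, token):
        """Accept a token exactly once; any later use is reported as a replay."""
        if token not in self._tokens:
            return "UNKNOWN_TOKEN"  # hypothetical status name
        if token in self._redeemed:
            return "REPLAY_DETECTED"
        self._redeemed.add(token)
        return "ACCEPTED"
```

Recording redeemed tokens, rather than deleting them on first use, is what lets the second submission be reported as a replay instead of as an unknown token.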
TUI Demo

    ./scripts/demo-observe-hitl.sh \
      --scenario sentinel-critical-breach \
      --coordinator-url http://127.0.0.1:8787 \
      --operator-key-id operator-1 \
      --seed-requests 3

This script:
- Starts the coordinator (or uses an existing one)
- Seeds the coordinator with pending requests from a Sentinel collection
- Launches kairos observe with the --hitl and --sentinel flags
- Provides a fully interactive operator dashboard for testing