Testbed

The kairos-testbed repository provides a self-contained environment for testing the full Substrate stack — Sentinel assessment, HITL override workflows, and coordinator operations.

kairos-testbed/
├── docker/ # Coordinator Dockerfile and compose stack
├── fixtures/ # Scenarios, policies, keys, coordinator configs
├── models/ # SLM model bundle and management scripts
├── scripts/ # Setup, collection, and smoke test scripts
└── data/ # Generated collection output

One-shot setup generates all machine-specific artifacts:

./scripts/setup.sh

./scripts/generate-test-license.sh

Generates a machine-bound license at fixtures/licenses/license.key using the kairos-license crate’s test fixture generator.

./scripts/generate-test-policy.sh

Takes the policy template (fixtures/policies/test-policy-hitl.template.json) and produces a signed policy at fixtures/policies/test-policy-hitl.json.

./models/download-model.sh

Downloads SmolLM2-1.7B-Instruct (GGUF, Q4_K_M quantization) from Hugging Face:

  • Model weights: models/smollm2-1.7b-instruct-q4km/model.gguf
  • Tokenizer: models/smollm2-1.7b-instruct-q4km/tokenizer.json

./models/verify-bundle.sh

Checks:

  • All required files exist (manifest.json, generation.json, tokenizer.json, model.gguf)
  • JSON files parse correctly
  • Prompt version matches sentinel.phase_b.v1
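
The three checks above can be sketched in a few lines of Python. This is an illustration, not the contents of verify-bundle.sh: the source does not say which file or field records the prompt version, so the `version_file` and `version_key` defaults below are assumptions.

```python
import json
from pathlib import Path

REQUIRED_FILES = ("manifest.json", "generation.json", "tokenizer.json", "model.gguf")
EXPECTED_PROMPT_VERSION = "sentinel.phase_b.v1"


def verify_bundle(bundle_dir, version_file="manifest.json", version_key="prompt_version"):
    """Return a list of problems; an empty list means every check passed.

    version_file and version_key are hypothetical: the real location of the
    prompt version inside the bundle is not documented here.
    """
    bundle = Path(bundle_dir)
    problems = []
    parsed = {}

    for name in REQUIRED_FILES:
        path = bundle / name
        # Check 1: the file must exist.
        if not path.is_file():
            problems.append(f"missing file: {name}")
            continue
        # Check 2: every .json file must parse.
        if path.suffix == ".json":
            try:
                parsed[name] = json.loads(path.read_text(encoding="utf-8"))
            except ValueError as exc:  # covers JSON and encoding errors
                problems.append(f"invalid JSON in {name}: {exc}")

    # Check 3: the recorded prompt version must match the expected one.
    if version_file in parsed:
        doc = parsed[version_file]
        found = doc.get(version_key) if isinstance(doc, dict) else None
        if found != EXPECTED_PROMPT_VERSION:
            problems.append(f"prompt version mismatch: {found!r}")
    return problems
```

Calling `verify_bundle("models/smollm2-1.7b-instruct-q4km")` returns an empty list when the bundle is complete, and one message per failed check otherwise.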

fixtures/artifacts/test-artifact.json — defines scaling functions for the AI safety domain:

| Proxy | Maps to | Scaling | Input range |
|---|---|---|---|
| capabilityIndex | λ | log | [1, 1000] |
| alignmentScore | γ | linear | [0, 100] |
| autonomyLevel | λ | sigmoid | secondary |
| humanOversightFreq | γ | log | secondary |
| guardrailCoverage | γ | linear | secondary |
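
To make the three scaling kinds concrete, here is a minimal Python sketch of what log, linear, and sigmoid scaling commonly look like. The normalization onto [0, 1], the clamping, and the function names are assumptions for illustration; the formulas the artifact actually encodes may differ.

```python
import math


def linear_scale(x, lo, hi):
    """Map x from [lo, hi] onto [0, 1] linearly, clamping out-of-range input."""
    x = min(max(x, lo), hi)
    return (x - lo) / (hi - lo)


def log_scale(x, lo, hi):
    """Map x from [lo, hi] onto [0, 1] along a logarithmic axis (needs lo > 0)."""
    x = min(max(x, lo), hi)
    return math.log(x / lo) / math.log(hi / lo)


def sigmoid_scale(x, midpoint=0.0, steepness=1.0):
    """Squash any real x into (0, 1) with a logistic curve.

    midpoint and steepness are hypothetical parameters. The two branches
    avoid overflow in math.exp for large |x|.
    """
    z = steepness * (x - midpoint)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

Under these assumptions, a capabilityIndex of 10 on the range [1, 1000] lands about a third of the way along the log axis, while an alignmentScore of 50 on [0, 100] maps to exactly 0.5.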

fixtures/policies/test-policy-hitl.template.json — policy with HITL authorities:

  • Gamma floor minimum: 0.15
  • Permitted modes: state_gate, state_plus_action_gate
  • Fail behavior: fail_closed
  • Two operator authorities (operator-1/alice, operator-2/bob)
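
One way to read these settings together is as a small decision function. The sketch below is a guess at the semantics, not the engine's logic: the outcome names `allow`, `escalate`, and `deny` are hypothetical, and it assumes that a gamma below the floor triggers human escalation and that `fail_closed` means denying when gamma is unavailable.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Values taken from the test policy template described above.
    gamma_floor: float = 0.15
    permitted_modes: tuple = ("state_gate", "state_plus_action_gate")
    fail_behavior: str = "fail_closed"


def decide(policy, mode, gamma):
    """Return "allow", "escalate", or "deny" (hypothetical outcome names)."""
    # A mode outside the permitted set is rejected outright.
    if mode not in policy.permitted_modes:
        return "deny"
    # Missing or NaN gamma is an evaluation failure; fail_closed denies.
    if gamma is None or math.isnan(gamma):
        return "deny" if policy.fail_behavior == "fail_closed" else "allow"
    # Below the floor, hand the decision to a human operator.
    if gamma < policy.gamma_floor:
        return "escalate"
    return "allow"
```

For example, `decide(Policy(), "state_gate", 0.10)` yields `"escalate"` because 0.10 is below the 0.15 floor.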

fixtures/keys/ — RSA key pairs for test operators:

| File | Purpose |
|---|---|
| operator-1.pem | Private key for operator-1 (alice) |
| operator-1-pub.pem | Public key for operator-1 |
| operator-2.pem | Private key for operator-2 (bob) |
| operator-2-pub.pem | Public key for operator-2 |

| File | Description |
|---|---|
| fixtures/coordinator/coordinator.toml | Docker-based paths (/testbed/..., /data/...) |
| fixtures/coordinator/coordinator-native.toml | Native paths (local filesystem) |

fixtures/scenarios/testbed-scenarios.json — six curated scenarios:

| Scenario | Seed | Purpose |
|---|---|---|
| sentinel-nominal | 1207 | Wide headroom, no warnings |
| sentinel-elevated-drift | 1030 | Near-floor drift probing |
| sentinel-critical-breach | 1014 | Floor-breach escalation |
| sentinel-multi-actor | 2102 | Three concurrent actors |
| hitl-approve-flow | 1031 | Deterministic approval path |
| hitl-concurrent-requests | 2401 | Four actors, overlapping escalations |

# From kairos-testbed directory
KAIROS_ENGINE_DIR=../kairos-engine docker compose -f docker/docker-compose.yml up coordinator

Or using the convenience script:

./scripts/run-coordinator.sh

| Aspect | Configuration |
|---|---|
| Base image | rust:1.94.0-bookworm (builder), debian:bookworm-slim (runtime) |
| Port | 8787 |
| Testbed mount | Read-only at /testbed |
| Database | tmpfs at /data (ephemeral, reset on restart) |
| Health check | GET /healthz every 5 seconds |
| Logging | RUST_LOG=info |

The ephemeral database (tmpfs) means all override requests, tokens, and audit events are lost on container restart. This is intentional for testing.
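
Because the container starts with an empty database each time, a test harness usually has to wait for the coordinator to come up before seeding it. A minimal Python sketch of such a wait loop follows. It reuses the port, path, and 5-second interval from the table; treating HTTP 200 as "healthy" and the attempt count of 12 are assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe(url, timeout=2.0):
    """Return True if a GET on url answers with HTTP 200 (assumed healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or an error status: not healthy yet.
        return False


def wait_until_healthy(url="http://127.0.0.1:8787/healthz", attempts=12,
                       interval=5.0, check=probe, sleep=time.sleep):
    """Poll url until check succeeds or the attempts run out."""
    for attempt in range(attempts):
        if check(url):
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

The `check` and `sleep` parameters exist only so the loop can be exercised without a running coordinator or real delays.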

./scripts/collect-sentinel-data.sh [output-path]

Runs all testbed scenarios and collects Sentinel assessments from both backends (template and SLM). Output is NDJSON, one JSON object per line; a single record is shown here expanded for readability:

{
  "scenario_id": "sentinel-elevated-drift",
  "actor_id": "default",
  "tick": 36,
  "risk_level": "Elevated",
  "template": {
    "narrative": "Human escalation was triggered...",
    "latency_ms": 0,
    "backend": "template"
  },
  "slm": {
    "narrative": "Gamma is below the configured floor...",
    "latency_ms": 12086,
    "backend": "slm"
  }
}
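
A collection file in this shape is easy to post-process. The Python sketch below reads records line by line and averages `latency_ms` per backend; it relies only on the field names visible in the record above, and the helper names are made up for this example.

```python
import json
from statistics import mean


def read_records(lines):
    """Yield one dict per non-empty NDJSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def mean_latency_by_backend(records, backends=("template", "slm")):
    """Average latency_ms for each backend present in the records."""
    samples = {name: [] for name in backends}
    for record in records:
        for name in backends:
            entry = record.get(name)
            if entry is not None:
                samples[name].append(entry["latency_ms"])
    # Backends with no samples are left out rather than reported as zero.
    return {name: mean(values) for name, values in samples.items() if values}


# Usage, where output_path is whatever was passed to collect-sentinel-data.sh:
#   with open(output_path, encoding="utf-8") as handle:
#       print(mean_latency_by_backend(read_records(handle)))
```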

Environment variables:

| Variable | Default | Description |
|---|---|---|
| KAIROS_ENGINE_DIR | ../kairos-engine | Path to kairos-engine |
| KAIROS_SENTINEL_MODEL_DIR | models/smollm2-1.7b-instruct-q4km | Path to model bundle |
| KAIROS_SENTINEL_TIMEOUT_MS | 20000 | Inference timeout (ms) |
| KAIROS_SENTINEL_MAX_OUTPUT_TOKENS | 64 | Maximum output tokens |
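
For tooling that wraps the collection script, the same variables and defaults can be resolved in Python as sketched below. The `SentinelSettings` class and `load_settings` function are hypothetical names for this example; only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SentinelSettings:
    engine_dir: str
    model_dir: str
    timeout_ms: int
    max_output_tokens: int


def load_settings(env=None):
    """Resolve settings from env (defaults to os.environ), applying table defaults."""
    env = os.environ if env is None else env
    return SentinelSettings(
        engine_dir=env.get("KAIROS_ENGINE_DIR", "../kairos-engine"),
        model_dir=env.get("KAIROS_SENTINEL_MODEL_DIR",
                          "models/smollm2-1.7b-instruct-q4km"),
        # Numeric values arrive as strings and are converted explicitly.
        timeout_ms=int(env.get("KAIROS_SENTINEL_TIMEOUT_MS", "20000")),
        max_output_tokens=int(env.get("KAIROS_SENTINEL_MAX_OUTPUT_TOKENS", "64")),
    )
```

Note that the SLM latency in the sample record (12086 ms) sits well inside the default 20000 ms timeout.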
./scripts/benchmark-sentinel.sh

Runs the Sentinel crate’s criterion benchmarks with the SLM feature enabled.

./scripts/smoke-hitl.sh

Validates the full HITL override flow end-to-end:

  1. Health check — verify coordinator is running
  2. Submit — post an override request
  3. Verify — check request status is PENDING
  4. Approve — sign and issue a token via operator-1
  5. Redeem — submit token for single-use redemption → ACCEPTED
  6. Replay guard — submit same token again → REPLAY_DETECTED
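
The lifecycle exercised by these six steps can be modeled with a small in-memory class. This Python sketch is a toy stand-in, not the coordinator's implementation: it omits the operator signature entirely, and the `APPROVED` and `UNKNOWN_TOKEN` statuses are invented for illustration (only `PENDING`, `ACCEPTED`, and `REPLAY_DETECTED` appear in the steps above).

```python
import secrets


class OverrideLedger:
    """In-memory model of submit, approve, redeem, and replay detection."""

    def __init__(self):
        self._status = {}       # request_id -> "PENDING" | "APPROVED"
        self._approver = {}     # request_id -> operator that approved it
        self._tokens = {}       # token -> request_id
        self._redeemed = set()  # tokens that have already been used once

    def submit(self, request_id):
        self._status[request_id] = "PENDING"
        return self._status[request_id]

    def status(self, request_id):
        return self._status[request_id]

    def approve(self, request_id, operator_id):
        """Issue a single-use token; signature verification is omitted here."""
        if self._status.get(request_id) != "PENDING":
            raise ValueError(f"request {request_id!r} is not pending")
        token = secrets.token_urlsafe(32)
        self._tokens[token] = request_id
        self._approver[request_id] = operator_id
        self._status[request_id] = "APPROVED"
        return token

    def redeem(self, token):
        if token not in self._tokens:
            return "UNKNOWN_TOKEN"
        # The replay guard: a token is accepted exactly once.
        if token in self._redeemed:
            return "REPLAY_DETECTED"
        self._redeemed.add(token)
        return "ACCEPTED"
```

Walking the model through steps 2 to 6 gives `PENDING` after submit, `ACCEPTED` on the first redemption, and `REPLAY_DETECTED` on the second.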
./scripts/demo-observe-hitl.sh \
--scenario sentinel-critical-breach \
--coordinator-url http://127.0.0.1:8787 \
--operator-key-id operator-1 \
--seed-requests 3

This script:

  1. Starts the coordinator (or uses an existing one)
  2. Seeds the coordinator with pending requests from a Sentinel collection
  3. Launches kairos observe with --hitl and --sentinel flags
  4. Provides a fully interactive operator dashboard for testing