First Evaluation

This walkthrough runs a single evaluation using the kairos evaluate command. You need a built CLI binary and an activated license.

Prepare the Inputs

An evaluation requires three files:

Calibration artifact — defines how domain metrics map to simulation parameters
Deployment policy — defines enforcement thresholds and mode
Evaluation request — the actual metrics snapshot to evaluate

Calibration Artifact (`artifact.json`)

{
  "schemaVersion": 1,
  "domain": "ai_safety",
  "adapterVersion": 1,
  "lambdaScaling": {
    "proxy": "capabilityIndex",
    "fn": "log",
    "inputRange": [1, 1000],
    "outputRange": [0.1, 0.95],
    "clamp": true
  },
  "gammaScaling": {
    "proxy": "alignmentScore",
    "fn": "linear",
    "inputRange": [0, 100],
    "outputRange": [0.05, 0.95],
    "clamp": true
  },
  "secondaryProxies": [],
  "composite": null,
  "calibratedAt": "2026-03-16T15:44:42.085Z",
  "corpusScore": null
}

This artifact maps capabilityIndex → $\lambda$ via a log function, and alignmentScore → $\gamma$ via a linear function.

Deployment Policy (`policy.json`)

{
  "schemaVersion": 1,
  "version": 1,
  "base": {
    "payload": {
      "gammaFloorMin": 0.15,
      "permittedModes": ["state_gate"],
      "metricStalenessMaxMs": 60000,
      "requireMetricSignature": false,
      "failBehavior": "fail_closed"
    },
    "signature": "<base64url-rsa-pss-signature>"
  },
  "overrides": {
    "gammaFloor": 0.2,
    "mode": "state_gate"
  }
}

This policy sets a gamma floor of 0.2 — any evaluation where $\gamma$ falls below 0.2 will produce a REJECT_STATE decision.

Evaluation Request (`request.json`)

{
  "envelopeVersion": 1,
  "requestId": "first-eval-001",
  "snapshot": {
    "timestamp": "2026-03-21T12:00:00.000Z",
    "metrics": {
      "capabilityIndex": 450.0,
      "alignmentScore": 72.0
    }
  }
}

Run the Evaluation

kairos evaluate \
  --config artifact.json \
  --policy policy.json \
  --request request.json

You can also pipe the request from stdin:

cat request.json | kairos evaluate \
  --config artifact.json \
  --policy policy.json

Interpret the Response

A passing evaluation returns:

{
  "envelopeVersion": 1,
  "requestId": "first-eval-001",
  "decision": "PASS",
  "reasonCode": "NONE",
  "mode": "state_gate",
  "policyVersion": 1,
  "adapterVersion": 1,
  "evaluation": {
    "currentGamma": 0.68,
    "gammaFloor": 0.20,
    "currentLambda": 0.89,
    "stability": 2.40,
    "predictedGamma": null
  },
  "escalation": null,
  "overrideOutcome": null,
  "timestamp": "2026-03-21T12:00:00.500Z"
}

Key fields:

Field	Meaning
`decision`	`PASS` — the action is permitted
`reasonCode`	`NONE` — no adverse condition detected
`evaluation.currentGamma`	The computed $\gamma$ value (0.68)
`evaluation.gammaFloor`	The policy threshold (0.20)
`evaluation.stability`	Normalized headroom: $(\gamma - \text{floor}) / \text{floor}$
`evaluation.currentLambda`	The computed $\lambda$ value (0.89)

If alignmentScore were low enough to push $\gamma$ below the floor (0.2), the response would show:

{
  "decision": "REJECT_STATE",
  "reasonCode": "GAMMA_BELOW_FLOOR",
  ...
}

Next Steps

Learn about Fly-by-Wire concepts for stateful evaluation with engine telemetry
Understand the calibration artifact schema to tune your metric mappings
Explore the full CLI reference for all available commands