Skip to content

First Evaluation

This walkthrough runs a single evaluation using the kairos evaluate command. You need a built CLI binary and an activated license.

An evaluation requires three files:

  1. Calibration artifact — defines how domain metrics map to simulation parameters
  2. Deployment policy — defines enforcement thresholds and mode
  3. Evaluation request — the actual metrics snapshot to evaluate
{
"schemaVersion": 1,
"domain": "ai_safety",
"adapterVersion": 1,
"lambdaScaling": {
"proxy": "capabilityIndex",
"fn": "log",
"inputRange": [1, 1000],
"outputRange": [0.1, 0.95],
"clamp": true
},
"gammaScaling": {
"proxy": "alignmentScore",
"fn": "linear",
"inputRange": [0, 100],
"outputRange": [0.05, 0.95],
"clamp": true
},
"secondaryProxies": [],
"composite": null,
"calibratedAt": "2026-03-16T15:44:42.085Z",
"corpusScore": null
}

This artifact maps capabilityIndexλ\lambda via a log function, and alignmentScoreγ\gamma via a linear function.

{
"schemaVersion": 1,
"version": 1,
"base": {
"payload": {
"gammaFloorMin": 0.15,
"permittedModes": ["state_gate"],
"metricStalenessMaxMs": 60000,
"requireMetricSignature": false,
"failBehavior": "fail_closed"
},
"signature": "<base64url-rsa-pss-signature>"
},
"overrides": {
"gammaFloor": 0.2,
"mode": "state_gate"
}
}

This policy sets a gamma floor of 0.2 — any evaluation where γ\gamma falls below 0.2 will produce a REJECT_STATE decision.

{
"envelopeVersion": 1,
"requestId": "first-eval-001",
"snapshot": {
"timestamp": "2026-03-21T12:00:00.000Z",
"metrics": {
"capabilityIndex": 450.0,
"alignmentScore": 72.0
}
}
}
Terminal window
kairos evaluate \
--config artifact.json \
--policy policy.json \
--request request.json

You can also pipe the request from stdin:

Terminal window
cat request.json | kairos evaluate \
--config artifact.json \
--policy policy.json

A passing evaluation returns:

{
"envelopeVersion": 1,
"requestId": "first-eval-001",
"decision": "PASS",
"reasonCode": "NONE",
"mode": "state_gate",
"policyVersion": 1,
"adapterVersion": 1,
"evaluation": {
"currentGamma": 0.68,
"gammaFloor": 0.20,
"currentLambda": 0.89,
"stability": 2.40,
"predictedGamma": null
},
"escalation": null,
"overrideOutcome": null,
"timestamp": "2026-03-21T12:00:00.500Z"
}

Key fields:

FieldMeaning
decisionPASS — the action is permitted
reasonCodeNONE — no adverse condition detected
evaluation.currentGammaThe computed γ\gamma value (0.68)
evaluation.gammaFloorThe policy threshold (0.20)
evaluation.stabilityNormalized headroom: (γfloor)/floor(\gamma - \text{floor}) / \text{floor}
evaluation.currentLambdaThe computed λ\lambda value (0.89)

If alignmentScore were low enough to push γ\gamma below the floor (0.2), the response would show:

{
"decision": "REJECT_STATE",
"reasonCode": "GAMMA_BELOW_FLOOR",
...
}