LLM Adapter (AI Safety)

The LLM adapter maps LLM pipeline vocabulary — models, pipelines, sessions — onto the engine’s actor vocabulary without embedding new safety logic. It is a translation layer: discrete, auditable rules convert domain actions into physics moves that the action gating system can preview.

An LLM runtime speaks in terms of models, inference pipelines, and user sessions. The KAIROS engine speaks in terms of actors, steps, and telemetry. The LLM adapter bridges this gap by owning a SubstrateSession and keeping all entity-to-actor mappings internal.

The adapter also registers a built-in ActionPhysicsMapper for the ai_safety action type, so that LLM pipeline actions can be previewed through the fly-by-wire action gate.

The adapter sits between the LLM runtime and the engine core:

Caller (LLM runtime / router)
        |
        v
LlmAdapter          <- LLM vocabulary
        |
        v
SubstrateSession    <- actor vocabulary
        |
        v
Engine              <- domain-agnostic safety core

The caller interacts only with LLM-native terms. The adapter translates these into session API calls. The engine never sees domain-specific concepts.

The adapter supports three strategies for mapping LLM entities onto engine actors. The choice determines granularity — how many actors the engine tracks and at what level of abstraction.

| Mode | Mapping | Actor ID prefix | Use when |
| --- | --- | --- | --- |
| PerModelActor | 1:1 model to actor | llm-model- | You want per-model safety tracking across all sessions |
| PerPipelineActor | 1:1 pipeline to actor | llm-pipeline- | You want per-pipeline isolation (e.g., RAG vs. chat) |
| PerSessionActor | 1:1 session to actor | llm-session- | You want per-user-session tracking with independent state |

In all modes, the adapter manages actor lifecycle automatically. Registering an entity creates the corresponding actor; unregistering removes it.
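
The relationship between mapping mode, prefix, and actor lifecycle can be sketched as follows. This is an illustrative model only, not the adapter's implementation: `MappingMode`, `actor_id`, and `ActorRegistry` are hypothetical names, and the sketch assumes an actor ID is simply the mode's prefix followed by the entity's own ID.

```python
from enum import Enum


class MappingMode(Enum):
    """The three mapping modes, each paired with its documented actor ID prefix."""
    PER_MODEL_ACTOR = "llm-model-"
    PER_PIPELINE_ACTOR = "llm-pipeline-"
    PER_SESSION_ACTOR = "llm-session-"


def actor_id(mode: MappingMode, entity_id: str) -> str:
    # Assumption: the actor ID is the prefix concatenated with the entity ID.
    return f"{mode.value}{entity_id}"


class ActorRegistry:
    """Toy stand-in for the adapter's internal entity-to-actor map."""

    def __init__(self, mode: MappingMode):
        self.mode = mode
        self.actors = {}  # entity ID -> actor ID

    def register(self, entity_id: str) -> str:
        # Registering an entity creates the corresponding actor.
        aid = actor_id(self.mode, entity_id)
        self.actors[entity_id] = aid
        return aid

    def unregister(self, entity_id: str) -> None:
        # Unregistering an entity removes its actor.
        self.actors.pop(entity_id, None)
```

Because the mode is fixed per registry in this sketch, every entity registered through it maps to an actor at the same level of granularity.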

The built-in LlmActionMapper translates five LLM action types into physics move directions. These rules are the core of the adapter — they define how the action gate interprets LLM pipeline behavior.

Completion generation is mapped based on the safetyScore field in the action payload:

| Safety score | Direction | Interpretation |
| --- | --- | --- |
| $\geq 0.7$ | Left | High confidence: conservative, defensive posture |
| $0.3$ to $0.7$ | Stay | Moderate confidence: hold current trajectory |
| $< 0.3$ | Right | Low confidence: expansive, needs preview scrutiny |

A missing or unparseable safetyScore falls back to state-only gating. The action preview does not run, but the gamma/floor state gate still applies.
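
A minimal sketch of the completion rule, under stated assumptions: `Direction` and `map_completion` are hypothetical names, `None` stands in for the fallback to state-only gating, and treating NaN as unparseable is a choice made here rather than documented behavior.

```python
import math
from enum import Enum
from typing import Optional


class Direction(Enum):
    LEFT = "Left"
    STAY = "Stay"
    RIGHT = "Right"


def map_completion(payload: dict) -> Optional[Direction]:
    """Map a completion payload to a direction.

    Returns None when safetyScore is missing or unparseable, which in this
    sketch signals the fallback to state-only gating (no action preview).
    """
    try:
        score = float(payload["safetyScore"])
    except (KeyError, TypeError, ValueError):
        return None
    if math.isnan(score):  # assumption: NaN counts as unparseable
        return None
    if score >= 0.7:
        return Direction.LEFT   # high confidence
    if score < 0.3:
        return Direction.RIGHT  # low confidence
    return Direction.STAY       # 0.3 <= score < 0.7
```

The thresholds follow the documented bands, so a score of exactly 0.7 maps to Left and a score of exactly 0.3 maps to Stay.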

Tool calls are mapped based on the target prefix:

| Target prefix | Direction | Interpretation |
| --- | --- | --- |
| read: | Configurable | Safe tool, defaults to Stay |
| write: | Configurable | Safe tool, defaults to Stay |
| external: | Right | External service call, expansive |
| exec: | Right | Code execution, expansive |

Targets that do not match any recognized prefix fall back to state-only gating with an UnsupportedTarget reason. The prefix convention is strict in V1 — all tool targets must use one of the four prefixes above for action preview to engage.
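
The prefix rule can be sketched as below. This is an illustration rather than the built-in mapper: `map_tool_call` is a hypothetical function, the second tuple element carries the fallback reason, and case-sensitive prefix matching is an assumption. The `safe_tool_direction` argument mirrors the configurable direction for read: and write: targets.

```python
from enum import Enum
from typing import Optional, Tuple


class Direction(Enum):
    LEFT = "Left"
    STAY = "Stay"
    RIGHT = "Right"


SAFE_PREFIXES = ("read:", "write:")
EXPANSIVE_PREFIXES = ("external:", "exec:")


def map_tool_call(
    target: str,
    safe_tool_direction: Direction = Direction.STAY,
) -> Tuple[Optional[Direction], Optional[str]]:
    """Return (direction, fallback_reason) for a tool-call target.

    A direction of None means the action preview does not engage and the
    call falls back to state-only gating.
    """
    if target.startswith(SAFE_PREFIXES):
        return safe_tool_direction, None
    if target.startswith(EXPANSIVE_PREFIXES):
        return Direction.RIGHT, None
    # Strict V1 convention: anything else is unsupported.
    return None, "UnsupportedTarget"
```

For example, `map_tool_call("read:docs/index")` yields Stay by default, while passing `safe_tool_direction=Direction.LEFT` makes the same call yield Left.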

Model routing always maps to Stay. The routing decision itself does not change the system’s risk posture — it is the subsequent actions on the destination model that matter.

Retry behavior is mapped based on the retryDepth field in the payload:

| Retry depth | Direction | Interpretation |
| --- | --- | --- |
| $< 3$ | Stay | Normal retry: hold trajectory |
| $\geq 3$ | Right | Deep retry loop: expansive, may indicate runaway behavior |

When retryDepth is absent, it defaults to 1 (Stay).
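
The retry rule, including the default, fits in a few lines. As with the other sketches, `map_retry` and `Direction` are hypothetical names, and the payload is assumed to be a plain dictionary with an integer retryDepth.

```python
from enum import Enum


class Direction(Enum):
    LEFT = "Left"
    STAY = "Stay"
    RIGHT = "Right"


def map_retry(payload: dict) -> Direction:
    """Map a retry payload to a direction based on retryDepth."""
    # An absent retryDepth defaults to 1, which falls in the Stay band.
    depth = payload.get("retryDepth", 1)
    return Direction.RIGHT if depth >= 3 else Direction.STAY
```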

Human escalation always maps to Left — the most conservative direction. Escalating to a human operator is inherently defensive.

The safe_tool_direction parameter controls the direction assigned to read: and write: tool calls. It defaults to Stay, meaning safe tools do not shift the system’s trajectory during preview.

For topologies where safe tools must actively move Left to remain in a safe corridor — such as the boundary-action-gate diagonal escalation field — set this to Left. This causes every safe tool call to pull the actor toward the conservative end of the lattice during preview.

The adapter exports five standard metric keys for the AI Safety domain. These are the metrics that Rosetta uses for domain translation:

| Metric | Role |
| --- | --- |
| capabilityIndex | Primary $\Lambda$ proxy: growth pressure |
| alignmentScore | Primary $\Gamma$ proxy: structural stability |
| autonomyLevel | Secondary $\Lambda$ proxy: enriches growth pressure |
| humanOversightFreq | Secondary $\Gamma$ proxy: enriches structural stability |
| guardrailCoverage | Secondary $\Gamma$ proxy: enriches structural stability |

See Secondary Proxies and Composite Aggregation for how multiple proxies combine into the final $\Lambda$ and $\Gamma$ values.
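
One way a caller might keep the five keys straight is a small lookup table plus a completeness check on each snapshot. The sketch below assumes a snapshot is a flat dictionary keyed by metric name; `METRIC_ROLES`, `proxies_for`, and `missing_metrics` are hypothetical helpers, and the sketch deliberately does not attempt the aggregation itself.

```python
# Metric key -> (symbol it proxies, primary or secondary), per the documented roles.
METRIC_ROLES = {
    "capabilityIndex": ("Lambda", "primary"),
    "alignmentScore": ("Gamma", "primary"),
    "autonomyLevel": ("Lambda", "secondary"),
    "humanOversightFreq": ("Gamma", "secondary"),
    "guardrailCoverage": ("Gamma", "secondary"),
}


def proxies_for(symbol: str) -> list:
    """List the metric keys that feed the given symbol, in table order."""
    return [key for key, (sym, _) in METRIC_ROLES.items() if sym == symbol]


def missing_metrics(snapshot: dict) -> list:
    """Return the standard metric keys absent from a snapshot, sorted by name."""
    return sorted(key for key in METRIC_ROLES if key not in snapshot)
```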

Basin-collapse detection requires promote_sole_entity

The engine’s shared loss system has a 10-tick cooldown. In the default multi-agent topology (a background “default” agent plus the registered entity), the first agent processed alphabetically can consume the cooldown and prevent the entity from detecting loss. Call promote_sole_entity() after registering an entity to remove all other agents — this is the only topology where basin-collapse fires reliably through the adapter.
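
The interaction described above can be illustrated with a toy model. It is not the engine's loss system: `SharedLossGate` and `agents_that_detect_loss` are hypothetical, and the sketch assumes every agent would detect loss on the same tick and that a single cooldown is shared by all of them.

```python
COOLDOWN_TICKS = 10


class SharedLossGate:
    """Toy model of one loss cooldown shared by every agent."""

    def __init__(self):
        self.last_fired = None  # tick of the most recent detection, if any

    def try_fire(self, tick: int) -> bool:
        # Refuse to fire while the shared cooldown is still running.
        if self.last_fired is not None and tick - self.last_fired < COOLDOWN_TICKS:
            return False
        self.last_fired = tick
        return True


def agents_that_detect_loss(agent_ids, tick: int = 0) -> list:
    """Process agents alphabetically and return those that manage to fire."""
    gate = SharedLossGate()
    return [agent for agent in sorted(agent_ids) if gate.try_fire(tick)]
```

In this model, a topology containing both "default" and "llm-model-a" lets only "default" fire, because it sorts first and consumes the cooldown. With the entity as the sole agent, the entity fires.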

route_to_model always maps to Stay, so the action preview only has meaning if the metric snapshot already describes the destination model. If the snapshot still contains the source model’s capabilityIndex / alignmentScore, the route preview is silently evaluated against the wrong model profile.

PerSessionActor does not expire actors automatically. When a user session ends, unregister the entity explicitly — otherwise both the adapter map entry and the underlying engine actor remain live.
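
A common way to guarantee the explicit unregister is to tie it to the session's scope. The sketch below uses a stand-in adapter, since the real registration API is not shown here: `FakeAdapter`, `register_entity`, `unregister_entity`, `tracked_session`, and `handle_session` are all hypothetical names.

```python
from contextlib import contextmanager


class FakeAdapter:
    """Stand-in for the adapter; method names are assumptions for illustration."""

    def __init__(self):
        self.actors = {}  # session ID -> actor ID

    def register_entity(self, session_id: str) -> None:
        self.actors[session_id] = f"llm-session-{session_id}"

    def unregister_entity(self, session_id: str) -> None:
        self.actors.pop(session_id, None)


@contextmanager
def tracked_session(adapter, session_id: str):
    """Register on entry and always unregister on exit, even after an error."""
    adapter.register_entity(session_id)
    try:
        yield adapter.actors[session_id]
    finally:
        adapter.unregister_entity(session_id)


def handle_session(adapter, session_id: str, work):
    """Run work(actor_id) inside a tracked session and return its result."""
    with tracked_session(adapter, session_id) as aid:
        return work(aid)
```

The `finally` clause is the point of the design: if the session handler raises, the map entry and the actor are still released rather than left live.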

V1 target parsing is prefix-based. Only read:, write:, external:, and exec: prefixes are recognized. Any other target falls back to state-only gating, which means the gamma/floor state gate still runs but the action preview does not.