LLM Adapter (AI Safety)
The LLM adapter maps LLM pipeline vocabulary — models, pipelines, sessions — onto the engine’s actor vocabulary without embedding new safety logic. It is a translation layer: discrete, auditable rules convert domain actions into physics moves that the action gating system can preview.
What It Does
An LLM runtime speaks in terms of models, inference pipelines, and user sessions. The KAIROS engine speaks in terms of actors, steps, and telemetry. The LLM adapter bridges this gap by owning a SubstrateSession and keeping all entity-to-actor mappings internal.
The adapter also registers a built-in ActionPhysicsMapper for the ai_safety action type, so that LLM pipeline actions can be previewed through the fly-by-wire action gate.
Architecture
The adapter sits between the LLM runtime and the engine core:
    Caller (LLM runtime / router)
        |
        v
    LlmAdapter           <- LLM vocabulary
        |
        v
    SubstrateSession     <- actor vocabulary
        |
        v
    Engine               <- domain-agnostic safety core

The caller interacts only with LLM-native terms. The adapter translates these into session API calls. The engine never sees domain-specific concepts.
Mapping Modes
The adapter supports three strategies for mapping LLM entities onto engine actors. The choice determines granularity: how many actors the engine tracks and at what level of abstraction.
| Mode | Mapping | Actor ID prefix | Use when |
|---|---|---|---|
| PerModelActor | 1:1 model to actor | llm-model- | You want per-model safety tracking across all sessions |
| PerPipelineActor | 1:1 pipeline to actor | llm-pipeline- | You want per-pipeline isolation (e.g., RAG vs. chat) |
| PerSessionActor | 1:1 session to actor | llm-session- | You want per-user-session tracking with independent state |
In all modes, the adapter manages actor lifecycle automatically. Registering an entity creates the corresponding actor; unregistering removes it.
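The mode table and lifecycle rule above can be illustrated with a small sketch. It is a toy model, not the adapter's API: `MappingMode`, `ToyAdapter`, and its methods are hypothetical names, and forming the actor ID by appending the entity ID to the prefix is an assumption.

```python
from enum import Enum


class MappingMode(Enum):
    # Values are the actor ID prefixes from the table above.
    PER_MODEL_ACTOR = "llm-model-"
    PER_PIPELINE_ACTOR = "llm-pipeline-"
    PER_SESSION_ACTOR = "llm-session-"


class ToyAdapter:
    """Hypothetical stand-in for the adapter's internal entity-to-actor map."""

    def __init__(self, mode):
        self.mode = mode
        self._actors = {}  # entity ID -> actor ID

    def register(self, entity_id):
        # Registering an entity creates the corresponding actor.
        # Assumption: actor ID = mode prefix + entity ID.
        actor_id = f"{self.mode.value}{entity_id}"
        self._actors[entity_id] = actor_id
        return actor_id

    def unregister(self, entity_id):
        # Unregistering an entity removes its actor.
        self._actors.pop(entity_id, None)

    def live_actors(self):
        return sorted(self._actors.values())
```

Switching the mode changes only which kind of entity becomes an actor; the register and unregister steps stay the same.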
Action Mapping Rules
The built-in LlmActionMapper translates five LLM action types into physics move directions. These rules are the core of the adapter: they define how the action gate interprets LLM pipeline behavior.
generate_completion
Completion generation is mapped based on the safetyScore field in the action payload:
| Safety score | Direction | Interpretation |
|---|---|---|
| … | Left | High confidence: conservative, defensive posture |
| … to … | Stay | Moderate confidence: hold current trajectory |
| … | Right | Low confidence: expansive, needs preview scrutiny |
A missing or unparseable safetyScore falls back to state-only gating. The action preview does not run, but the gamma/floor state gate still applies.
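A minimal sketch of this rule, under stated assumptions: the function name, the `low` and `high` threshold parameters, and the boundary handling (inclusive at `high`, exclusive at `low`) are hypothetical, and the values in the usage note are illustrative only. Returning `None` stands in for the state-only fallback.

```python
from enum import Enum
from typing import Optional


class Direction(Enum):
    LEFT = "Left"
    STAY = "Stay"
    RIGHT = "Right"


def map_generate_completion(payload: dict, *, low: float, high: float) -> Optional[Direction]:
    """Return a direction, or None to signal state-only gating.

    `low` and `high` are hypothetical thresholds supplied by the caller.
    """
    try:
        score = float(payload.get("safetyScore"))
    except (TypeError, ValueError):
        return None  # missing or unparseable: the action preview does not run
    if score != score:  # NaN is treated as unparseable in this sketch
        return None
    if score >= high:
        return Direction.LEFT   # high confidence
    if score < low:
        return Direction.RIGHT  # low confidence
    return Direction.STAY       # moderate confidence
```

With illustrative thresholds `low=0.3` and `high=0.7`, a payload of `{"safetyScore": 0.5}` maps to Stay, and an empty payload returns `None`.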
tool_call
Tool calls are mapped based on the target prefix:
| Target prefix | Direction | Interpretation |
|---|---|---|
| read: | Configurable | Safe tool, defaults to Stay |
| write: | Configurable | Safe tool, defaults to Stay |
| external: | Right | External service call, expansive |
| exec: | Right | Code execution, expansive |
Targets that do not match any recognized prefix fall back to state-only gating with an UnsupportedTarget reason. The prefix convention is strict in V1 — all tool targets must use one of the four prefixes above for action preview to engage.
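The prefix rule can be sketched as follows. The function name and return shape are hypothetical, and case-sensitive matching is an assumption. The four prefixes, the configurable safe-tool direction, and the UnsupportedTarget reason come from the text above.

```python
from enum import Enum
from typing import Optional, Tuple


class Direction(Enum):
    LEFT = "Left"
    STAY = "Stay"
    RIGHT = "Right"


SAFE_PREFIXES = ("read:", "write:")
EXPANSIVE_PREFIXES = ("external:", "exec:")


def map_tool_call(
    target: str,
    safe_tool_direction: Direction = Direction.STAY,
) -> Tuple[Optional[Direction], Optional[str]]:
    """Return (direction, fallback_reason).

    A direction of None means the action preview is skipped and only the
    state gate applies.
    """
    if target.startswith(SAFE_PREFIXES):
        return safe_tool_direction, None
    if target.startswith(EXPANSIVE_PREFIXES):
        return Direction.RIGHT, None
    return None, "UnsupportedTarget"
```

Passing `safe_tool_direction=Direction.LEFT` changes only the `read:` and `write:` results; `external:` and `exec:` remain Right.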
route_to_model
Model routing always maps to Stay. The routing decision itself does not change the system’s risk posture; what matters is the subsequent actions on the destination model.
Retry behavior is mapped based on the retryDepth field in the payload:
| Retry depth | Direction | Interpretation |
|---|---|---|
| … | Stay | Normal retry: hold trajectory |
| … | Right | Deep retry loop: expansive, may indicate runaway behavior |
When retryDepth is absent, it defaults to 1 (Stay).
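A sketch of the retry rule, assuming an integer retryDepth. The function name and the `deep_threshold` parameter are hypothetical, as is the inclusive comparison. The threshold must be greater than 1 for the default depth to map to Stay, as described above.

```python
from enum import Enum


class Direction(Enum):
    LEFT = "Left"
    STAY = "Stay"
    RIGHT = "Right"


def map_retry(payload: dict, *, deep_threshold: int) -> Direction:
    """Map retry depth to a direction.

    `deep_threshold` is a hypothetical cutoff supplied by the caller.
    """
    depth = payload.get("retryDepth", 1)  # an absent field defaults to 1
    if depth >= deep_threshold:
        return Direction.RIGHT  # deep retry loop
    return Direction.STAY       # normal retry
```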
escalate_to_human
Human escalation always maps to Left, the most conservative direction. Escalating to a human operator is inherently defensive.
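The two payload-independent rules, route_to_model and escalate_to_human, reduce to a constant lookup. The sketch below uses hypothetical names and returns `None` for any action type whose direction depends on its payload or target.

```python
from enum import Enum
from typing import Optional


class Direction(Enum):
    LEFT = "Left"
    STAY = "Stay"
    RIGHT = "Right"


# Action types whose direction never depends on the payload.
FIXED_DIRECTIONS = {
    "route_to_model": Direction.STAY,
    "escalate_to_human": Direction.LEFT,
}


def map_fixed_action(action_type: str) -> Optional[Direction]:
    """Return the fixed direction, or None if this action type has none."""
    return FIXED_DIRECTIONS.get(action_type)
```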
Configurable Safe-Tool Direction
The safe_tool_direction parameter controls the direction assigned to read: and write: tool calls. It defaults to Stay, meaning safe tools do not shift the system’s trajectory during preview.
For topologies where safe tools must actively move Left to remain in a safe corridor — such as the boundary-action-gate diagonal escalation field — set this to Left. This causes every safe tool call to pull the actor toward the conservative end of the lattice during preview.
Standard AI Safety Metrics
The adapter exports five standard metric keys for the AI Safety domain. These are the metrics that Rosetta uses for domain translation:
| Metric | Role |
|---|---|
| capabilityIndex | Primary proxy for growth pressure |
| alignmentScore | Primary proxy for structural stability |
| autonomyLevel | Secondary proxy, enriches growth pressure |
| humanOversightFreq | Secondary proxy, enriches structural stability |
| guardrailCoverage | Secondary proxy, enriches structural stability |
See Secondary Proxies and Composite Aggregation for how multiple proxies combine into the final growth-pressure and structural-stability values.
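As an illustration of how a caller might group these keys by role, here is a small pre-flight check that reports which standard keys a metric snapshot lacks. The helper and constant names are hypothetical. The sketch does not implement the aggregation itself, and it does not imply that the adapter requires all five keys.

```python
# Keys grouped by the value they feed, following the table above.
GROWTH_PRESSURE_KEYS = ("capabilityIndex", "autonomyLevel")
STRUCTURAL_STABILITY_KEYS = ("alignmentScore", "humanOversightFreq", "guardrailCoverage")


def missing_metric_keys(snapshot: dict) -> list:
    """List the standard keys absent from a snapshot, in table-role order."""
    expected = GROWTH_PRESSURE_KEYS + STRUCTURAL_STABILITY_KEYS
    return [key for key in expected if key not in snapshot]
```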
Integration Considerations
Basin-collapse detection requires promote_sole_entity
The engine’s shared loss system has a 10-tick cooldown. In the default multi-agent topology (a background “default” agent plus the registered entity), the first agent processed alphabetically can consume the cooldown and prevent the entity from detecting loss. Call promote_sole_entity() after registering an entity to remove all other agents. This is the only topology where basin-collapse fires reliably through the adapter.
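The starvation effect can be reproduced with a toy model. This is not the engine's loss system: it assumes every agent is eligible to detect loss on every tick, that agents are processed in alphabetical order, and that one detection starts a cooldown shared by all agents. The function name is hypothetical.

```python
def count_loss_detections(agent_ids, ticks=30, cooldown_ticks=10):
    """Count loss detections per agent under a single shared cooldown."""
    detections = {agent_id: 0 for agent_id in agent_ids}
    cooldown_until = 0  # shared across all agents
    for tick in range(ticks):
        for agent_id in sorted(agent_ids):  # alphabetical processing order
            if tick >= cooldown_until:
                detections[agent_id] += 1
                # The cooldown is consumed for every agent, not only this one.
                cooldown_until = tick + cooldown_ticks
    return detections
```

In this model, "default" sorts before any "llm-" actor ID, so it takes every detection slot while the registered entity records none. Once the entity is the only agent, it records all of them.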
RouteToModel requires destination metrics
route_to_model always maps to Stay, so the action preview only has meaning if the metric snapshot already describes the destination model. If the snapshot still contains the source model’s capabilityIndex / alignmentScore, the route preview is silently evaluated against the wrong model profile.
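One way a caller might guard against this is to look up the snapshot by destination model before previewing, and to fail loudly when none exists. The helper below is a hypothetical caller-side sketch, and the per-model snapshot dictionary is an assumed data layout.

```python
def snapshot_for_route(snapshots_by_model: dict, destination_model: str) -> dict:
    """Return a copy of the destination model's metrics for a route preview."""
    try:
        return dict(snapshots_by_model[destination_model])
    except KeyError:
        # Refuse to fall back to the source model's metrics.
        raise LookupError(
            f"no metric snapshot for destination model {destination_model!r}"
        ) from None
```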
PerSessionActor requires explicit cleanup
PerSessionActor does not expire actors automatically. When a user session ends, unregister the entity explicitly. Otherwise both the adapter map entry and the underlying engine actor remain live.
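A context manager is one way to make the cleanup hard to forget. In this sketch, `ToySessionAdapter` and its method names are hypothetical stand-ins for the adapter's register and unregister calls. The `finally` clause ensures the session actor is removed even when request handling raises.

```python
from contextlib import contextmanager


class ToySessionAdapter:
    """Hypothetical stand-in that tracks live session actors."""

    def __init__(self):
        self.actors = set()

    def register_session(self, session_id):
        actor_id = f"llm-session-{session_id}"
        self.actors.add(actor_id)
        return actor_id

    def unregister_session(self, session_id):
        self.actors.discard(f"llm-session-{session_id}")


@contextmanager
def tracked_session(adapter, session_id):
    """Register a session actor and always unregister it on exit."""
    actor_id = adapter.register_session(session_id)
    try:
        yield actor_id
    finally:
        adapter.unregister_session(session_id)
```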
Target prefix convention
V1 target parsing is prefix-based. Only read:, write:, external:, and exec: prefixes are recognized. Any other target falls back to state-only gating, which means the gamma/floor state gate still runs but the action preview does not.