LLM Adapter (AI Safety)

The LLM adapter maps LLM pipeline vocabulary — models, pipelines, sessions — onto the engine’s actor vocabulary without embedding new safety logic. It is a translation layer: discrete, auditable rules convert domain actions into physics moves that the action gating system can preview.

What It Does

An LLM runtime speaks in terms of models, inference pipelines, and user sessions. The KAIROS engine speaks in terms of actors, steps, and telemetry. The LLM adapter bridges this gap by owning a SubstrateSession and keeping all entity-to-actor mappings internal.

The adapter also registers a built-in ActionPhysicsMapper for the ai_safety action type, so that LLM pipeline actions can be previewed through the fly-by-wire action gate.

Architecture

The adapter sits between the LLM runtime and the engine core:

Caller (LLM runtime / router)
  |
  v
LlmAdapter          <- LLM vocabulary
  |
  v
SubstrateSession    <- actor vocabulary
  |
  v
Engine              <- domain-agnostic safety core

The caller interacts only with LLM-native terms. The adapter translates these into session API calls. The engine never sees domain-specific concepts.

Mapping Modes

The adapter supports three strategies for mapping LLM entities onto engine actors. The choice determines granularity — how many actors the engine tracks and at what level of abstraction.

Mode	Mapping	Actor ID prefix	Use when
`PerModelActor`	1:1 model to actor	`llm-model-`	You want per-model safety tracking across all sessions
`PerPipelineActor`	1:1 pipeline to actor	`llm-pipeline-`	You want per-pipeline isolation (e.g., RAG vs. chat)
`PerSessionActor`	1:1 session to actor	`llm-session-`	You want per-user-session tracking with independent state

In all modes, the adapter manages actor lifecycle automatically. Registering an entity creates the corresponding actor; unregistering removes it.
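As a rough illustration of the table above, the sketch below pairs each mode with its actor ID prefix. The `MappingMode` enum, the `actor_id` helper, and the rule that an actor ID is the prefix followed by the caller's entity ID are assumptions made for this example; only the mode names and prefixes come from the table.

```python
from enum import Enum


class MappingMode(Enum):
    """The three mapping modes, each carrying its actor ID prefix."""

    PER_MODEL_ACTOR = "llm-model-"
    PER_PIPELINE_ACTOR = "llm-pipeline-"
    PER_SESSION_ACTOR = "llm-session-"


def actor_id(mode: MappingMode, entity_id: str) -> str:
    # Assumption: the actor ID is the mode prefix followed by the entity ID.
    return f"{mode.value}{entity_id}"


if __name__ == "__main__":
    print(actor_id(MappingMode.PER_PIPELINE_ACTOR, "rag"))
```

Because the prefix depends only on the mode, the same entity ID yields distinct actor IDs under different modes.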

Action Mapping Rules

The built-in LlmActionMapper translates five LLM action types into physics move directions. These rules are the core of the adapter — they define how the action gate interprets LLM pipeline behavior.

generate_completion

Completion generation is mapped based on the safetyScore field in the action payload:

Safety score	Direction	Interpretation
$\geq 0.7$	Left	High confidence: conservative, defensive posture
$\geq 0.3$ and $< 0.7$	Stay	Moderate confidence: hold current trajectory
$< 0.3$	Right	Low confidence: expansive, needs preview scrutiny

A missing or unparseable safetyScore falls back to state-only gating. The action preview does not run, but the gamma/floor state gate still applies.
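The threshold rule can be sketched as a small function. This is an illustrative reading of the table, not the adapter's actual code: the `Direction` enum and the function name are hypothetical, `None` stands in for "fall back to state-only gating", and treating NaN as unparseable is an assumption.

```python
import math
from enum import Enum
from typing import Any, Mapping, Optional


class Direction(Enum):
    LEFT = "Left"
    STAY = "Stay"
    RIGHT = "Right"


def map_generate_completion(payload: Mapping[str, Any]) -> Optional[Direction]:
    """Return a move direction, or None to signal state-only gating."""
    try:
        score = float(payload.get("safetyScore"))
    except (TypeError, ValueError):
        return None  # missing or unparseable safetyScore
    if math.isnan(score):
        return None  # assumption: NaN counts as unparseable
    if score >= 0.7:
        return Direction.LEFT
    if score >= 0.3:
        return Direction.STAY
    return Direction.RIGHT
```

The sketch places a score of exactly 0.3 in the Stay band, so the three ranges do not overlap.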

tool_call

Tool calls are mapped based on the target prefix:

Target prefix	Direction	Interpretation
`read:`	Configurable	Safe tool — defaults to Stay
`write:`	Configurable	Safe tool — defaults to Stay
`external:`	Right	External service call — expansive
`exec:`	Right	Code execution — expansive

Targets that do not match any recognized prefix fall back to state-only gating with an UnsupportedTarget reason. The prefix convention is strict in V1 — all tool targets must use one of the four prefixes above for action preview to engage.
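A minimal sketch of the prefix rule follows. The function name and the `(direction, reason)` return shape are hypothetical, as is case-sensitive prefix matching. The `safe_tool_direction` argument mirrors the parameter described under Configurable Safe-Tool Direction.

```python
from enum import Enum
from typing import Optional, Tuple


class Direction(Enum):
    LEFT = "Left"
    STAY = "Stay"
    RIGHT = "Right"


SAFE_PREFIXES = ("read:", "write:")
EXPANSIVE_PREFIXES = ("external:", "exec:")


def map_tool_call(
    target: str,
    safe_tool_direction: Direction = Direction.STAY,
) -> Tuple[Optional[Direction], Optional[str]]:
    """Return (direction, None) or (None, reason) for state-only gating."""
    if target.startswith(SAFE_PREFIXES):
        return safe_tool_direction, None
    if target.startswith(EXPANSIVE_PREFIXES):
        return Direction.RIGHT, None
    # Unrecognized prefix: no action preview, only the state gate applies.
    return None, "UnsupportedTarget"
```

Passing `Direction.LEFT` as `safe_tool_direction` changes only the `read:` and `write:` branch. The expansive prefixes stay on Right.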

route_to_model

Model routing always maps to Stay. The routing decision itself does not change the system’s risk posture — it is the subsequent actions on the destination model that matter.

retry

Retry behavior is mapped based on the retryDepth field in the payload:

Retry depth	Direction	Interpretation
$< 3$	Stay	Normal retry — hold trajectory
$\geq 3$	Right	Deep retry loop — expansive, may indicate runaway behavior

When retryDepth is absent, it defaults to 1 (Stay).
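The depth rule, including the default, might look like the sketch below. The names are illustrative, and the code assumes `retryDepth` is an integer or an integer-like string. How the adapter treats a malformed depth is not stated here, so that case is left unhandled.

```python
from enum import Enum
from typing import Any, Mapping


class Direction(Enum):
    LEFT = "Left"
    STAY = "Stay"
    RIGHT = "Right"


def map_retry(payload: Mapping[str, Any]) -> Direction:
    # An absent retryDepth defaults to 1, which falls in the Stay band.
    depth = int(payload.get("retryDepth", 1))
    return Direction.RIGHT if depth >= 3 else Direction.STAY
```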

escalate_to_human

Human escalation always maps to Left — the most conservative direction. Escalating to a human operator is inherently defensive.

Configurable Safe-Tool Direction

The safe_tool_direction parameter controls the direction assigned to read: and write: tool calls. It defaults to Stay, meaning safe tools do not shift the system’s trajectory during preview.

For topologies where safe tools must actively move Left to remain in a safe corridor — such as the boundary-action-gate diagonal escalation field — set this to Left. This causes every safe tool call to pull the actor toward the conservative end of the lattice during preview.

Standard AI Safety Metrics

The adapter exports five standard metric keys for the AI Safety domain. These are the metrics that Rosetta uses for domain translation:

Metric	Role
`capabilityIndex`	Primary $\Lambda$ proxy — growth pressure
`alignmentScore`	Primary $\Gamma$ proxy — structural stability
`autonomyLevel`	Secondary $\Lambda$ proxy — enriches growth pressure
`humanOversightFreq`	Secondary $\Gamma$ proxy — enriches structural stability
`guardrailCoverage`	Secondary $\Gamma$ proxy — enriches structural stability

See Secondary Proxies and Composite Aggregation for how multiple proxies combine into the final $\Lambda$ and $\Gamma$ values.
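For orientation, the roles in the table can be written down as a lookup and used to split a metric snapshot by the quantity each key feeds. The `METRIC_ROLES` name and the `group_by_target` helper are hypothetical, and the sketch deliberately stops short of combining the proxies, since the aggregation rule is documented elsewhere.

```python
from typing import Dict, Mapping, Tuple

# metric key -> (quantity it proxies, rank), transcribed from the table above
METRIC_ROLES: Dict[str, Tuple[str, str]] = {
    "capabilityIndex": ("Lambda", "primary"),
    "alignmentScore": ("Gamma", "primary"),
    "autonomyLevel": ("Lambda", "secondary"),
    "humanOversightFreq": ("Gamma", "secondary"),
    "guardrailCoverage": ("Gamma", "secondary"),
}


def group_by_target(snapshot: Mapping[str, float]) -> Dict[str, Dict[str, float]]:
    """Split a snapshot into Lambda proxies and Gamma proxies."""
    grouped: Dict[str, Dict[str, float]] = {"Lambda": {}, "Gamma": {}}
    for key, value in snapshot.items():
        role = METRIC_ROLES.get(key)
        if role is None:
            continue  # not one of the five standard keys
        target, _rank = role
        grouped[target][key] = value
    return grouped
```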

Integration Considerations

Basin-collapse detection requires promote_sole_entity

The engine’s shared loss system has a 10-tick cooldown. In the default multi-agent topology (a background “default” agent plus the registered entity), agents are processed in alphabetical order, so whichever agent comes first can consume the cooldown and prevent the registered entity from detecting loss. Call promote_sole_entity() after registering an entity to remove all other agents. This is the only topology in which basin-collapse detection fires reliably through the adapter.

route_to_model requires destination metrics

route_to_model always maps to Stay, so the action preview only has meaning if the metric snapshot already describes the destination model. If the snapshot still contains the source model’s capabilityIndex / alignmentScore, the route preview is silently evaluated against the wrong model profile.

PerSessionActor requires explicit cleanup

PerSessionActor does not expire actors automatically. When a user session ends, unregister the entity explicitly — otherwise both the adapter map entry and the underlying engine actor remain live.
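One way to make the cleanup hard to forget is to wrap registration in a context manager, sketched below. The `register_entity` and `unregister_entity` method names are assumptions rather than the adapter's documented API, and `InMemoryAdapter` is a stand-in that exists only so the example runs.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol


class SessionRegistry(Protocol):
    # Hypothetical interface; the real adapter's method names may differ.
    def register_entity(self, entity_id: str) -> None: ...
    def unregister_entity(self, entity_id: str) -> None: ...


class InMemoryAdapter:
    """Stand-in adapter that tracks live entities in a set."""

    def __init__(self) -> None:
        self.live = set()

    def register_entity(self, entity_id: str) -> None:
        self.live.add(entity_id)

    def unregister_entity(self, entity_id: str) -> None:
        self.live.discard(entity_id)


@contextmanager
def tracked_session(adapter: SessionRegistry, session_id: str) -> Iterator[str]:
    adapter.register_entity(session_id)
    try:
        yield session_id
    finally:
        # Runs on normal exit and on exceptions, so no actor is left live.
        adapter.unregister_entity(session_id)
```

With this shape, a caller writes `with tracked_session(adapter, session_id): ...` and the entity is unregistered even if the session body raises.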

Target prefix convention

V1 target parsing is prefix-based. Only read:, write:, external:, and exec: prefixes are recognized. Any other target falls back to state-only gating, which means the gamma/floor state gate still runs but the action preview does not.