
Trace Analysis & Evaluation

When client.run(...) completes, it returns a Trace object (or a LazyTrace if streamed to disk). This object is not just a JSON container; it is a full analytical query engine used to assess the outcome of an experiment.

A trace holds the authoritative timeline of all ticks, events, and engine states that occurred during the run.

trace = client.run(scenario, ticks=1000)
print(trace.metadata.engine_version) # Server's engine version (e.g. "1.4.4")
print(trace.metadata.seed) # The deterministic seed used

Interrogate the aggregate state of the system across the entire run:

# The final phase at tick 1000 ("resonant", "volatile", or "stagnant")
final_conclusion = trace.final_phase()
mean_s = trace.mean_stability()
min_s = trace.min_stability()
print(f"The simulation dropped as low as {min_s} stability before recovering.")

If a model undergoes a catastrophic alignment failure, the Trace records exactly when the simulation crossed critical boundaries.

# Phase Transitions: Track when the system shifts between states
for pt in trace.phase_transitions():
    print(f"Shifted from {pt.from_phase} to {pt.to_phase} at tick {pt.tick}")
# Basin Losses: The ultimate failure state where Lambda collapses under Gamma
losses = trace.basin_losses()
if losses:
    print(f"Irreversible failure triggered at tick {losses[0].tick}")
# Ghost Branches: Timelines that were pruned/averted by the engine
averted = trace.ghost_branches()
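The transition records above can be understood as a simple scan over the per-tick phase sequence. A minimal, self-contained sketch of that detection logic, using the `PhaseTransition` fields shown above (`from_phase`, `to_phase`, `tick`); the `detect_transitions` helper is a hypothetical name, not part of the SDK:

```python
from dataclasses import dataclass

@dataclass
class PhaseTransition:
    from_phase: str
    to_phase: str
    tick: int

def detect_transitions(phases):
    """Record a transition at every tick where the phase differs from the previous tick."""
    transitions = []
    for tick in range(1, len(phases)):
        if phases[tick] != phases[tick - 1]:
            transitions.append(PhaseTransition(phases[tick - 1], phases[tick], tick))
    return transitions

phases = ["resonant"] * 3 + ["volatile"] * 2 + ["stagnant"]
for pt in detect_transitions(phases):
    print(f"Shifted from {pt.from_phase} to {pt.to_phase} at tick {pt.tick}")
```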

To analyze a specific model interacting within a multi-model environment, extract an AgentTrace:

# Assume Model_A and Model_B exist in the scenario
agents = trace.agent_ids()
model_b_trace = trace.agent_trace("Model_B")
print(model_b_trace.state_at(tick=400)) # Inspect exact thermodynamic load

If you installed the SDK with the plotting extra (pip install "kairos-sdk[plotting]"), the Trace object includes matplotlib helpers for instant visualization in Jupyter.

import matplotlib.pyplot as plt
# Plots the global stability S over time, automatically annotating Basin Losses
trace.plot_stability()
plt.show()
# Export a Pandas DataFrame for custom Seaborn plotting or CSV export
df = trace.to_dataframe()
df.to_csv("experiment_results.csv")

A common research pattern is running the same model with and without a specific intervention, then comparing the results side by side. This isolates the causal effect of a single variable.

from kairos import KairosClient
from kairos.domains.ai_safety import AISafetyScenario, AISafetyEventType
client = KairosClient()
# Baseline: model with no guardrails
baseline = (
    AISafetyScenario("Baseline (no guardrails)", seed=42)
    .add_model(name="model", capability_index=600, alignment_score=65, guardrail_coverage=30)
    .add_event(200, AISafetyEventType.CAPABILITY_JUMP, target="model", magnitude=0.4)
)
# Treatment: same model with an oversight board
treatment = (
    AISafetyScenario("With Oversight", seed=42)
    .add_model(name="model", capability_index=600, alignment_score=65, guardrail_coverage=30)
    .add_oversight_body(name="board", guardrail_strength=80, response_latency=15)
    .add_event(200, AISafetyEventType.CAPABILITY_JUMP, target="model", magnitude=0.4)
)
trace_base = client.run(baseline, ticks=500)
trace_treat = client.run(treatment, ticks=500)
print(f"{'Metric':<22} {'Baseline':>10} {'With Oversight':>16}")
print("-" * 50)
print(f"{'Final phase':<22} {trace_base.final_phase():>10} {trace_treat.final_phase():>16}")
print(f"{'Mean stability':<22} {trace_base.mean_stability():>10.4f} {trace_treat.mean_stability():>16.4f}")
print(f"{'Min stability':<22} {trace_base.min_stability():>10.4f} {trace_treat.min_stability():>16.4f}")
print(f"{'Basin losses':<22} {len(trace_base.basin_losses()):>10} {len(trace_treat.basin_losses()):>16}")
print(f"{'Phase transitions':<22} {len(trace_base.phase_transitions()):>10} {len(trace_treat.phase_transitions()):>16}")

By using the same seed and identical model parameters, the only difference between the two traces is the oversight body. Any divergence in outcomes is directly attributable to the governance intervention.
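The manual print statements above can be generalized to any number of traces. Below is a sketch of a reusable summary helper; it relies only on the Trace methods already shown (`final_phase`, `mean_stability`, `min_stability`, `basin_losses`, `phase_transitions`), but `summarize_traces` itself is a hypothetical name, not an SDK function:

```python
import pandas as pd

def summarize_traces(traces):
    """Build a comparison table from a {label: trace} mapping.

    Each value may be any object exposing final_phase(), mean_stability(),
    min_stability(), basin_losses(), and phase_transitions().
    """
    rows = {
        label: {
            "final_phase": t.final_phase(),
            "mean_stability": round(t.mean_stability(), 4),
            "min_stability": round(t.min_stability(), 4),
            "basin_losses": len(t.basin_losses()),
            "phase_transitions": len(t.phase_transitions()),
        }
        for label, t in traces.items()
    }
    # Columns are the trace labels, rows are the metrics
    return pd.DataFrame(rows)

# Usage with the traces from the example above:
# print(summarize_traces({"Baseline": trace_base, "With Oversight": trace_treat}))
```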

The to_dataframe() export gives you a tick-by-tick Pandas DataFrame with columns for stability, phase, Lambda, Gamma, and any per-agent metrics. This is the starting point for custom analysis beyond what the built-in methods provide.

import pandas as pd
df = trace.to_dataframe()
# Filter to only the volatile phase
volatile = df[df["phase"] == "volatile"]
print(f"Time spent in volatile phase: {len(volatile)} ticks")
print(f"Mean stability during volatile phase: {volatile['stability'].mean():.4f}")
# Compute a rolling average to smooth out noise
df["stability_rolling"] = df["stability"].rolling(window=20).mean()
# Find the tick with the steepest stability drop
df["stability_delta"] = df["stability"].diff()
worst_tick = df["stability_delta"].idxmin()
print(f"Steepest stability drop at tick {worst_tick}: "
      f"{df.loc[worst_tick, 'stability_delta']:.4f}")
# Group by phase and summarize
phase_summary = df.groupby("phase").agg(
    ticks=("stability", "count"),
    mean_stability=("stability", "mean"),
    min_stability=("stability", "min"),
).round(4)
print(phase_summary)
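The per-phase summary above counts total ticks in each phase, but not how those ticks were distributed. A common follow-up question is the longest contiguous stretch spent in a given phase. A pandas-only sketch, using a synthetic DataFrame that stands in for `trace.to_dataframe()` (only the documented `phase` column is assumed):

```python
import pandas as pd

# Synthetic stand-in for trace.to_dataframe()
df = pd.DataFrame({
    "phase": ["resonant"] * 5 + ["volatile"] * 3 + ["resonant"] * 2 + ["volatile"] * 6
})

# Label each contiguous run of identical phases, then measure run lengths
run_id = (df["phase"] != df["phase"].shift()).cumsum()
runs = df.groupby(run_id)["phase"].agg(phase="first", length="size")
longest_volatile = runs[runs["phase"] == "volatile"]["length"].max()
print(f"Longest contiguous volatile stretch: {longest_volatile} ticks")
```

The `shift`/`cumsum` idiom assigns a new run id whenever the phase changes, which makes run-length questions a plain `groupby`.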

For more advanced analysis patterns, including parameter sweeps and multi-scenario comparison DataFrames, see the Use Cases & Cookbook.