Trace Analysis & Evaluation
When client.run(...) completes, it returns a Trace object (or a LazyTrace if streamed to disk). This object is not just a JSON container; it is a full analytical query engine used to assess the outcome of an experiment.
The Trace Object
A trace holds the authoritative timeline of all ticks, events, and engine states that occurred during the run.
trace = client.run(scenario, ticks=1000)
print(trace.metadata.engine_version)  # Server's engine version (e.g. "1.4.4")
print(trace.metadata.seed)  # The deterministic seed used
Global Stability Metrics
Query the overall state of the system across the entire run:
# The final phase at tick 1000 ("resonant", "volatile", or "stagnant")
final_conclusion = trace.final_phase()
mean_s = trace.mean_stability()
min_s = trace.min_stability()
print(f"The simulation dropped as low as {min_s} stability before recovering.")
Forensic Event Analysis
If a model undergoes a catastrophic alignment failure, the trace records exactly when the simulation crossed each boundary.
# Phase Transitions: Track when the system shifts between states
for pt in trace.phase_transitions():
    print(f"Shifted from {pt.from_phase} to {pt.to_phase} at tick {pt.tick}")
# Basin Losses: The ultimate failure state where Lambda collapses under Gamma
losses = trace.basin_losses()
if losses:
    print(f"Irreversible failure triggered at tick {losses[0].tick}")
# Ghost Branches: Timelines that were pruned/averted by the engine
averted = trace.ghost_branches()
Agent-Specific Demultiplexing
To analyze a specific model interacting within a multi-model environment, extract an AgentTrace:
# Assume Model_A and Model_B exist in the scenario
agents = trace.agent_ids()
model_b_trace = trace.agent_trace("Model_B")
print(model_b_trace.state_at(tick=400))  # Inspect exact thermodynamic load
Plotting & Notebook Integration
If you installed the SDK with the plotting extra (pip install "kairos-sdk[plotting]"), the Trace object includes matplotlib helpers for instant visualization in Jupyter.
import matplotlib.pyplot as plt
# Plots the global stability S over time, automatically annotating Basin Losses
trace.plot_stability()
plt.show()
# Export a pandas DataFrame for custom Seaborn plotting or CSV export
df = trace.to_dataframe()
df.to_csv("experiment_results.csv")
Scenario Comparison Pattern
A common research pattern is running the same model with and without a specific intervention, then comparing the results side by side. This isolates the causal effect of a single variable.
from kairos import KairosClient
from kairos.domains.ai_safety import AISafetyScenario, AISafetyEventType
client = KairosClient()
# Baseline: the model with no oversight body (guardrail_coverage is the same in both scenarios)
baseline = (
    AISafetyScenario("Baseline (no guardrails)", seed=42)
    .add_model(name="model", capability_index=600, alignment_score=65, guardrail_coverage=30)
    .add_event(200, AISafetyEventType.CAPABILITY_JUMP, target="model", magnitude=0.4)
)
# Treatment: same model with an oversight board
treatment = (
    AISafetyScenario("With Oversight", seed=42)
    .add_model(name="model", capability_index=600, alignment_score=65, guardrail_coverage=30)
    .add_oversight_body(name="board", guardrail_strength=80, response_latency=15)
    .add_event(200, AISafetyEventType.CAPABILITY_JUMP, target="model", magnitude=0.4)
)
trace_base = client.run(baseline, ticks=500)
trace_treat = client.run(treatment, ticks=500)
print(f"{'Metric':<22} {'Baseline':>10} {'With Oversight':>16}")
print("-" * 50)
print(f"{'Final phase':<22} {trace_base.final_phase():>10} {trace_treat.final_phase():>16}")
print(f"{'Mean stability':<22} {trace_base.mean_stability():>10.4f} {trace_treat.mean_stability():>16.4f}")
print(f"{'Min stability':<22} {trace_base.min_stability():>10.4f} {trace_treat.min_stability():>16.4f}")
print(f"{'Basin losses':<22} {len(trace_base.basin_losses()):>10} {len(trace_treat.basin_losses()):>16}")
print(f"{'Phase transitions':<22} {len(trace_base.phase_transitions()):>10} {len(trace_treat.phase_transitions()):>16}")
Because both scenarios use the same seed and identical model parameters, the only difference between the two traces is the oversight body. Any divergence in outcomes is directly attributable to the governance intervention.
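To see where the two runs part ways, and not only how their summary metrics differ, the tick-by-tick exports can be aligned and differenced. The following is a minimal sketch and not part of the SDK: first_divergence_tick is a hypothetical helper, and it assumes each exported DataFrame has a tick column and a stability column (adjust the names to match your export). Small synthetic frames stand in for trace_base.to_dataframe() and trace_treat.to_dataframe().

```python
import pandas as pd


def first_divergence_tick(base_df, treat_df, tol=1e-9):
    """Return the first tick where the two stability series differ by more than tol.

    Hypothetical helper; assumes both frames have 'tick' and 'stability' columns.
    Returns None if the runs never diverge on the ticks they share.
    """
    # Align the two runs on tick so rows are compared like for like.
    merged = base_df.merge(treat_df, on="tick", suffixes=("_base", "_treat"))
    merged = merged.sort_values("tick")
    gap = (merged["stability_treat"] - merged["stability_base"]).abs()
    diverged = merged.loc[gap > tol, "tick"]
    return None if diverged.empty else int(diverged.iloc[0])


# Synthetic stand-ins for the two exported traces (illustrative values only)
base_df = pd.DataFrame({"tick": [0, 1, 2, 3, 4], "stability": [0.9, 0.9, 0.7, 0.5, 0.4]})
treat_df = pd.DataFrame({"tick": [0, 1, 2, 3, 4], "stability": [0.9, 0.9, 0.8, 0.75, 0.7]})

divergence = first_divergence_tick(base_df, treat_df)
print(f"Runs first diverge at tick {divergence}")
```

The tolerance guards against floating-point noise being reported as a divergence; whether identical seeds produce bit-identical stability values before the intervention takes effect is something to confirm against your own runs.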
Custom DataFrame Analysis
The to_dataframe() export gives you a tick-by-tick pandas DataFrame with columns for stability, phase, Lambda, Gamma, and any per-agent metrics. This is the starting point for custom analysis beyond what the built-in methods provide.
import pandas as pd
df = trace.to_dataframe()
# Filter to only the volatile phase
volatile = df[df["phase"] == "volatile"]
print(f"Time spent in volatile phase: {len(volatile)} ticks")
print(f"Mean stability during volatile phase: {volatile['stability'].mean():.4f}")
# Compute a rolling average to smooth out noise (the first 19 rows are NaN until the window fills)
df["stability_rolling"] = df["stability"].rolling(window=20).mean()
# Find the tick with the steepest stability drop
# (idxmin returns an index label, which is the tick only if the DataFrame is indexed by tick)
df["stability_delta"] = df["stability"].diff()
worst_tick = df["stability_delta"].idxmin()
print(f"Steepest stability drop at tick {worst_tick}: "
      f"{df.loc[worst_tick, 'stability_delta']:.4f}")
# Group by phase and summarize
phase_summary = df.groupby("phase").agg(
    ticks=("stability", "count"),
    mean_stability=("stability", "mean"),
    min_stability=("stability", "min"),
).round(4)
print(phase_summary)
For more advanced analysis patterns, including parameter sweeps and multi-scenario comparison DataFrames, see the Use Cases & Cookbook.
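The per-phase groupby summary counts total ticks in each phase but does not show how those ticks are distributed in time. A minimal sketch of collapsing a run into contiguous phase episodes follows. phase_episodes is a hypothetical helper, not an SDK method, and it assumes the DataFrame has tick and phase columns; a synthetic frame stands in for trace.to_dataframe().

```python
import pandas as pd


def phase_episodes(df):
    """Collapse a tick-by-tick frame into one row per contiguous phase episode.

    Hypothetical helper; assumes 'tick' and 'phase' columns.
    """
    df = df.sort_values("tick").reset_index(drop=True)
    # A new episode starts wherever the phase differs from the previous row;
    # the cumulative sum of those change flags gives each episode a unique id.
    episode_id = (df["phase"] != df["phase"].shift()).cumsum().rename("episode")
    return (
        df.groupby(episode_id)
        .agg(
            phase=("phase", "first"),
            start_tick=("tick", "min"),
            end_tick=("tick", "max"),
            ticks=("tick", "count"),
        )
        .reset_index(drop=True)
    )


# Synthetic stand-in for trace.to_dataframe() (illustrative values only)
df = pd.DataFrame({
    "tick": range(8),
    "phase": ["resonant", "resonant", "volatile", "volatile",
              "volatile", "resonant", "stagnant", "stagnant"],
})

episodes = phase_episodes(df)
print(episodes)
```

Each row of the result describes one uninterrupted stay in a phase, so a run that flickers between phases produces many short episodes while a single sustained excursion produces one long one, even when the per-phase tick totals are the same.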