Policy Evaluation
Every event your agent logs is evaluated synchronously by the Niitaka policy engine before the response is returned to your code. This page explains exactly how that evaluation works — from event ingestion to the final decision and signal emission.
Evaluation flow
When an event arrives, the engine runs through a fixed sequence of stages. Each stage either produces a candidate action or passes control to the next.
- Event received: llm / tool / decision / error
- Cost limit check: compare total_cost vs thresholds
- Step limit check: compare step_count vs thresholds
- Retry evaluation: type=error + retries remaining?
- Fallback evaluation: type=error + retries exhausted?
- Decision + Signal: persist trace, emit signal
Evaluation stages
1. Cost limit check
The engine reads the session's total_cost (accumulated from all prior events) and compares it against all cost_limit policies for the agent, sorted by priority descending. The first matching policy becomes a candidate action.
- If action.type = "warn", emit a guardrail / cost_limit signal and continue.
- If action.type = "abort", emit the signal and halt the session.
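The cost check described above can be sketched as follows. This is a minimal illustration, not the engine's implementation: the `CostLimitPolicy` record and its field names are hypothetical, and it assumes a policy matches when total_cost is at or above its threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostLimitPolicy:
    # Hypothetical fields; the real policy schema may differ.
    threshold: float   # assumed: the policy matches when total_cost >= threshold
    priority: int      # higher values are evaluated first
    action_type: str   # "warn" or "abort"


def check_cost_limit(
    total_cost: float, policies: list[CostLimitPolicy]
) -> Optional[CostLimitPolicy]:
    """Return the first matching policy in descending priority order, or None."""
    for policy in sorted(policies, key=lambda p: p.priority, reverse=True):
        if total_cost >= policy.threshold:
            return policy
    return None


policies = [
    CostLimitPolicy(threshold=5.0, priority=10, action_type="warn"),
    CostLimitPolicy(threshold=10.0, priority=20, action_type="abort"),
]

candidate = check_cost_limit(7.5, policies)
print(candidate.action_type if candidate else "no match")  # prints "warn"
```

With these example values, a session at 7.5 only reaches the warn threshold, while a session at 12.0 would match the higher-priority abort policy first.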
2. Step limit check
This stage applies the same logic as the cost limit check to the session's step_count. The two checks run independently, so both can match the same event (e.g. a cost threshold is hit on the 50th step).
3. Retry evaluation
If the event has type = "error" and a retry policy exists for the agent, the engine checks whether retries remain for this session:
- If retries remain, schedule a retry with the configured back-off and emit control / retry.
- If max retries are exhausted, emit control / retry_exhausted and move to fallback evaluation.
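The branching above might look roughly like this. The `RetryPolicy` fields, the fixed back-off delay, and the `(signal, next_step, delay)` return shape are all assumptions made for illustration; only the signal names come from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryPolicy:
    # Hypothetical fields; names are illustrative only.
    max_retries: int
    backoff_seconds: float  # assumed fixed delay; the real back-off may differ


def evaluate_retry(
    event_type: str, retries_used: int, policy: Optional[RetryPolicy]
) -> tuple[Optional[str], Optional[str], Optional[float]]:
    """Return (signal, next_step, delay_seconds) for the retry stage."""
    if event_type != "error":
        # The retry stage only applies to error events.
        return (None, None, None)
    if policy is None:
        # No retry policy: go straight to fallback evaluation.
        return (None, "fallback", None)
    if retries_used < policy.max_retries:
        return ("control / retry", "retry", policy.backoff_seconds)
    return ("control / retry_exhausted", "fallback", None)


policy = RetryPolicy(max_retries=2, backoff_seconds=1.5)
print(evaluate_retry("error", 1, policy))  # ('control / retry', 'retry', 1.5)
print(evaluate_retry("error", 2, policy))  # ('control / retry_exhausted', 'fallback', None)
```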
4. Fallback evaluation
If retries are exhausted (or there is no retry policy) and a fallback policy exists, the engine swaps the active model to action.fallback_model for the remainder of the session and emits control / fallback.
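As a rough sketch of the model swap, assuming a hypothetical in-memory `Session` object and placeholder model names (neither is part of the documented API):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Session:
    # Hypothetical session state, for illustration only.
    active_model: str
    signals: list[str] = field(default_factory=list)


def evaluate_fallback(
    session: Session,
    event_type: str,
    retries_remaining: bool,
    fallback_model: Optional[str],
) -> bool:
    """Swap the session's model if the fallback stage applies; return True if swapped."""
    # retries_remaining is False both when retries are exhausted
    # and when no retry policy exists.
    if event_type != "error" or retries_remaining or fallback_model is None:
        return False
    session.active_model = fallback_model  # applies for the rest of the session
    session.signals.append("control / fallback")
    return True


session = Session(active_model="primary-model")
evaluate_fallback(session, "error", retries_remaining=False, fallback_model="backup-model")
print(session.active_model)  # prints "backup-model"
```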
5. General policy match
Any policy that matched in the above stages also triggers a policy / policy_triggered signal recording which rule fired and why. This is the audit record in the decision trace.
Decision trace
For every evaluated event the engine persists a DecisionTrace — a structured record of the evaluation result. You can view it in the session drawer under the Timeline tab.
How the winning action is chosen
When multiple policies match the same event, priority determines the winner. A higher numeric priority always wins. If two policies share the same priority, the one with the more severe action wins (abort > warn > retry > fallback).
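The selection rule can be expressed as a single comparison key: priority first, then severity as the tie-breaker. The `Match` record and policy names below are hypothetical; the severity order is the one stated above.

```python
from dataclasses import dataclass

# Severity order from the rule above: abort > warn > retry > fallback.
SEVERITY = {"abort": 3, "warn": 2, "retry": 1, "fallback": 0}


@dataclass(frozen=True)
class Match:
    # Hypothetical record of a policy that matched an event.
    policy_name: str
    priority: int
    action_type: str


def pick_winner(matches: list[Match]) -> Match:
    """Highest priority wins; equal priorities fall back to action severity."""
    return max(matches, key=lambda m: (m.priority, SEVERITY[m.action_type]))


matches = [
    Match("soft-budget", priority=20, action_type="warn"),
    Match("hard-budget", priority=10, action_type="abort"),
]
print(pick_winner(matches).policy_name)  # prints "soft-budget"
```

In this example the warn wins because it has the higher priority, even though abort is the more severe action.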
An abort action from a lower-priority policy can never override a higher-priority warn. Set your abort threshold at a strictly higher priority than the corresponding warn to ensure both fire correctly (warn first, abort at the harder limit).
Guardrail chaining order
The four policy types are evaluated in this fixed order regardless of priority values: cost limit, step limit, retry, fallback.
Cost and step limits run first so that hard budget constraints always take precedence. Retry runs before fallback because the system attempts recovery before switching providers.
When no policy matches
If the event does not trigger any policy, the engine returns immediately with no action. No signal is emitted and no DecisionTrace is persisted. This is the common case for most events in a healthy session.
Spike detection (background)
Separately from per-event evaluation, a background worker periodically scans session data for anomalies. It emits alert / cost_spike or alert / error_spike signals when aggregate rates deviate significantly from the agent's recent baseline. These do not block execution — they are informational warnings surfaced in the Signals feed.
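This page does not specify how "deviate significantly" is measured. One common approach, shown here purely as an assumption, flags a value that exceeds the baseline mean by more than k standard deviations. The function name, the default k = 3.0, and the sample values are all illustrative.

```python
from statistics import mean, pstdev


def is_spike(baseline: list[float], current: float, k: float = 3.0) -> bool:
    """Flag `current` if it exceeds the baseline mean by more than k std deviations."""
    if len(baseline) < 2:
        # Too little history to form a baseline; do not alert.
        return False
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # Flat baseline: treat any increase as a deviation (an arbitrary choice).
        return current > mu
    return current > mu + k * sigma


recent_costs = [1.0, 1.1, 0.9, 1.0]  # e.g. cost per interval, made-up values
print(is_spike(recent_costs, 2.0))   # prints "True"
print(is_spike(recent_costs, 1.1))   # prints "False"
```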
Policy Replay (Evaluate)
The Evaluate page lets you test a candidate policy list against your real historical decision traces — without touching live settings. It replays every stored DecisionTrace through the candidate policies and reports which decisions would have changed.
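Conceptually, a replay re-evaluates each stored trace and counts the decisions that differ. The sketch below assumes a hypothetical trace shape (a dict with `total_cost` and `decision` keys) and a made-up candidate rule; it is not the Evaluate page's actual implementation.

```python
from collections import Counter
from typing import Callable


def replay(traces: list[dict], evaluate: Callable[[dict], str]) -> Counter:
    """Count (old_decision, new_decision) transitions where the decision changed."""
    changes: Counter = Counter()
    for trace in traces:
        new_decision = evaluate(trace)
        if new_decision != trace["decision"]:
            changes[(trace["decision"], new_decision)] += 1
    return changes


def candidate(trace: dict) -> str:
    # Hypothetical candidate policy: abort only at a cost of 10.0 or more.
    return "abort" if trace["total_cost"] >= 10.0 else "none"


traces = [
    {"total_cost": 12.0, "decision": "abort"},
    {"total_cost": 8.0, "decision": "abort"},
    {"total_cost": 9.0, "decision": "abort"},
    {"total_cost": 2.0, "decision": "none"},
]

for (old, new), count in replay(traces, candidate).items():
    print(f"{old} -> {new} x{count}")  # prints "abort -> none x2"
```

Nothing in the sketch writes to live settings: the stored traces are read, never modified, and the candidate is only a function passed in by the caller.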
stop → ignore ×3) so you can see at a glance which decisions would flip and in which direction. Once you're happy with the results, you can promote the candidate directly to live policies; the change is recorded in the policy audit log with source=policy_replay.
Next steps
- Policies — configure the rules that drive evaluation.
- Signals & Alerts — view and forward the signals the engine emits.
- Evaluate — test candidate policies against historical traces before going live (available in the sidebar under Control).