Shipping a winner
Once the stats say "Ship it" and you've made the business decision to move forward, you need to promote the winning variant's config to production without hardcoding it into your agent. This page walks through the complete promotion flow.
Step 1 — Complete the experiment
Before promoting a winner, mark the experiment as completed. This:
- Locks the experiment — no new sessions will be counted.
- Preserves the final statistical results for future reference.
- Signals to your team that a decision has been made.
# Mark the experiment completed via the Niitaka dashboard:
# Experiments → open experiment → "Complete experiment" button
#
# Or via the API:
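import requests
# experiment_id and NIITAKA_API_KEY must be defined before this call
response = requests.patch(
    f"https://api.niitaka.ai/experiments/{experiment_id}",
    json={"status": "completed"},
    headers={"X-API-Key": NIITAKA_API_KEY},
    timeout=10,
)
response.raise_for_status()  # fail loudly if the update was rejected
Step 2 — Promote via Runtime Config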
The cleanest way to ship a winner is through Runtime Config. Instead of hardcoding the model or prompt in your agent code, the SDK fetches the active config at session start. Updating the config in the dashboard takes effect on the next session — no deployment required.
Experiment completed (winner identified) → Update Runtime Config (dashboard or API) → Agent fetches new config (on next session start) → Production traffic updated (no deployment needed)
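Because the agent reads its settings at session start, it can help to overlay the fetched config on a set of local defaults so a missing key does not break a session. The sketch below is one way to do that, assuming the fetched config is a nested dict with `llm` and `guardrails` sections. `DEFAULT_CONFIG`, `merge_config`, and the default values are hypothetical and not part of the Niitaka SDK.

```python
from copy import deepcopy
from typing import Any, Dict

# Hypothetical local defaults; the values are placeholders, not recommendations.
DEFAULT_CONFIG: Dict[str, Any] = {
    "llm": {"model": "gpt-4o", "system_prompt": "You are a helpful assistant."},
    "guardrails": {"cost_limit_usd": 1.0},
}


def merge_config(defaults: Dict[str, Any], fetched: Dict[str, Any]) -> Dict[str, Any]:
    """Return defaults overlaid with fetched values, recursing into nested dicts."""
    merged = deepcopy(defaults)
    for key, value in fetched.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Example: the fetched config (assumed shape) overrides only the model.
fetched = {"llm": {"model": "gpt-4o-mini"}}
active = merge_config(DEFAULT_CONFIG, fetched)
print(active["llm"]["model"])
print(active["guardrails"]["cost_limit_usd"])
```

In an agent, `fetched` would be whatever `get_runtime_config()` returns; the defaults only fill in keys the promoted config leaves out.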
Before and after
Without Runtime Config, promoting a winner requires a code change and deployment:
# Before: agent picks model from its own code
with niitaka.start_session(goal="...", agent_id="report-summariser") as session:
    response = openai_client.chat.completions.create(
        model="gpt-4o",  # hardcoded
        messages=[...],
    )
With Runtime Config, you update the config in the dashboard and the change takes effect on the next session:
# After: agent reads promoted config automatically
with niitaka.start_session(goal="...", agent_id="report-summariser") as session:
    config = niitaka.get_runtime_config(agent_id="report-summariser")
    model = config["llm"]["model"]  # ← comes from winning variant
    system_prompt = config["llm"]["system_prompt"]
    cost_limit = config["guardrails"]["cost_limit_usd"]
    response = openai_client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "..."},
        ],
    )
How to update the Runtime Config
1. Open the winning variant. Go to Experiments → [your experiment], find the winning variant in the Variants panel, and click the expand arrow to see its config.
2. Copy the config values. Note the model, system prompt, and guardrail values you want to promote.
3. Update the agent's active config. Go to Agents → [your agent] and update the Runtime Config fields with the winning values, or use the PATCH /agents/{agent_id}/config API.
4. Verify. Run a test session and confirm get_runtime_config() returns the new values.
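The copy, update, and verify steps above can be sketched in code. This is a minimal illustration, assuming the variant's config uses the same nested shape as the Runtime Config (`llm.model`, `llm.system_prompt`, `guardrails.cost_limit_usd`). The helper names, the `temperature` field, and the sample values are hypothetical and not part of the Niitaka API.

```python
from typing import Any, Dict, List

# Dotted paths of the fields to promote; adjust to the fields you actually tested.
PROMOTED_FIELDS: List[str] = [
    "llm.model",
    "llm.system_prompt",
    "guardrails.cost_limit_usd",
]


def get_path(config: Dict[str, Any], path: str) -> Any:
    """Look up a dotted path such as 'llm.model'; raises KeyError if absent."""
    node: Any = config
    for part in path.split("."):
        node = node[part]
    return node


def build_patch(variant_config: Dict[str, Any], fields: List[str]) -> Dict[str, Any]:
    """Copy only the listed fields from the variant into a nested patch body."""
    patch: Dict[str, Any] = {}
    for path in fields:
        *parents, leaf = path.split(".")
        node = patch
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = get_path(variant_config, path)
    return patch


def find_mismatches(active: Dict[str, Any], expected: Dict[str, Any], prefix: str = "") -> List[str]:
    """List dotted paths where the active config differs from the expected patch."""
    mismatches: List[str] = []
    for key, want in expected.items():
        path = f"{prefix}{key}"
        have = active.get(key)
        if isinstance(want, dict):
            child = have if isinstance(have, dict) else {}
            mismatches += find_mismatches(child, want, path + ".")
        elif have != want:
            mismatches.append(path)
    return mismatches


# Hypothetical winning variant; 'temperature' is an extra field we choose not to promote.
winning_variant = {
    "llm": {"model": "gpt-4o-mini", "system_prompt": "Summarise the report.", "temperature": 0.2},
    "guardrails": {"cost_limit_usd": 0.5},
}
patch = build_patch(winning_variant, PROMOTED_FIELDS)

# Hypothetical active config that still carries the old model.
stale = {
    "llm": {"model": "gpt-4o", "system_prompt": "Summarise the report."},
    "guardrails": {"cost_limit_usd": 0.5},
}
print(find_mismatches(stale, patch))  # → ['llm.model']
```

Here `patch` is the kind of body you might send to PATCH /agents/{agent_id}/config, and `find_mismatches` can be run against the dict returned by `get_runtime_config()` in a test session; an empty list means the promoted values are live.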
Rollback
If the promoted config causes issues in production, rolling back is the same operation in reverse — update the Runtime Config to the previous values. Because the config is fetched at session start, the rollback takes effect on the next session with no deployment.
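One way to make the rollback mechanical is to keep a snapshot of the config from before the promotion and compute only the values that need to be reverted. The sketch below assumes you saved such a snapshot yourself (for example, the dict returned by `get_runtime_config()` before promoting) and that the config endpoint accepts a partial body. `rollback_patch` and the sample configs are hypothetical.

```python
from typing import Any, Dict


def rollback_patch(current: Dict[str, Any], previous: Dict[str, Any]) -> Dict[str, Any]:
    """Return the previous values for every key whose value differs in current."""
    patch: Dict[str, Any] = {}
    for key, old_value in previous.items():
        new_value = current.get(key)
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            nested = rollback_patch(new_value, old_value)
            if nested:  # skip sections that did not change
                patch[key] = nested
        elif new_value != old_value:
            patch[key] = old_value
    return patch


# Hypothetical snapshot taken before promotion, and the config after it.
previous = {
    "llm": {"model": "gpt-4o", "system_prompt": "Summarise the report."},
    "guardrails": {"cost_limit_usd": 0.5},
}
current = {
    "llm": {"model": "gpt-4o-mini", "system_prompt": "Summarise the report."},
    "guardrails": {"cost_limit_usd": 0.5},
}
print(rollback_patch(current, previous))  # → {'llm': {'model': 'gpt-4o'}}
```

Note that this sketch only restores changed values; keys that exist in the current config but not in the snapshot are left alone.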
# To roll back: update the runtime config in the dashboard,
# or pin to a previous AgentVersion:
response = requests.patch(
    f"https://api.niitaka.ai/agents/{agent_id}/config",
    json={"llm": {"model": "gpt-4o"}},  # revert to previous model
    headers={"Authorization": f"Bearer {JWT}"},
    timeout=10,
)
response.raise_for_status()  # fail loudly if the rollback was rejected
Archiving the experiment
After shipping, archive the experiment to keep the Experiments list tidy. Archived experiments retain their full results and can be reopened for reference at any time.
- Go to Experiments → open the completed experiment → Archive.
- Archived experiments are hidden from the default list but accessible via the Archived filter.
What to run next
Experiments are most valuable when they compound. After shipping a winner:
- Iterate on the winner — the promoted config becomes your new baseline. Run a follow-up experiment testing the next improvement.
- Verify in production — monitor the live goal-completion rate on the Sessions dashboard for a week after promoting to confirm the improvement holds outside experiment conditions.
- Share the results — the completed experiment's Stats tab is a self-contained record. Link to it in your team's post-mortem or decision log.
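For the production check above, a rough way to judge whether the goal-completion rate really moved is a two-proportion z-test on session counts from before and after the promotion. This is a generic statistical sketch, not the method the Stats tab uses, and the counts below are made up for illustration. Because before/after traffic is not randomised, treat the result as a sanity check rather than proof.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    success_a: int, total_a: int, success_b: int, total_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference rate_b - rate_a."""
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    # Pooled rate under the null hypothesis that both periods share one rate.
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 0.0, 1.0  # every session succeeded or failed in both periods
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: completed sessions / total sessions, week before vs week after.
z, p = two_proportion_z_test(412, 600, 455, 600)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A positive `z` with a small `p` suggests the improvement is holding in production; a `p` near 1 suggests the live rate is indistinguishable from the old baseline.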