Shipping a winner

Once the stats say "Ship it" and you have made the business decision to move forward, you need to promote the winning variant's config to production without hardcoding it into your agent. This page walks through the complete promotion flow.

Step 1 — Complete the experiment

Before promoting a winner, mark the experiment as completed. This:

Locks the experiment — no new sessions will be counted.
Preserves the final statistical results for future reference.
Signals to your team that a decision has been made.

python

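# Mark the experiment completed via the Niitaka dashboard:
# Experiments → open experiment → "Complete experiment" button
#
# Or via the API:
import os

import requests

NIITAKA_API_KEY = os.environ["NIITAKA_API_KEY"]  # assumed to be set in your environment
experiment_id = "..."  # ID of the experiment you are completing

response = requests.patch(
    f"https://api.niitaka.ai/experiments/{experiment_id}",
    json={"status": "completed"},
    headers={"X-API-Key": NIITAKA_API_KEY},
    timeout=10,
)
response.raise_for_status()  # fail loudly if the update was rejected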

Note: Completing an experiment does not automatically change anything in production. Your agent continues to run with whatever config it had before. The next steps make the winning config live.

Step 2 — Promote via Runtime Config

The cleanest way to ship a winner is through Runtime Config. Instead of hardcoding the model or prompt in your agent code, the SDK fetches the active config at session start. Updating the config in the dashboard takes effect on the next session — no deployment required.

Experiment completed (winner identified)
↓
Update Runtime Config (dashboard or API)
↓
Agent fetches new config (on next session start)
↓
Production traffic updated (no deployment needed)

Before and after

Without Runtime Config, promoting a winner requires a code change and deployment:

python

# Before: agent picks model from its own code
with niitaka.start_session(goal="...", agent_id="report-summariser") as session:
    response = openai_client.chat.completions.create(
        model="gpt-4o",          # hardcoded
        messages=[...],
    )

With Runtime Config, you update the config in the dashboard and the change takes effect on the next session:

python

# After: agent reads promoted config automatically
with niitaka.start_session(goal="...", agent_id="report-summariser") as session:
    config = niitaka.get_runtime_config(agent_id="report-summariser")

    model         = config["llm"]["model"]           # ← comes from winning variant
    system_prompt = config["llm"]["system_prompt"]
    cost_limit    = config["guardrails"]["cost_limit_usd"]

    response = openai_client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user",   "content": "..."},
        ],
    )
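
Because the agent now depends on values fetched at session start, it can help to read them defensively. The following is a minimal sketch, assuming the config behaves like a nested dictionary with the "llm" and "guardrails" keys shown above. The helper name resolve_llm_settings and the DEFAULTS values are hypothetical and not part of the Niitaka SDK.

```python
from typing import Any, Mapping

# Hypothetical fallback values; choose ones that suit your agent.
DEFAULTS = {
    "model": "gpt-4o",
    "system_prompt": "You are a helpful assistant.",
    "cost_limit_usd": 1.0,
}


def resolve_llm_settings(config: Mapping[str, Any]) -> dict:
    """Read the promoted values, falling back to DEFAULTS for missing keys."""
    llm = config.get("llm") or {}
    guardrails = config.get("guardrails") or {}
    return {
        "model": llm.get("model", DEFAULTS["model"]),
        "system_prompt": llm.get("system_prompt", DEFAULTS["system_prompt"]),
        "cost_limit_usd": guardrails.get("cost_limit_usd", DEFAULTS["cost_limit_usd"]),
    }


# Illustrative input: only the model has been promoted so far.
settings = resolve_llm_settings({"llm": {"model": "gpt-4o-mini"}})
print(settings["model"])
```

With this shape, a partially filled config still yields a usable set of values, and the fallbacks live in one place rather than being scattered through the agent code.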

How to update the Runtime Config

1. Open the winning variant. Go to Experiments → [your experiment], find the winning variant in the Variants panel, and click the expand arrow to see its config.
2. Copy the config values. Note the model, system prompt, and guardrail values you want to promote.
3. Update the agent's active config. Go to Agents → [your agent] and update the Runtime Config fields with the winning values. Or use the PATCH /agents/{agent_id}/config API.
4. Verify. Run a test session and confirm get_runtime_config() returns the new values.
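
The copy and verify steps above can be sketched in code. This is an illustration only: it assumes the winning variant's config uses the same "llm" and "guardrails" layout as the Runtime Config, and the helper names build_config_patch and find_mismatches are hypothetical. The values are made up for the example.

```python
from typing import Any, Mapping


def build_config_patch(variant_config: Mapping[str, Any]) -> dict:
    """Keep only the sections this page promotes: "llm" and "guardrails"."""
    return {
        section: dict(variant_config[section])
        for section in ("llm", "guardrails")
        if section in variant_config
    }


def find_mismatches(expected: Mapping[str, Any], actual: Mapping[str, Any]) -> list:
    """Return "section.key" names whose live value differs from the promoted one."""
    mismatches = []
    for section, values in expected.items():
        live = actual.get(section, {})
        for key, value in values.items():
            if live.get(key) != value:
                mismatches.append(f"{section}.{key}")
    return mismatches


# Illustrative values only, not taken from a real experiment.
winner = {
    "name": "variant-b",
    "llm": {"model": "gpt-4o-mini", "system_prompt": "Summarise in three bullets."},
    "guardrails": {"cost_limit_usd": 0.5},
}
patch = build_config_patch(winner)

# Pretend the live config still carries the old model.
stale_live_config = {
    "llm": {"model": "gpt-4o", "system_prompt": "Summarise in three bullets."},
    "guardrails": {"cost_limit_usd": 0.5},
}
print(find_mismatches(patch, stale_live_config))  # ['llm.model']
```

In practice, patch would be the JSON body sent to PATCH /agents/{agent_id}/config, and the second argument to find_mismatches would be whatever get_runtime_config() returns in your test session. An empty list means the promotion is live.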

Rollback

If the promoted config causes issues in production, rolling back is the same operation in reverse — update the Runtime Config to the previous values. Because the config is fetched at session start, the rollback takes effect on the next session with no deployment.

python

# To roll back: restore the previous values in the dashboard,
# or send them through the config API:
import requests

requests.patch(
    f"https://api.niitaka.ai/agents/{agent_id}/config",
    json={"llm": {"model": "gpt-4o"}},    # revert to previous model
    headers={"Authorization": f"Bearer {JWT}"},
)

Tip: Before promoting, note the current config values so you can revert quickly if needed. You can also pin a specific AgentVersion and roll back to it in one step from the Agents dashboard.
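
One way to act on this tip is to snapshot the config before promoting and derive the rollback payload from it. The sketch below assumes the config is a nested dictionary with the layout used on this page. The function name rollback_patch is hypothetical, and the values are illustrative.

```python
import copy
from typing import Any, Mapping


def rollback_patch(before: Mapping[str, Any], promoted: Mapping[str, Any]) -> dict:
    """Build a patch that restores every value the promotion changed.

    Keys that did not exist before the promotion are skipped, because
    there is no earlier value to restore for them.
    """
    patch: dict = {}
    for section, values in promoted.items():
        old_section = before.get(section, {})
        for key, new_value in values.items():
            if key in old_section and old_section[key] != new_value:
                patch.setdefault(section, {})[key] = copy.deepcopy(old_section[key])
    return patch


# Snapshot taken before promoting (illustrative values).
before = {
    "llm": {"model": "gpt-4o", "system_prompt": "Summarise the report."},
    "guardrails": {"cost_limit_usd": 1.0},
}
# Values written during promotion: only the model changed.
promoted = {"llm": {"model": "gpt-4o-mini", "system_prompt": "Summarise the report."}}

print(rollback_patch(before, promoted))  # {'llm': {'model': 'gpt-4o'}}
```

The result contains only the changed keys, so it can be sent as the body of the rollback request without touching settings the promotion left alone.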

Archiving the experiment

After shipping, archive the experiment to keep the Experiments list tidy. Archived experiments retain their full results and can be reopened for reference at any time.

Go to Experiments → open the completed experiment → Archive.
Archived experiments are hidden from the default list but accessible via the Archived filter.

What to run next

Experiments are most valuable when they compound. After shipping a winner:

Iterate on the winner — the promoted config becomes your new baseline. Run a follow-up experiment testing the next improvement.
Verify in production — monitor the live goal-completion rate on the Sessions dashboard for a week after promoting to confirm the improvement holds outside experiment conditions.
Share the results — the completed experiment's Stats tab is a self-contained record. Link to it in your team's post-mortem or decision log.
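
For the production check, a rough comparison can be scripted if you export session data. This sketch assumes each exported session is a dictionary with a boolean "goal_completed" field, which is a hypothetical shape and not a documented export format. The 0.05 tolerance and the rates are likewise illustrative.

```python
from typing import Iterable, Mapping


def goal_completion_rate(sessions: Iterable[Mapping]) -> float:
    """Share of sessions whose goal was completed."""
    sessions = list(sessions)
    if not sessions:
        raise ValueError("no sessions to evaluate")
    completed = sum(1 for s in sessions if s["goal_completed"])
    return completed / len(sessions)


def holds_up(live_rate: float, experiment_rate: float, tolerance: float = 0.05) -> bool:
    """True if the live rate is within `tolerance` of the experiment's rate, or better."""
    return live_rate >= experiment_rate - tolerance


# Illustrative data: three of four sessions completed their goal.
week_of_sessions = [
    {"goal_completed": True},
    {"goal_completed": True},
    {"goal_completed": False},
    {"goal_completed": True},
]
live_rate = goal_completion_rate(week_of_sessions)
print(live_rate)  # 0.75
print(holds_up(live_rate, experiment_rate=0.78))
```

A fixed tolerance is a blunt check. With small session counts the live rate will fluctuate, so treat a failed check as a prompt to look closer rather than as proof of a regression.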
