fix(captain): cap max_turns at 15 + restore scenario->orchestrator handoff
Runaway incident: Daniela (reservation scenario) entered a tool-calling
loop, invoking faq_lookup with the same query dozens of times per second,
stuck at 'Performing' in Sidekiq for minutes with 1-of-12 busy.

Root cause was two interacting factors:

1. The previous commit removed scenario_agent.register_handoffs(assistant_agent)
   to prevent ping-pong. In practice, the scenario LLM uses
   handoff_to_orchestrator as a safety valve when it cannot advance.
   Without it, the LLM kept calling other available tools (faq_lookup)
   indefinitely.

2. max_turns was 100. A runaway loop could burn 100 LLM + tool cycles
   before Sidekiq's timeout fired, which meant real token spend in a
   single bad turn could blow a day's budget.

Both restored/fixed:

- max_turns: 100 -> 15. Plenty for normal flows; hard ceiling on any
  runaway. With the cap in place, the LLM simply runs out of turns and
  has to emit a final response instead of looping further.

- scenario -> orchestrator handoff: re-registered. Ping-pong risk is
  contained by max_turns AND by explicit prompt rules in the scenario
  instruction forbidding gratuitous handoffs (added to Daniela prompt in
  earlier commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent e5d186c689
commit 7bc5103541
@@ -47,7 +47,11 @@ class Captain::Assistant::AgentRunnerService
     runner = add_usage_metadata_callback(runner)
     runner = add_callbacks_to_runner(runner) if @callbacks.any?
     install_instrumentation(runner)
-    result = runner.run(message_to_process, context: context, max_turns: 100)
+    # max_turns is the hard safety cap: each "turn" = one LLM call + optional tool calls.
+    # 100 allowed runaway loops (LLM calling faq_lookup indefinitely when confused).
+    # 15 is plenty for normal flows (greeting -> handoff -> data collection -> tool calls -> response)
+    # while keeping a burn-budget ceiling per message.
+    result = runner.run(message_to_process, context: context, max_turns: 15)
 
     process_agent_result(result, original_query: message_to_process)
   rescue StandardError => e
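The effect of the cap in the hunk above can be illustrated with a minimal, self-contained sketch. `TurnCappedRunner`, `LOOPING_STEP`, and `NORMAL_STEP` are hypothetical names invented for this illustration; they are not the runner used by `AgentRunnerService`, and the real runner's behaviour when the cap is reached may differ.

```ruby
# Hypothetical stand-in for an agent runner (an assumption for illustration,
# not the real runner API). One "turn" = one call to the step callable.
class TurnCappedRunner
  Result = Struct.new(:output, :turns_used, :capped)

  def initialize(max_turns:)
    @max_turns = max_turns
  end

  # step takes the 1-based turn number and returns either
  # [:tool_call, payload] to keep going or [:final, text] to stop.
  def run(step)
    @max_turns.times do |index|
      kind, payload = step.call(index + 1)
      return Result.new(payload, index + 1, false) if kind == :final
    end
    # The cap was reached without a final answer: stop instead of looping on.
    Result.new(nil, @max_turns, true)
  end
end

# A confused model that repeats the same tool call forever.
LOOPING_STEP = ->(_turn) { [:tool_call, 'faq_lookup'] }

# A healthy flow that produces its final answer on the fourth turn.
NORMAL_STEP = ->(turn) { turn < 4 ? [:tool_call, 'faq_lookup'] : [:final, 'done'] }

runaway = TurnCappedRunner.new(max_turns: 15).run(LOOPING_STEP)
normal = TurnCappedRunner.new(max_turns: 15).run(NORMAL_STEP)
puts "runaway: capped=#{runaway.capped} turns=#{runaway.turns_used}"
puts "normal:  capped=#{normal.capped} turns=#{normal.turns_used}"
```

In this sketch the worst case costs exactly `max_turns` step calls, which is why lowering the cap from 100 to 15 directly lowers the per-message spend ceiling while leaving a four-turn flow untouched.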
@@ -373,14 +377,17 @@ class Captain::Assistant::AgentRunnerService
     assistant_agent = build_orchestrator_agent_with_memory
     scenario_agents = @assistant.scenarios.enabled.map(&:agent)
 
-    # Orchestrator can hand off INTO any scenario. Scenarios do NOT hand off
-    # back to the orchestrator — that creates a ping-pong where the scenario
-    # calls handoff_to_jasmine mid-flow, the orchestrator resumes the turn,
-    # and responses get duplicated or routed through the FAQ guardrail. When
-    # a customer changes topic mid-scenario, pick_starting_agent on the next
-    # turn already routes back to the orchestrator based on conversation
-    # state — no manual handoff needed from the scenario side.
+    # Bidirectional handoff: orchestrator -> scenarios AND scenarios -> orchestrator.
+    # Historical note: removing the back-edge looks attractive (prevents ping-pong)
+    # but in practice the scenario LLM uses handoff_to_orchestrator as a "fallback"
+    # when it gets confused. Without that fallback, the LLM keeps calling other
+    # available tools (faq_lookup, etc.) in a loop — observed real-world incident
+    # where Daniela called faq_lookup dozens of times in a runaway. Keep the edge.
+    # Ping-pong is instead contained by max_turns in generate_response AND by
+    # explicit prompt rules in the scenario instruction forbidding gratuitous
+    # handoffs.
     assistant_agent.register_handoffs(*scenario_agents) if scenario_agents.any?
+    scenario_agents.each { |scenario_agent| scenario_agent.register_handoffs(assistant_agent) }
 
     [assistant_agent] + scenario_agents
   end
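The bidirectional wiring restored in the hunk above can be sketched in isolation as follows. `SketchAgent` and `wire_handoffs` are hypothetical names made up for this illustration; only the `register_handoffs` call shape is taken from the diff, and the real agent class may store or validate handoffs differently.

```ruby
# Hypothetical minimal agent (an assumption for illustration): it only
# records which other agents it is allowed to hand off to.
class SketchAgent
  attr_reader :name, :handoffs

  def initialize(name)
    @name = name
    @handoffs = []
  end

  # Mirrors the call shape used in the diff; duplicates are ignored.
  def register_handoffs(*agents)
    @handoffs |= agents
    self
  end

  def can_hand_off_to?(agent)
    @handoffs.include?(agent)
  end
end

# Wires forward edges (orchestrator -> scenarios) and the back-edge
# (each scenario -> orchestrator), then returns every agent.
def wire_handoffs(orchestrator, scenarios)
  orchestrator.register_handoffs(*scenarios) if scenarios.any?
  scenarios.each { |scenario| scenario.register_handoffs(orchestrator) }
  [orchestrator] + scenarios
end

orchestrator = SketchAgent.new('orchestrator')
reservation = SketchAgent.new('reservation')
wire_handoffs(orchestrator, [reservation])
puts "forward edge: #{orchestrator.can_hand_off_to?(reservation)}"
puts "back edge:    #{reservation.can_hand_off_to?(orchestrator)}"
```

With the back-edge present, a stuck scenario has an exit other than repeating tool calls; the sketch does nothing to prevent ping-pong by itself, which the commit delegates to `max_turns` and the prompt rules.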