fix(captain-memory): guard memory recall from blocking agent worker

A real-world test triggered a Sidekiq worker hang on conversation 67
after a message was routed through Daniela: two ResponseBuilderJobs
(msg 1318 and 1319) started, emitted typing_on, then never returned.
Sidekiq showed 2 of 12 workers stuck for 10+ minutes with no sign of
recovering.

Likely root cause: Agents::Runner evaluates the orchestrator
instructions lambda multiple times per turn, and our wrapped lambda
calls MemoryPromptInjector#append_memory_block each time. Inside,
RecallService calls the OpenAI embedding API (2s timeout) and pgvector.
Ruby's Timeout.timeout has documented holes around net/http syscalls:
if the embedding API stalls at the socket level, the worker can hang
forever even though the timeout "fired".
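
One way to narrow the socket-level gap described above is to set
timeouts on the HTTP client itself rather than rely only on
Timeout.timeout. The following is a minimal sketch, not the code in
this commit: build_embedding_http is a hypothetical helper, the URL in
the usage note is a placeholder, and it assumes the embedding call goes
through Net::HTTP, which this commit does not confirm.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not part of this commit): builds a Net::HTTP client
# whose connect, read, and write phases each carry their own timeout, so a
# stalled socket raises Net::OpenTimeout / Net::ReadTimeout /
# Net::WriteTimeout instead of depending on an outer Timeout.timeout block.
def build_embedding_http(url, timeout_seconds: 2)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port) # does not open a connection yet
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = timeout_seconds  # waiting for the connection to open
  http.read_timeout = timeout_seconds  # waiting for each read from the socket
  http.write_timeout = timeout_seconds # waiting for each write (Ruby 2.6+)
  http
end
```

Usage would look like
build_embedding_http("https://embeddings.example.test/v1/embeddings"),
followed by a normal request on the returned object. The three timeout
errors are StandardError subclasses, so a plain rescue catches them.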

Two fixes:

1. Per-message cache in the injector instance: the same
   message_text is embedded and queried once, not N times per turn.
   This sharply reduces network calls and DB queries during a single
   agent run. Every call after the first returns the cached block
   instantly.

2. Absolute rescue at the top level of append_memory_block:
   rescue StandardError => e; return base_prompt. Even if the whole
   memory pipeline raises, the base system prompt passes through and
   the agent keeps responding. Memory is NEVER allowed to block a
   response. That was already the design intent, but the lambda caller
   path didn't honor it rigorously enough.
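
The two fixes can be illustrated outside Rails with a small standalone
sketch. SketchInjector and its recall: callable are hypothetical
stand-ins for MemoryPromptInjector and RecallService, used only to show
the shape of the cache and the rescue, not the actual implementation.

```ruby
# Hypothetical stand-in for the injector. `recall` is any callable that
# takes the message text and returns a memory block string (or nil).
class SketchInjector
  def initialize(recall:)
    @recall = recall
    @cache = {}
  end

  # Returns base_prompt plus the memory block, or base_prompt alone when
  # there is no block or when anything in the recall path raises.
  def append_memory_block(base_prompt, message_text)
    block = memory_block_for(message_text)
    return base_prompt if block.nil? || block.empty?

    [base_prompt, block].join("\n\n")
  rescue StandardError => e
    warn("memory recall failed: #{e.class}: #{e.message}")
    base_prompt
  end

  private

  def memory_block_for(message_text)
    key = message_text.to_s
    # key? rather than ||= so that a nil or empty result is cached too.
    return @cache[key] if @cache.key?(key)

    @cache[key] = @recall.call(key)
  end
end
```

With a recall callable that counts its invocations, three calls to
append_memory_block for the same text invoke it once. Note that in this
sketch a raised error is not cached, so a failing recall is retried on
the next evaluation, and the rescue only covers errors that are
actually raised, not a call that never returns.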

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rodribm10 2026-04-19 09:06:35 -03:00
parent 6330bec857
commit bcf41ad15f


@@ -1,6 +1,7 @@
 class Captain::Assistant::MemoryPromptInjector
   def initialize(conversation:)
     @conversation = conversation
+    @memory_block_cache = {}
   end
   def recall_enabled?
@@ -14,24 +15,39 @@ class Captain::Assistant::MemoryPromptInjector
   # Wraps the given base system prompt with a <memoria_cliente> block
   # when recall is enabled and memories are found. Degrades gracefully:
   # returns the untouched base prompt on any failure or absent context.
+  # Caches the memory block per-message-text within the injector instance so
+  # Agents::Runner evaluating instructions multiple times per turn does not
+  # re-hit EmbeddingService or pgvector on every call.
   def append_memory_block(base_prompt, message_text)
     return base_prompt unless recall_enabled?
     return base_prompt if @conversation&.contact.blank?
-    memories = Captain::ContactMemories::RecallService.new(
-      contact: @conversation.contact,
-      query_text: message_text,
-      unit_id: resolve_unit_id
-    ).call
+    block = memory_block_for(message_text)
+    return base_prompt if block.blank?
-    memory_block = Captain::ContactMemories::PromptInjectionService.new(memories: memories).call
-    return base_prompt if memory_block.blank?
-    [base_prompt, memory_block].join("\n\n")
+    [base_prompt, block].join("\n\n")
+  rescue StandardError => e
+    # Absolute guard: memory recall NEVER blocks or breaks the agent response.
+    Rails.logger.error("[Captain V2] MemoryPromptInjector unexpected failure: #{e.class}: #{e.message}")
+    base_prompt
   end
   private
+  def memory_block_for(message_text)
+    key = message_text.to_s
+    return @memory_block_cache[key] if @memory_block_cache.key?(key)
+    memories = Captain::ContactMemories::RecallService.new(
+      contact: @conversation.contact,
+      query_text: key,
+      unit_id: resolve_unit_id
+    ).call
+    @memory_block_cache[key] =
+      Captain::ContactMemories::PromptInjectionService.new(memories: memories).call
+  end
   def resolve_unit_id
     return nil if @conversation.blank?