Commit Graph

372 Commits

Author SHA1 Message Date
Rodribm10
e5d186c689 fix(captain): stop scenario->orchestrator handoff + narrow FAQ guardrail
Two behavioural regressions caught in live testing with a real customer
conversation:

1. Ping-pong scenario -> orchestrator -> scenario

   build_and_wire_agents was calling scenario_agents.register_handoffs(
   assistant_agent), which exposed handoff_to_jasmine as a tool INSIDE
   every scenario. Daniela (reservation scenario) kept calling it
   mid-flow, the orchestrator resumed the turn, and customers got messages
   like "Vou te encaminhar para a Daniela..." after ALREADY being with
   Daniela. The back-edge is removed. When a customer legitimately
   changes topic mid-scenario, pick_starting_agent on the next turn
   already routes back to the orchestrator based on conversation state,
   so no manual handoff from the scenario side is needed.

2. FAQ_PRICE_PATTERNS was hijacking legitimate routing responses

   The previous regex matched the bare words "pernoite", "sinal",
   "diaria" WITHOUT requiring a numeric price nearby. A legitimate
   handoff response like "Vou transferir para a Daniela para confirmar
   a Stilo para pernoite" tripped the guardrail, which then substituted
   the response with raw FAQ content about rates. Narrowed to: R$
   values, numbers followed by "reais", and the explicit price-noun
   variants (preco/preço/valor/preços/valores/custo/custa). Incidental
   mentions of stay types no longer trigger.
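
A minimal sketch of the narrowed matching described above. The real FAQ_PRICE_PATTERNS is not shown in this commit, so the constant name, the exact regexes, and the helper `mentions_price?` are illustrative assumptions:

```ruby
# Hypothetical reconstruction of the narrowed guardrail patterns.
# Only explicit price signals match; bare stay types such as
# "pernoite", "sinal" or "diaria" no longer do.
PRICE_PATTERNS_SKETCH = [
  /R\$\s*\d/i,                                    # "R$ 120", "R$120,00"
  /\d+(?:[.,]\d+)?\s*reais\b/i,                   # "150 reais"
  /\b(?:pre[cç]os?|valor(?:es)?|custo|custa)\b/i  # explicit price nouns
].freeze

# Returns true when the text carries an explicit price signal.
def mentions_price?(text)
  PRICE_PATTERNS_SKETCH.any? { |pattern| pattern.match?(text) }
end
```

With these patterns, the handoff sentence quoted above ("... confirmar a Stilo para pernoite") would pass through untouched, while a reply containing "R$ 180" would still reach the guardrail.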

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 10:51:45 -03:00
Rodribm10
fa758e4848 feat(captain): hierarchical model routing + conversation-level memory cache
Two orthogonal cost optimizations to the Captain agent pipeline:

1. Hierarchical model routing (optimization A)

Captain::Scenario now overrides agent_model to read a dedicated
InstallationConfig CAPTAIN_OPEN_AI_MODEL_SCENARIO, falling back to the
global CAPTAIN_OPEN_AI_MODEL used by the orchestrator (Assistant).

Rationale: the orchestrator (Jasmine) does cheap triage (is this a
reservation intent? a greeting? escalate to human?), which a smaller
model handles well. Scenarios (Daniela, reservations) run complex flows
with tool calling, strict taxonomies, and JSON schema output, so they
benefit from a stronger model.

Config in this install: CAPTAIN_OPEN_AI_MODEL=gpt-4o-mini (orchestrator)
and CAPTAIN_OPEN_AI_MODEL_SCENARIO=gpt-4o (scenarios). Estimated ~60%
cost reduction vs everything on gpt-4o, preserving quality where it
matters for the business flow.
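
The fallback can be pictured roughly as follows. This is a sketch only: a plain Hash stands in for InstallationConfig, and `scenario_model` is a hypothetical helper, not the actual agent_model override:

```ruby
# Hypothetical stand-in for the InstallationConfig lookup; `config` is a Hash.
# Returns the scenario-specific model when set, else the global model.
def scenario_model(config)
  value = config['CAPTAIN_OPEN_AI_MODEL_SCENARIO']
  return config['CAPTAIN_OPEN_AI_MODEL'] if value.nil? || value.strip.empty?

  value
end
```

Treating a blank value like a missing one is an assumption made here so that an empty setting does not select an empty model name.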

2. Conversation-level memory cache (optimization B)

MemoryPromptInjector now persists the computed memory block on
conversation.custom_attributes[captain_cached_memory_block]. First turn
computes once (embedding + pgvector query + XML formatting); subsequent
turns reuse. The customer's profile does not change during an open
conversation, so re-running the pipeline on every turn was pure waste.

Graceful fallbacks:
- Cache write failure → per-service-instance in-memory fallback still
  applies.
- Cache read failure → fresh recall runs (no regression).
- Contact mismatch → invalidates cache, fresh recall runs.

When a new conversation starts, custom_attributes is empty → fresh
recall populates the cache for that conversation's lifetime.

Estimated ~80% reduction in embedding + pgvector calls during
multi-turn conversations.
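
A rough sketch of the conversation-level cache under stated assumptions: a Hash stands in for conversation.custom_attributes, the cached entry is assumed to store the contact id next to the block (the commit only says a contact mismatch invalidates the cache), and `memory_block_for` is a hypothetical helper:

```ruby
MEMORY_CACHE_KEY = 'captain_cached_memory_block'

# conversation_attrs: Hash standing in for conversation.custom_attributes.
# The block passed in represents the expensive recall
# (embedding + pgvector query + XML formatting).
def memory_block_for(conversation_attrs, contact_id)
  cached = conversation_attrs[MEMORY_CACHE_KEY]
  # Reuse only when the cached entry belongs to the same contact.
  return cached['block'] if cached.is_a?(Hash) && cached['contact_id'] == contact_id

  block = yield
  conversation_attrs[MEMORY_CACHE_KEY] = { 'contact_id' => contact_id, 'block' => block }
  block
end
```

The first turn runs the recall and stores the result; later turns for the same contact return the stored block without running the recall again.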

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 09:47:15 -03:00
Rodribm10
bcf41ad15f fix(captain-memory): guard memory recall from blocking agent worker
Real-world test triggered a Sidekiq worker hang on conv 67 after a
message was routed through Daniela: two ResponseBuilderJobs (msg 1318
and 1319) started, emitted typing_on, then never returned. Sidekiq
showed 2/12 workers stuck for 10+ minutes — indefinite.

Root cause likely: Agents::Runner evaluates the orchestrator
instructions lambda multiple times per turn, and our wrapped lambda
calls MemoryPromptInjector#append_memory_block each time. Inside,
RecallService invokes OpenAI embedding API (2s timeout) and pgvector.
Ruby's Timeout.timeout has documented holes on net/http syscalls — if
the embedding API stalls at the socket level, the worker hangs forever
even though the timeout "fired".

Two fixes:

1. Per-message cache in the injector instance: the same
   message_text is embedded + queried once, not N times per turn.
   Dramatic reduction in network calls + DB queries during a single
   agent run. Every call after the first returns the cached block
   instantly.

2. Absolute rescue at append_memory_block top level:
   rescue StandardError => e; return base_prompt. Even if the whole
   memory pipeline throws, the base system prompt passes through and
   the agent keeps responding. Memory is NEVER allowed to block a
   response — that was already the design intent but the lambda caller
   path didn't honor it rigorously enough.
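
The two fixes combine roughly as in the sketch below. The class name, the constructor argument, and the way the block is appended are assumptions for illustration; only the per-message cache and the top-level rescue come from the commit text:

```ruby
# Hypothetical injector: `recall` is any callable that maps a message text
# to a memory block string (standing in for RecallService).
class MemoryInjectorSketch
  def initialize(recall)
    @recall = recall
    @cache = {}
  end

  # Returns base_prompt plus the memory block, or base_prompt alone
  # if anything in the memory pipeline raises.
  def append_memory_block(base_prompt, message_text)
    # Fix 1: the same message_text is recalled once per injector instance.
    block = @cache.fetch(message_text) do
      @cache[message_text] = @recall.call(message_text)
    end
    block.to_s.empty? ? base_prompt : "#{base_prompt}\n\n#{block}"
  rescue StandardError
    # Fix 2: memory must never block a response.
    base_prompt
  end
end
```

Note that a rescue of StandardError only covers exceptions that are actually raised; it does not by itself unblock a call that is stuck at the socket level.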

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 09:06:35 -03:00
Rodribm10
6330bec857 fix(captain-memory): temporal memory model + aggressive dedup
User feedback revealed a fundamental design issue: the memory model was
accumulating contradictory "Prefere X" facts because a single choice was
being treated as a permanent preference. Result: 3 different
"Prefere suite X" entries coexisting, all at 90% confidence, with
reservation patterns over time (2hrs, 4hrs, pernoite) all claiming to be
the customer's "preferred" duration.

Corrections:

1. ExtractionService prompt — preferencia now requires EXPLICIT
  declaration words ("prefiro", "gosto mais de", "sempre escolho",
  "adoro", "favorita"). A mere choice in one conversation is NO LONGER
  extracted as preferencia — instead it goes to padrao_comportamental
  WITH THE DATE in the content (e.g. "Reservou Alexa para pernoite em
  23/05/2026"). This makes memory temporal and auditable instead of
  imposing fake consistency.

2. Reference date is passed to the LLM prompt via the latest message
  timestamp, used as the anchor date the LLM must embed in every
  padrao_comportamental content.

3. ContradictionCheckerService — dual threshold:
  - cosine < 0.15 → auto-supersede without LLM (pure duplicate)
  - 0.15 to 0.6 → ask LLM if contradicts, supersede if yes
  - > 0.6 → ignore, unrelated facts
  Previously only the middle band existed, so near-duplicate facts like
  two "aniversário 23/05" entries or three "prefere suite X" entries
  were never cleaned up.
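
The three bands can be sketched as a small classifier. This assumes the "cosine" value is a cosine distance (smaller means more similar) and that the 0.6 boundary belongs to the middle band; the method name and return symbols are hypothetical:

```ruby
# Maps a cosine distance between two facts to the action described above.
def contradiction_action(cosine_distance)
  if cosine_distance < 0.15
    :auto_supersede   # near-duplicate: supersede without calling the LLM
  elsif cosine_distance <= 0.6
    :ask_llm          # related: ask the LLM whether the facts contradict
  else
    :ignore           # unrelated facts
  end
end
```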

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 08:30:42 -03:00
Rodribm10
b742d774c8 fix(captain-memory): block suite hallucinations + hardcode cadastral data exclusion
Real test revealed gpt-4o-mini was still:
- Hallucinating suite names ("Aluba" doesn't exist — we only have
  Alexa, Stilo, Hidromassagem)
- Extracting cadastral data as memory ("Rodrigo has a CPF", "Name is X")
  despite the per-type NÃO examples

Added two sections at the top of the prompt:
1. Business canonical data — explicit whitelist of suite names (Alexa,
  Stilo, Hidromassagem) and stay types. Anything else = discard, NO auto-
  normalization. LLM must not guess.
2. Cadastral data absolute rule — explicit list of fields that are
  profile data, not memory: name, CPF/RG/passport, email/phone/address,
  birth date. Plus 5 concrete examples of what was being wrongly
  extracted in the wild.

Existing 9 specs still pass (stub at call_llm; prompt change is
semantic, not structural).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 08:06:31 -03:00
Rodribm10
4becfd0a57 fix(captain-memory): strict taxonomy definitions in ExtractionService prompt
Real-world test revealed the LLM extractor (gpt-4o-mini) was using type
labels too loosely: a customer's QUESTION about parking ("tem
estacionamento?") was classified as 'reclamacao'. Similarly, generic
courtesy ("obrigado") was becoming 'feedback_positivo', and transactional
events (CPF informed, reservation made) were becoming memories when they
should be ignored.

Rewrote build_prompt with:
- Per-type strict definition (what it IS)
- YES/NO examples for each of the 9 types, with the most common pitfalls
  explicitly shown as NO
- 7 absolute rules, including: questions are never complaints, generic
  courtesy is never feedback, agent actions are never customer memory,
  transactional events are not long-term facts
- Confidence threshold guidance (>=0.9 only if totally explicit, 0.7-0.89
  for strong inference, <0.7 discard)
- "If in doubt, discard — quality > quantity. Most transactional
  conversations should return empty facts list"

Existing 9 specs still pass (stub call_llm, so prompt changes don't
affect unit test assertions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 07:44:26 -03:00
Rodribm10
5874029a03 fix(captain-memory): raise RecallService timeout 0.5s -> 2.0s
Real-world observation: the OpenAI embedding API typically takes
200-400ms; with pgvector query overhead on top, the 500ms budget was
frequently exceeded, silently dropping memory recall. Agent typing delay is
already 2-15s humanized, so a 2s recall budget is well within UX
tolerance and gives ~4-5x margin over typical embedding latency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 07:25:19 -03:00
Rodribm10
8444209952 fix(captain-memory): always authorize index even when list is empty 2026-04-19 01:43:57 -03:00
Rodribm10
f7d4c41d07 feat(captain-memory): add MemoriesController with index/update/destroy/bulk_destroy 2026-04-19 01:41:09 -03:00
Rodribm10
638e84752d feat(captain-memory): add ContactMemoryPolicy (Pundit) 2026-04-19 01:37:13 -03:00
Rodribm10
1cf9531741 fix(captain-memory): use Agent#clone instead of ivar mutation + unify test path with runtime 2026-04-19 01:32:56 -03:00
Rodribm10
85324f594d feat(captain-memory): inject semantic memory into AgentRunnerService system prompt 2026-04-19 01:23:03 -03:00
Rodribm10
e89b96d09b feat(captain-memory): enqueue extraction on conversation.resolved 2026-04-19 01:13:26 -03:00
Rodribm10
2261b09b25 feat(captain-memory): add HardDeleteExpiredJob with daily cron (LGPD) 2026-04-19 01:09:28 -03:00
Rodribm10
b3077b2b26 feat(captain-memory): add AgingJob with TTL + LRU cap, weekly cron 2026-04-19 01:05:02 -03:00
Rodribm10
fb6673664a fix(captain-memory): isolate per-account failures in SilenceDetectorJob + fix typo 2026-04-19 01:01:28 -03:00
Rodribm10
833e76856e feat(captain-memory): add SilenceDetectorJob with 10min cron 2026-04-19 00:55:15 -03:00
Rodribm10
1646f66a97 fix(captain-memory): wrap ExtractFromConversationJob persistence in transaction + hoist unit lookup 2026-04-19 00:50:08 -03:00
Rodribm10
9d5e4c959f feat(captain-memory): add ExtractFromConversationJob with TTL + idempotency 2026-04-19 00:45:14 -03:00
Rodribm10
350a420ee0 feat(captain-memory): add ContradictionCheckerJob 2026-04-19 00:39:52 -03:00
Rodribm10
dc366433bb feat(captain-memory): add UpdateEmbeddingJob 2026-04-19 00:35:06 -03:00
Rodribm10
6723473fdc fix(captain-memory): ContradictionChecker exact-match parsing + rescue wrap + LLM failure test 2026-04-19 00:31:54 -03:00
Rodribm10
9bc6429b91 feat(captain-memory): add ContradictionCheckerService with LLM verification 2026-04-19 00:26:58 -03:00
Rodribm10
aec796ebfd fix(captain-memory): cap ExtractionService input, validate scope, filter failed msgs 2026-04-19 00:24:09 -03:00
Rodribm10
9d593757df feat(captain-memory): add ExtractionService with evidence+confidence guardrails 2026-04-19 00:18:32 -03:00
Rodribm10
0fee1b3c2f fix(captain-memory): strengthen RecallService logging context and document timeout tradeoff 2026-04-19 00:14:06 -03:00
Rodribm10
502c3d1698 feat(captain-memory): add RecallService with timeout and graceful degradation 2026-04-19 00:09:31 -03:00
Rodribm10
5d15f55a29 feat(captain-memory): add PromptInjectionService formatting memories as XML 2026-04-19 00:05:11 -03:00
Rodribm10
e1273f142b feat(captain-memory): add Captain::ContactMemory model with scopes and lifecycle methods 2026-04-18 23:53:33 -03:00
Rodribm10
6a5ba17bfc fix(captain): accept DD/MM without year and broaden requires_input handling in generate_pix
Observed problem: Daniela called generate_pix with empty arguments after
the customer sent "27/4". The tool returned missing_fields=[check_in, amount]
and the LLM silently fell back.

Fixes:
- DDMMYYYY_REGEX now accepts "DD/MM" without a year (assumes the current
  year, rolling over to the next year if the date has already passed)
- parse_date_without_year with an explicit fallback
- Instruction for the Daniela_Reservas scenario (DB, scenario_id=2) updated
  to list all 4 required generate_pix parameters and to distinguish
  requires_input (LLM error) from success=false (technical error)

Backup of the previous instruction: /tmp/daniela_instruction_backup_20260418.txt
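
The year-less date rule in this commit can be sketched as below. The regex and the helper `parse_br_date` are illustrative assumptions, not the actual DDMMYYYY_REGEX or parse_date_without_year; treating a date equal to today as not yet passed is also an assumption:

```ruby
require 'date'

# Hypothetical pattern: DD/MM with an optional four-digit year.
DDMM_SKETCH = %r{\A(\d{1,2})/(\d{1,2})(?:/(\d{4}))?\z}

# Parses "DD/MM" or "DD/MM/YYYY". Without a year, assumes the current year
# and rolls over to the next year when that date is already in the past.
# Returns nil for unparseable or invalid dates.
def parse_br_date(text, today: Date.today)
  match = DDMM_SKETCH.match(text.strip)
  return nil unless match

  day = match[1].to_i
  month = match[2].to_i
  return Date.new(match[3].to_i, month, day) if match[3]

  candidate = Date.new(today.year, month, day)
  candidate < today ? Date.new(today.year + 1, month, day) : candidate
rescue ArgumentError
  nil # Date.new raises for impossible dates such as 31/02
end
```

Passing `today:` explicitly keeps the rollover rule easy to exercise, for example `parse_br_date('27/4', today: Date.new(2026, 4, 18))`.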

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 18:37:17 -03:00
Rodribm10
0b195781c5 feat(lifecycle): REST endpoint for lifecycle deliveries audit log 2026-04-15 10:29:24 -03:00
Rodribm10
8690a49971 feat(lifecycle): REST endpoint for lifecycle config singleton 2026-04-15 10:23:42 -03:00
Rodribm10
7c17a7cb96 feat(lifecycle): REST endpoint for lifecycle rules CRUD
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 10:17:59 -03:00
Rodribm10
cb67a1063d fix(lifecycle): move stub controllers from non-enterprise to enterprise path
The lifecycle stubs created in the previous task lived in app/controllers/,
which would cause a class-redefinition collision once the real controllers
are implemented in enterprise/app/controllers/ (tasks 4-6). Moves the 3
stubs to the enterprise path, where all Captain controllers live.

Routing spec: 7 examples, 0 failures

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 10:13:33 -03:00
Rodribm10
7d21530bc7 feat(lifecycle): add Pundit policies for rule/config/delivery 2026-04-15 10:06:47 -03:00
Rodribm10
7b009cf47f feat(lifecycle): inject concierge context into Captain orchestrator prompt
Adds concierge.* and reservation.* Liquid variables to agent_instructions
so Sofia's orchestrator_prompt receives unit persona/knowledge/variables
and reservation data resolved from conversation.custom_attributes.current_unit_id.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 09:25:16 -03:00
Rodribm10
d0d08ed662 feat(lifecycle): implement DispatcherJob
Replace no-op stub with full perform body: find delivery by id, skip if
blank, delegate to Captain::Lifecycle::Dispatcher#call. Add retry_on
with polynomially_longer backoff (3 attempts). Spec covers dispatcher
delegation and graceful skip for missing records.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 09:20:32 -03:00
Rodribm10
0d4583a21a feat(lifecycle): add Dispatcher service with guards→render→send pipeline
Orchestrates guards → render (Liquid) → send pipeline for one delivery.
Handles skip, reschedule, sent, failed states and re-enqueues on reschedule.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 01:53:01 -03:00
Rodribm10
6d84a7586b feat(lifecycle): add MinInterval and CustomerReplied guards
Implement guards following the same pass/reschedule/too_stale pattern as QuietHours.
Also fix belongs_to :conversation on Delivery to use class_name: '::Conversation' to avoid namespace resolution failure inside Captain::Lifecycle module.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 01:49:22 -03:00
Rodribm10
fcdc2054b5 feat(lifecycle): add QuietHours guard with 2h staleness limit
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 01:44:39 -03:00
Rodribm10
823008a1cd feat(lifecycle): add Guards::Base and 3 simple guards (ReservationActive, OptOutLabel, MaxPerReservation)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 01:42:10 -03:00
Rodribm10
f6aa39921a feat(lifecycle): add ContextBuilder for Liquid render variables
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 01:39:35 -03:00
Rodribm10
8e0a06246b feat(lifecycle): wire Captain::Reservation lifecycle hooks
Add after_commit callbacks to call Captain::Lifecycle::Scheduler on
create, status change (cancelled/no_show), and check_in_at change.
Each handler wraps in rescue StandardError to preserve existing behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 01:37:23 -03:00
Rodribm10
bb4631f427 feat(lifecycle): add Scheduler service and DispatcherJob stub
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 01:35:31 -03:00
Rodribm10
4a88f7f517 feat(lifecycle): add EventResolver service
Pure function mapping reservation events to timestamps; used by Scheduler (T9) to compute fire_at.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 01:31:47 -03:00
Rodribm10
a4472b80b9 feat(lifecycle): add concierge_* accessors to Captain::Unit 2026-04-15 01:23:40 -03:00
Rodribm10
41bbf14d57 feat(lifecycle): add Captain::Lifecycle::Delivery model with state helpers
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 01:21:11 -03:00
Rodribm10
ffc5ac7fb8 feat(lifecycle): add Captain::Lifecycle::Rule model with filter matching
TDD: 16 examples passing. Adds EVENTS constant, active/for_event scopes,
and matches_reservation? with unit_ids/categorias/permanencias filters.
Also adds captain_reservation factory used by the spec.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 01:18:17 -03:00
Rodribm10
6ee3fcd4ef feat(lifecycle): add Captain::Lifecycle::Config model 2026-04-15 01:14:19 -03:00
Rodribm10
ea8ff83034 feat: Captain::PixCharge posts an internal note when a PIX is generated
Previously there were only 2 automatic notes:
  1. 'Nova reserva criada' (from Captain::Reservation after_create_commit)
  2. 'Pagamento confirmado' (from Captain::Payments::ConfirmationService)

Adds a third one between them: 'PIX enviado, aguardando pagamento'
(from Captain::PixCharge after_create_commit). The attendant now sees in
the timeline: reservation -> PIX sent -> PIX paid.
2026-04-14 20:09:20 -03:00