iachat

Author	SHA1	Message	Date
Codex CLI	c72543cc59	review: Captain auto-review on 2026-05-10	2026-05-10 03:06:57 +00:00
Codex CLI	aadfb4c080	review: Captain auto-review on 2026-05-07	2026-05-07 03:10:40 +00:00
Codex CLI	abf9f4057e	review: Captain auto-review on 2026-05-01	2026-05-01 03:08:36 +00:00
Codex CLI	7d03430113	review: Captain auto-review on 2026-04-28	2026-04-28 03:04:01 +00:00
Codex CLI	39bda94b93	review: Captain auto-review on 2026-04-25	2026-04-25 03:12:19 +00:00
Codex CLI	1adc79320a	feat(captain): apply overnight stay without breakfast = standard − R$10 (all 4 hotels) Approved by Rodrigo via comment on Multica issue ad2ad5ae (2026-04-23T18:00). Changes applied: - [question 1] an overnight stay without breakfast costs R$10 less than an overnight stay with breakfast Affects: jasmine_primeal, jasmine_primevl, jasmine_qnn01, jasmine_express Co-Authored-By: Captain Reviewer <captain@hoteis1001noites.com.br>	2026-04-23 18:02:41 +00:00
Codex CLI	645ae4fec7	review: records all of Rodrigo's rejections + answer to Question 1 (overnight stay without breakfast = -R$10)	2026-04-23 17:43:17 +00:00
Codex CLI	3d6e16f5f1	review: marks Pattern 1 and Pattern 2 as REJECTED by Rodrigo (2026-04-23)	2026-04-23 17:41:25 +00:00
Codex CLI	bf09e76eae	review: Captain auto-review on 2026-04-23 (v2, 7 patterns)	2026-04-23 17:32:52 +00:00
Codex CLI	6e7bcc9b44	review: Captain auto-review on 2026-04-23	2026-04-23 17:01:14 +00:00
Rodribm10	c0b54c6783	feat(prompts): models for Qnn01, PrimeVL and Express (3 assistants + 15 scenarios) Generated using the validated PrimeAL model as a base, adapting: - Hotel name, suites, links (WhatsApp/Maps), per-unit greeting - Price table specific to each unit - List of other units (excludes the unit itself, includes the other 8) - Note on exclusive service per unit Particularities per unit: - Qnn01: 4 suites (Standard/Master/Pole Dance/Hidromassagem), Mon-Wed + Thu-Sun table, has 12h - PrimeVL: 3 suites (Stilo/Alexa/Hidromassagem), Mon-Wed + Thu-Sun-holiday table, has 1h and excess hour - Express: 2 suites (Standard/Master), Mon-Wed + Thu-Sun table, redirects to Prime when the customer asks for a Hidromassagem suite reclamacoes_ouvidoria.md is identical across the 4 units (the LAST framework is universal). Tested in staging: applied to the 3 respective assistants, new scenarios created (outras_unidades + Reclamacoes_Ouvidoria), prompt-block FAQs deleted, duplicate price FAQs removed. Awaiting validation via real WhatsApp. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:18:13 -03:00
Rodribm10	86bee38474	chore(prompts): reorganizes folders (_prod_snapshot→_producao_atual, _staging_current→_modelos) and prefixes files by unit - Renames _prod_snapshot → _producao_atual (better reflecting its role: a read-only snapshot of what is running in prod today) - Renames _staging_current → _modelos (improved models that will become the new prod) - All files in _modelos/ now use the jasmine_<slug>__ prefix (e.g. jasmine_primeal.md), following the same convention already used in _producao_atual/ - Updates the README with the new convention and a per-unit validation checklist This prepares the structure for adding models for the other 3 units (Qnn01, PrimeVL, Express). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:17:33 -03:00
Rodribm10	0ecfce5c27	fix(captain): translate response_format to text.format on Codex proxy Without this, Codex returned plain text and the reaction_emoji from the structured JSON never reached ResponseBuilderJob, which broke the tool for reacting to messages with emoji. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 22:47:09 -03:00
Rodribm10	9e8550dd45	feat(captain): CAPTAIN_CODEX_MODEL_OVERRIDE to use models outside the RubyLLM catalog Adds a model override in the proxy. Motivation: RubyLLM validates the model against an internal catalog before sending the call. New models (gpt-5.4, gpt-5.3-codex) are not in that catalog yet and raise RubyLLM::ModelNotFoundError. With CAPTAIN_CODEX_MODEL_OVERRIDE set, the Translator replaces the model in the body before sending to Codex. Captain keeps passing a recognized model (gpt-5.2), but Codex receives the real model (gpt-5.4). Example: InstallationConfig.find_or_initialize_by(name: "CAPTAIN_CODEX_MODEL_OVERRIDE") .update!(value: "gpt-5.4", locked: false) Validated: curl → proxy → Codex returns "model":"gpt-5.4" in the response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 17:55:22 -03:00
Rodribm10	b457e84c2f	fix(captain): route embeddings to legacy OpenAI + retry transient errors Resolves two layers of problems identified in end-to-end testing: 1. Embeddings failed with HTTP 404 (/codex/v1/embeddings does not exist). Solution: Captain::Llm::EmbeddingService always uses traditional OpenAI via Llm::Config.with_api_key(legacy_settings). ProviderConfig exposes legacy_openai_settings for this. 2. The Codex server occasionally responds with response.failed + code=server_error (transient instability). The Client now retries up to 2x with exponential backoff (0.5s, 1.5s) on retryable errors: HTTP 5xx, server_error in response.failed, or an unfinished stream. Other fixes in this step: - Scenario#agent_model: in Codex mode, ignores CAPTAIN_OPEN_AI_MODEL_SCENARIO (which may hold legacy gpt-4o) and uses ProviderConfig.model. - ExtractionService/ContradictionCheckerService/TranslateQueryService: replace the hardcoded gpt-4o-mini/gpt-4.1-nano constants with ProviderConfig.light_model (respecting the active provider). - ProviderConfig.DEFAULT_CODEX_MODEL is now gpt-5.2 (recognized by RubyLLM; gpt-5.4 is not in the gem's catalog). Validated end to end: WhatsApp → Chatwoot → Jasmine → handoff Daniela → faq_lookup with embedding OK → response with correct prices. Docs in docs/captain-codex-oauth.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 17:42:31 -03:00
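The retry policy described in b457e84c2f (up to 2 retries, waiting 0.5s and then 1.5s) could be sketched roughly as follows. This is only an illustration: `RetryableError`, `with_retries` and the injectable `sleeper` are hypothetical names, not the project's actual `Captain::Codex::Client` code.

```ruby
# Hypothetical sketch of a bounded retry with fixed backoff delays.
# RetryableError stands in for "HTTP 5xx, server_error, or unfinished stream".
class RetryableError < StandardError; end

BACKOFF_DELAYS = [0.5, 1.5].freeze # seconds, taken from the commit message

# Yields the attempt index (0-based). Retries once per entry in `delays`,
# then re-raises the last RetryableError.
def with_retries(delays: BACKOFF_DELAYS, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield attempt
  rescue RetryableError
    raise if attempt >= delays.length

    sleeper.call(delays[attempt])
    attempt += 1
    retry
  end
end
```

Passing the sleeper as a lambda keeps the sketch testable without real waiting; non-retryable errors propagate immediately because only `RetryableError` is rescued.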
Rodribm10	26290c34a7	feat(captain): feature flag CAPTAIN_LLM_PROVIDER + central ProviderConfig Adds the openai_api \| openai_codex_oauth toggle. By default it keeps the legacy behavior (traditional OpenAI API key). When switched to openai_codex_oauth, the clients (RubyLLM + Agents gem) point to the internal proxy at http://localhost:3000/codex, configurable via CAPTAIN_CODEX_PROXY_URL. - Captain::Llm::ProviderConfig: single source of truth for api_key, api_base and model, based on CAPTAIN_LLM_PROVIDER - config/initializers/ai_agents.rb refactored - lib/llm/config.rb refactored - 8 ProviderConfig specs passing - Safe fallback: dummy api_key ('codex-oauth') when using the proxy (the proxy ignores Authorization and uses internal OAuth) Does NOT touch Llm::LegacyBaseOpenAiService (PDF/Files API). That one always stays on the traditional API because the Codex endpoint does not expose the Files API. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 15:29:52 -03:00
Rodribm10	d53c86df94	fix(captain): always include instructions in Codex responses body The Codex endpoint returns HTTP 400 "Instructions are required" when the field is missing. We now always include the field: a single-space string when the request has no system message. Validated end-to-end: curl → /codex/v1/chat/completions → proxy translates → Codex returns streaming SSE → proxy aggregates → Chat Completions JSON. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 15:27:37 -03:00
Rodribm10	928b1ec6b9	feat(captain): Codex OAuth auth module + proxy controller Implements Phases 1+2 of the Captain Codex OAuth plan. Phase 1, auth module: - Migration captain_codex_credentials (AR-encrypted tokens) - Model Captain::CodexCredential (singleton-ish with .current) - Captain::Codex::AuthService with the full device flow: start_device_login, poll_once, exchange_for_credential, valid_access_token (auto-refresh), refresh! - Rake task captain:codex:{login,status,refresh} - Sidekiq job Captain::Codex::RefreshTokensJob running every 30 min Phase 2, Chat Completions → Responses proxy: - Captain::Codex::Translator (chat ↔ responses, tools, tool_calls) - Captain::Codex::Client (streaming SSE → aggregated) - Api::Internal::CodexProxyController exposing POST /codex/v1/chat/completions - 10 Translator specs passing Next: Phase 3 (feature flag + fallback) and reconfiguring the RubyLLM/Agents/ruby-openai clients to point to the proxy when CAPTAIN_LLM_PROVIDER=openai_codex_oauth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 15:07:01 -03:00
Rodribm10	df56ee8115	chore(captain): PoC Codex OAuth device flow + Responses streaming PoC validated with a ChatGPT Plus account and the Hermes client_id. The OAuth device flow works and yields an access_token + refresh_token with auto-refresh. Chat and function calling worked on gpt-5.4, gpt-5.4-mini, gpt-5.2 and gpt-5.3-codex. Findings for the final adapter: - Endpoint: /responses (not /chat/completions) - Streaming is mandatory (stream: true) - store: false is mandatory - No temperature/top_p (reasoning models) - input[] instead of messages[] - top-level instructions instead of the system role - Tools without the function: {} wrapper - Output via response.output_item.done events (not response.completed) The scripts/captain_codex_poc/ folder is excluded from Rubocop (standalone scripts that do not run in a Rails context). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 14:56:57 -03:00
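The adapter findings listed in df56ee8115 suggest a request translation along these lines. This is a minimal sketch under stated assumptions: `chat_to_responses` is a hypothetical helper, not the project's `Captain::Codex::Translator`, and it ignores tool_calls and multi-part content. The single-space fallback for `instructions` follows the fix described in d53c86df94.

```ruby
# Hypothetical sketch: converts a Chat Completions style request body
# (string-keyed Hash) into a Responses style body.
def chat_to_responses(body)
  messages = body.fetch("messages", [])
  system, rest = messages.partition { |m| m["role"] == "system" }
  instructions = system.map { |m| m["content"] }.join("\n")

  payload = {
    "model" => body["model"],
    # The field must always be present; a single space when no system message exists.
    "instructions" => instructions.empty? ? " " : instructions,
    # input[] replaces messages[]; system messages moved to instructions above.
    "input" => rest.map { |m| { "role" => m["role"], "content" => m["content"] } },
    "stream" => true,  # streaming is mandatory
    "store" => false   # store: false is mandatory
  }
  # temperature/top_p are intentionally not copied.

  # Tools lose the function: {} wrapper; its fields move to the top level.
  tools = body.fetch("tools", []).map do |tool|
    { "type" => "function" }.merge(tool.fetch("function", {}))
  end
  payload["tools"] = tools unless tools.empty?
  payload
end
```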
Rodribm10	c512e3e5f6	chore(prompts): split prod snapshot from staging from target Reorganized db/seed_prompts/ into three clear bins: _prod_snapshot/ — 16 prompts pulled from iachat_production (4 Jasmines + 12 scenarios). Read-only baseline. _staging_current/ — 6 prompts active in iachat-v2 right now (Jasmine + 5 scenarios, including outras_unidades and Reclamacoes_Ouvidoria which were created on this branch). target/ — empty for now. Source of truth: the seed migration only writes from here. Files we review and approve land here, then deploy pushes them to prod. Updated the seed migration to walk target/ and to support both generic scenarios (apply to every unit) and unit-scoped scenarios (file prefixed with assistant slug, only that unit). Empty files are skipped — useful for staged rollouts. This guarantees no prompt ships to prod by accident: only what ends up in target/ is applied.	2026-04-22 11:31:42 -03:00
Rodribm10	d0a2688dd2	chore(prompts): snapshot 16 production prompts + dynamic seed migration	2026-04-22 11:24:41 -03:00
Rodribm10	95d3e99652	feat(retention): version the Jasmine + Daniela prompts as seed files The orchestrator prompt (Jasmine) and scenario instruction (Daniela) live in the database. When we merge this branch to main and deploy to production, the prod DB will keep its OLD prompts — the new ones would only exist in staging. That defeats the point of merging. Fix: commit the current staging prompts as .md files under db/seed_prompts/ and add a data migration that syncs them into the DB on deploy. Idempotent (no-ops when content already matches). From now on, prompt changes follow the same workflow as code: edit the .md file, migration resyncs on deploy. The DB row becomes a mirror of the file, not the source of truth.	2026-04-22 11:00:06 -03:00
Rodribm10	6fa2f621fa	feat(retention): UI layer — badge, filters, cohort matrix, KPI dashboard - RetentionSummaryBadge in the "Previous conversations" sidebar: tiered status (First contact / Active / Recurring / Sleeping / At risk / Inactive) + counts of interactions, one-shots, Pix. - Retention tab in Captain Reports: KpiCards, FlowCard, CohortMatrix (12x13 heatmap with CSV export). - Five new filters on the contacts list: recurring, last interaction, days since, interactions count, reservations paid. - Full pt_BR + en i18n under CAPTAIN_REPORTS.RETENTION.* - Spec for InteractionCalculatorService covering gap behavior, one-shot classification, internal-label exclusion, multi-conversation grouping across the 30h window. - Docs: docs/captain-retention-indicators.md with business rules, column reference, endpoint shape, and backup SQL queries.	2026-04-22 10:30:19 -03:00
Rodribm10	aed6d62640	feat(retention): summary KPIs + cohort endpoints Exposes two JSON endpoints under /api/v1/accounts/:id/captain/reports: - GET /retention — aggregate KPIs (active/recurring/sleeping/at-risk/ churned, new vs returned in period, Pix generated/paid/conversion, retention rates at 30d and 90d) - GET /retention/cohort — monthly cohort matrix, 12 months lookback, 12 months of offset. Each cell is % of the cohort that interacted in month M+N. SQL-aggregated with DATE_TRUNC + DISTINCT so it is a single query even on large histories.	2026-04-22 09:59:21 -03:00
Rodribm10	f6488ce2de	feat(retention): foundation for customer retention metrics Lays the data + job foundation for tracking customer interactions, recurrence, and Pix conversion on Contact. Design decisions negotiated with Rodrigo (see docs to come): Rules: - Gap of 30h from last message defines separate interactions - Qualified interaction = >=2 customer msgs + >=2 attendant msgs, both with textual content (>= 2 letters) - One-shot consultation = >=1+1 but below the qualified threshold (tracked as secondary KPI) - Excludes contacts labeled `equipe_interna` - is_recurring = interactions_count >= 2 - pix_generated_count counts all PixCharges; reservations_paid_count only counts those with status = paid Surface area: - Migration adds denormalized stats to contacts + indexes for fast filtering - Captain::ContactStats::InteractionCalculatorService computes the stats for a single contact (pure, no persistence) - Captain::Retention::RecalculateContactStatsJob persists them for one contact (idempotent) - Captain::Retention::RecalculateAllContactStatsJob runs daily at 3am BRT, enqueues per-contact jobs for everyone active in the last 120 days - Event-driven refresh: CaptainListener#conversation_resolved enqueues recalc; Captain::PixCharge after_create/after_update enqueues recalc on status change No UI yet — that's the next layer.	2026-04-22 09:50:23 -03:00
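The interaction rules in f6488ce2de (30h gap, qualified interaction, is_recurring) can be illustrated with a small sketch. The `Message` struct and the helper names are hypothetical, not the real `InteractionCalculatorService`; treating a gap of exactly 30h as the same interaction is also an assumption.

```ruby
# Hypothetical sketch of the grouping and qualification rules.
Message = Struct.new(:at, :from, :text) # from is :customer or :attendant

GAP_SECONDS = 30 * 3600

# Messages further than `gap` seconds apart start a new interaction.
def interactions(messages, gap: GAP_SECONDS)
  messages.sort_by(&:at).slice_when { |prev, curr| curr.at - prev.at > gap }.to_a
end

# Textual content means at least 2 letters.
def textual?(message)
  message.text.to_s.scan(/\p{L}/).size >= 2
end

# Qualified = at least 2 textual customer messages and 2 textual attendant messages.
def qualified?(group)
  counts = group.select { |m| textual?(m) }.group_by(&:from).transform_values(&:size)
  counts.fetch(:customer, 0) >= 2 && counts.fetch(:attendant, 0) >= 2
end

def recurring?(messages)
  interactions(messages).count { |group| qualified?(group) } >= 2
end
```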
Rodribm10	08a06c6528	fix(captain): memory allows 'Solicitou Pix ..., aguardando pagamento' Previous commit made the extractor reject any reservation-shaped fact without a literal payment confirmation in the conversation. That killed the useful middle ground: a customer who requests a Pix and hasn't paid yet is still a concrete signal worth remembering (for follow-up, interest mapping, CRM). We were going from "hallucinated reservation" to "nothing remembered". Add the intermediate pattern: - Payment confirmed → "Reservou X para Y em DD/MM/AAAA" - Pix generated, no payment yet → "Solicitou Pix para X em DD/MM/AAAA, aguardando pagamento" - Just a price quote → nothing The "aguardando pagamento" suffix is required so the downstream recall never confuses it with a closed reservation.	2026-04-22 05:01:24 -03:00
Rodribm10	d2c2c6b7fe	fix(captain): pre-reservation semantics + no duplicate pix links Three UX bugs from staging testing: 1. Duplicate Pix link in WhatsApp — the tool's formatted_message embedded the full link + instructions, so the LLM copied it into its own response on top of the dedicated link message sent by dispatch_direct_link_message. The tool now returns a short summary with no URL; dispatch is the single source of the link. 2. "Reserva confirmada!" sent before payment — the scenario prompt used the word "confirmação" loosely, which the LLM read as the reservation being closed. Now the prompt forces "pré-reserva / aguardando pagamento" until the Pix is actually paid, and the dispatched link message explains that the reservation is only secured after payment clears. 3. Memory extraction wrote "Reservou Hidromassagem para pernoite em 22/04/2026" when the customer only received a Pix link and replied "obrigado". Tightened the extraction prompt so padrao_comportamental of a reservation requires a literal payment confirmation — Pix generated alone no longer qualifies.	2026-04-22 04:19:39 -03:00
Rodribm10	6c9d12559d	fix(captain): generate_pix returns success=false on real errors When Inter integration fails ("Unit not configured for Pix", missing certs, etc.), the tool was returning success=true with the error message as formatted_message. The LLM interpreted that as success and hallucinated "Pix generated" to the customer — and never triggered the generate_reservation_link fallback. Switch the rescue path from tool_feedback_response (success=true) to error_response (success=false) so the Daniela scenario correctly falls back to the reservation-link tool as documented in her prompt.	2026-04-21 18:59:45 -03:00
Rodribm10	ee2aae3958	fix(captain): generate_pix asks nome+CPF together, hydrates bare name Root cause of the staging test failure: - Tool asked for CPF then name separately, two back-and-forth turns. - When the user replied with just "Rodrigo Borba Machado" (no "nome:" prefix), NAME_WITH_LABEL_REGEX didn't match, so the contact.name stayed as the emoji "😅‼️". The tool kept returning missing_name and the LLM eventually hallucinated success without another generate_pix call. Changes: - missing_identity_response combines nome + CPF into one prompt when both are missing. - extract_name_from_qa_pattern finds the last outgoing message asking for "nome completo" and takes the next incoming message as the name candidate. - extract_name_run_from_text pulls the leading alphabetic run from the message so "Rodrigo Borba Machado, 00251938131" parses the name correctly alongside the CPF.	2026-04-21 18:35:44 -03:00
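The "leading alphabetic run" idea from ee2aae3958 might look like the sketch below. `leading_name_run` is a hypothetical stand-in for the project's extract_name_run_from_text, whose real implementation is not shown in this log.

```ruby
# Hypothetical sketch: returns the leading run of letter-only words,
# or nil when the text does not start with letters.
def leading_name_run(text)
  text.to_s[/\A\s*(\p{L}+(?:\s+\p{L}+)*)/, 1]
end
```

With this shape, a reply that mixes name and CPF yields only the name part, and a digits-only reply yields nil so the caller can keep asking.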
Rodribm10	cfffea9c16	feat(captain): semantic memory fixes + roleta + reclamações + analytics Consolidates this branch's April 2026 work into one block ready to test in staging before the merge to main. ## Semantic memory fixes - ExtractionService: Principle Zero + Golden Rule (completed action vs. intention). - Daniela_Reservas scenario: Step 0 classification (inquiry/intention/out of scope). ## Roleta da Sorte (end-to-end) - Supabase schema + 7 atomic RPCs (server-side, idempotent). - Services: Offer, Redeem, WeeklyReport. - Jobs: OfferRouletteJob (hook in ConfirmationService after Pix is paid), NotifyRevealed + fallback Scheduler. - Manual tool GenerateRoletaLinkTool + public endpoint /roleta/notify. - Dashboard /captain/roleta with Redemption + Report + anomaly detection. ## Reclamacoes_Ouvidoria scenario - P1-P4 triage, LAST framework, Three-level listening, Self-check. - No material compensation; detecting a frustrated customer raises priority. ## Analytics - Conversion funnel /captain/funnel: 5 stages via regex, zero LLM. - Churn detector via ChurnOutreach* (cron on business days 10h-17h BRT). ## Pre-existing work included - Captain Executive Reports (ceo_digest, mattermost_delivery). - get_reserva_preco_tool, Lifecycle adjustments, Reservations UI polish. ## Other - .gitignore: patterns for credentials. - Idempotent scenario migrations. - Full pt_BR+en i18n for roleta/funnel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 15:36:25 -03:00
Rodribm10	978ccbbdfb	fix(captain): wrap runner.run in Timeout to guard HTTP hangs Observed incident 2026-04-19 14:34: ResponseBuilderJob sat 156s 'Performing' in Sidekiq without ever emitting [Captain V2] Agent result, while the client waited on WhatsApp. The runner.run() call never returned — presumably an HTTP hang on the LLM side (OpenAI slow, network flake, or retry storm inside ruby-llm). Post-hoc protections (tool_loop_detected, max_turns) can't fire because they only inspect result after run() returns. Adding a 45s hard timeout on the run() block guarantees we bail out, trigger bot_handoff, and respond to the client instead of hanging forever. Rescue Timeout::Error separately so the log message is specific and the user-facing message says "demorou mais do que o esperado". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 11:40:59 -03:00
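The hard timeout described in 978ccbbdfb can be sketched with Ruby's standard `Timeout` module. `run_with_deadline` and the `on_timeout` callback are hypothetical names; in the real job the callback would presumably trigger bot_handoff and send the user-facing message.

```ruby
require "timeout"

RUN_TIMEOUT_SECONDS = 45 # value taken from the commit message

# Hypothetical sketch: runs the block under a hard deadline and calls
# on_timeout (returning its value) instead of hanging forever.
def run_with_deadline(on_timeout:, seconds: RUN_TIMEOUT_SECONDS)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  on_timeout.call
end
```

Rescuing `Timeout::Error` separately from other errors keeps the timeout path distinguishable in logs, which matches the intent stated in the commit.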
Rodribm10	aa7da915e3	fix(captain): remove scenario->orchestrator back-handoff (ping-pong) Problem observed in a real test on 2026-04-19 11:24: the user gave Daniela the suite+date+time. Instead of calling generate_pix, Daniela called handoff_to_jasmine. Jasmine replied "Vou te transferir pra Daniela..." ("I'll transfer you to Daniela..."), which was false: the conversation stayed stuck with Jasmine. Sequence within ONE single run: jasmine.handoff_to_daniela_reservas_agent -> daniela.handoff_to_jasmine (!) -> jasmine replies "vou te transferir..." Daniela's prompt has "🚨 NUNCA FAÇA HANDOFF DE VOLTA PRA JASMINE" ("never hand off back to Jasmine"), but the LLM ignores the prohibition when the tool is registered. The only robust solution is not to register the tool. Historically we were afraid to remove the back-edge because without it Daniela (when confused) would loop calling faq_lookup, an incident that burned real credits. That fear no longer applies: commit `f3f8a8d5c` added TOOL_LOOP_THRESHOLD=3 + MAX_TURNS_PER_MESSAGE=15, which trigger an automatic bot_handoff on any tool loop. Runaway protection now exists through ANOTHER path, so we can safely remove the back-edge. Expected effect: - the scenario finishes the response on its own (no ping-pong) - a confused/looping scenario -> rate limit cuts it off -> a human takes over Memory: updated feedback_never_touch_captain_without_safety_caps.md to reflect the new invariant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 11:30:19 -03:00
Rodribm10	f3f8a8d5c1	feat(captain): rate limiting with runaway loop detection + bot_handoff Three layers of protection against runaway token burn in AgentRunnerService: 1. MAX_TURNS_PER_MESSAGE = 15 Cap within a single run() call. It was already applied; now extracted as a named constant. 2. MAX_TURNS_PER_CONVERSATION = 30 Cap over the lifetime of the conversation. Counter in conversation.custom_attributes['captain_turn_count']. On reaching it, triggers an automatic bot_handoff and replies with a transfer-to-human message. 3. TOOL_LOOP_THRESHOLD = 3 Detects the same (tool_name, args) invoked 3+ times in the result of a single run (the symptom of the faq_lookup loop that burned tokens on 2026-04-19). On detection: triggers bot_handoff and aborts the turn. trigger_bot_handoff! calls conversation.bot_handoff! when available, removing the conversation from the automatic pipeline. Motivation: two real incidents of OpenAI credit burn on 2026-04-19. See memory/feedback_never_touch_captain_without_safety_caps.md for the full invariants. Tests updated: mock_result now stubs :messages (used by the new tool_loop_detected?) and the expected max_turns is 15. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 11:16:54 -03:00
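The third protection layer in f3f8a8d5c1 (same tool name and arguments invoked 3+ times in one run) could be checked roughly like this. The shape of `tool_calls` (an array of hashes with `:name` and `:arguments`) is an assumption for illustration; the real `tool_loop_detected?` inspects the runner result's messages, whose structure is not shown in this log.

```ruby
TOOL_LOOP_THRESHOLD = 3 # value taken from the commit message

# Hypothetical sketch: true when any identical (name, arguments) pair
# appears at least `threshold` times within a single run.
def tool_loop_detected?(tool_calls, threshold: TOOL_LOOP_THRESHOLD)
  tool_calls
    .map { |call| [call[:name], call[:arguments]] }
    .tally
    .values
    .any? { |count| count >= threshold }
end
```

Calls to the same tool with different arguments do not count toward the threshold, so a scenario that legitimately performs several distinct lookups is not flagged.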
Rodribm10	7bc5103541	fix(captain): cap max_turns at 15 + restore scenario->orchestrator handoff Runaway incident: Daniela (reservation scenario) entered a tool-calling loop, invoking faq_lookup with the same query dozens of times per second, stuck at 'Performing' in Sidekiq for minutes with 1-of-12 busy. Root cause was two interacting factors: 1. The previous commit removed scenario_agent.register_handoffs( assistant_agent) to prevent ping-pong. In practice, the scenario LLM uses handoff_to_orchestrator as a safety valve when it cannot advance. Without it, the LLM kept calling other available tools (faq_lookup) indefinitely. 2. max_turns was 100. A runaway loop could burn 100 LLM + tool cycles before Sidekiq's timeout fired, which meant real token spend in a single bad turn could blow a day's budget. Both restored/fixed: - max_turns: 100 -> 15. Plenty for normal flows; hard ceiling on any runaway. The LLM simply ran out of turns and had to emit a final response instead of looping further. - scenario -> orchestrator handoff: re-registered. Ping-pong risk is contained by max_turns AND by explicit prompt rules in the scenario instruction forbidding gratuitous handoffs (added to Daniela prompt in earlier commit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 11:03:22 -03:00
Rodribm10	e5d186c689	fix(captain): stop scenario->orchestrator handoff + narrow FAQ guardrail Two behavioural regressions caught in live testing with a real customer conversation: 1. Ping-pong scenario -> orchestrator -> scenario build_and_wire_agents was calling scenario_agents.register_handoffs( assistant_agent), which exposed handoff_to_jasmine as a tool INSIDE every scenario. Daniela (reservation scenario) kept calling it mid flow, the orchestrator resumed the turn, and customers got messages like "Vou te encaminhar para a Daniela..." after ALREADY being with Daniela. The back-edge is removed. When a customer legitimately changes topic mid-scenario, pick_starting_agent on the next turn already routes back to the orchestrator based on conversation state, so no manual handoff from the scenario side is needed. 2. FAQ_PRICE_PATTERNS was hijacking legitimate routing responses The previous regex matched the bare words "pernoite", "sinal", "diaria" WITHOUT requiring a numeric price nearby. A legitimate handoff response like "Vou transferir para a Daniela para confirmar a Stilo para pernoite" tripped the guardrail, which then substituted the response with raw FAQ content about rates. Narrowed to: R$ values, numbers followed by "reais", and the explicit price-noun variants (preco/preço/valor/preços/valores/custo/custa). Incidental mentions of stay types no longer trigger. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 10:51:45 -03:00
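The narrowed guardrail from e5d186c689 (R$ values, numbers followed by "reais", explicit price nouns) might be expressed as a single pattern like the one below. The constant and method names are hypothetical and the exact regex in the codebase is not shown here; this only illustrates why a bare mention of a stay type no longer matches.

```ruby
# Hypothetical sketch of the narrowed price pattern.
PRICE_MENTION_PATTERN = /
  R\$\s*\d                                  # R$ followed by a number
  | \d+(?:[.,]\d+)?\s*reais\b               # a number followed by "reais"
  | \b(?:pre[cç]os?|valor(?:es)?|custo|custa)\b  # explicit price nouns
/xi

def price_mention?(text)
  PRICE_MENTION_PATTERN.match?(text.to_s)
end
```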
Rodribm10	fa758e4848	feat(captain): hierarchical model routing + conversation-level memory cache Two orthogonal cost optimizations to the Captain agent pipeline: 1. Hierarchical model routing (optimization A) Captain::Scenario now overrides agent_model to read a dedicated InstallationConfig CAPTAIN_OPEN_AI_MODEL_SCENARIO, falling back to the global CAPTAIN_OPEN_AI_MODEL used by the orchestrator (Assistant). Rationale: the orchestrator (Jasmine) does cheap triage (is this a reservation intent? a greeting? escalate to human?) — a smaller model handles this well. Scenarios (Daniela — reserva) run complex flows with tool calling, strict taxonomies, and JSON schema output — they benefit from a stronger model. Config in this install: CAPTAIN_OPEN_AI_MODEL=gpt-4o-mini (orchestrator) and CAPTAIN_OPEN_AI_MODEL_SCENARIO=gpt-4o (scenarios). Estimated ~60% cost reduction vs everything on gpt-4o, preserving quality where it matters for the business flow. 2. Conversation-level memory cache (optimization B) MemoryPromptInjector now persists the computed memory block on conversation.custom_attributes[captain_cached_memory_block]. First turn computes once (embedding + pgvector query + XML formatting); subsequent turns reuse. The customer's profile does not change during an open conversation, so re-running the pipeline on every turn was pure waste. Graceful fallbacks: - Cache write failure → per-service-instance in-memory fallback still applies. - Cache read failure → fresh recall runs (no regression). - Contact mismatch → invalidates cache, fresh recall runs. When a new conversation starts, custom_attributes is empty → fresh recall populates the cache for that conversation's lifetime. Estimated ~80% reduction in embedding + pgvector calls during multi-turn conversations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 09:47:15 -03:00
Rodribm10	bcf41ad15f	fix(captain-memory): guard memory recall from blocking agent worker Real-world test triggered a Sidekiq worker hang on conv 67 after a message was routed through Daniela: two ResponseBuilderJobs (msg 1318 and 1319) started, emitted typing_on, then never returned. Sidekiq showed 2/12 workers stuck for 10+ minutes — indefinite. Root cause likely: Agents::Runner evaluates the orchestrator instructions lambda multiple times per turn, and our wrapped lambda calls MemoryPromptInjector#append_memory_block each time. Inside, RecallService invokes OpenAI embedding API (2s timeout) and pgvector. Ruby's Timeout.timeout has documented holes on net/http syscalls — if the embedding API stalls at the socket level, the worker hangs forever even though the timeout "fired". Two fixes: 1. Per-message cache in the injector instance: the same message_text is embedded + queried once, not N times per turn. Dramatic reduction in network calls + DB queries during a single agent run. Every call after the first returns the cached block instantly. 2. Absolute rescue at append_memory_block top level: rescue StandardError => e; return base_prompt. Even if the whole memory pipeline throws, the base system prompt passes through and the agent keeps responding. Memory is NEVER allowed to block a response — that was already the design intent but the lambda caller path didn't honor it rigorously enough. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 09:06:35 -03:00
Rodribm10	6330bec857	fix(captain-memory): temporal memory model + aggressive dedup User feedback revealed a fundamental design issue: the memory model was accumulating contradictory "Prefere X" facts because a single choice was being treated as a permanent preference. Result: 3 different "Prefere suite X" entries coexisting, all at 90% confidence, with reservation patterns over time (2hrs, 4hrs, pernoite) all claiming to be the customer's "preferred" duration. Corrections: 1. ExtractionService prompt — preferencia now requires EXPLICIT declaration words ("prefiro", "gosto mais de", "sempre escolho", "adoro", "favorita"). A mere choice in one conversation is NO LONGER extracted as preferencia — instead it goes to padrao_comportamental WITH THE DATE in the content (e.g. "Reservou Alexa para pernoite em 23/05/2026"). This makes memory temporal and auditable instead of imposing fake consistency. 2. Reference date is passed to the LLM prompt via the latest message timestamp, used as the anchor date the LLM must embed in every padrao_comportamental content. 3. ContradictionCheckerService — dual threshold: - cosine < 0.15 → auto-supersede without LLM (pure duplicate) - 0.15 to 0.6 → ask LLM if contradicts, supersede if yes - > 0.6 → ignore, unrelated facts Previously only the middle band existed, so near-duplicate facts like two "aniversário 23/05" entries or three "prefere suite X" entries were never cleaned up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 08:30:42 -03:00
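The dual threshold from 6330bec857 amounts to a three-way classification. A minimal sketch, assuming the compared value is a cosine distance (smaller means more similar) and that the 0.15 and 0.6 boundaries both fall in the middle band; `contradiction_action` is a hypothetical name.

```ruby
DUPLICATE_MAX = 0.15 # below this: pure duplicate
RELATED_MAX = 0.6    # above this: unrelated facts

# Hypothetical sketch: decides what to do with an existing fact given its
# distance to a newly extracted one.
def contradiction_action(distance)
  if distance < DUPLICATE_MAX
    :auto_supersede # no LLM call needed
  elsif distance <= RELATED_MAX
    :ask_llm        # supersede only if the LLM says the facts contradict
  else
    :ignore
  end
end
```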
Rodribm10	b742d774c8	fix(captain-memory): block suite hallucinations + hardcode cadastral data exclusion Real test revealed gpt-4o-mini was still: - Hallucinating suite names ("Aluba" doesn't exist — we only have Alexa, Stilo, Hidromassagem) - Extracting cadastral data as memory ("Rodrigo has a CPF", "Name is X") despite the per-type NÃO examples Added two sections at the top of the prompt: 1. Business canonical data — explicit whitelist of suite names (Alexa, Stilo, Hidromassagem) and stay types. Anything else = discard, NO auto- normalization. LLM must not guess. 2. Cadastral data absolute rule — explicit list of fields that are profile data, not memory: name, CPF/RG/passport, email/phone/address, birth date. Plus 5 concrete ❌ examples of what was being wrongly extracted in the wild. Existing 9 specs still pass (stub at call_llm; prompt change is semantic, not structural). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 08:06:31 -03:00
Rodribm10	4becfd0a57	fix(captain-memory): strict taxonomy definitions in ExtractionService prompt Real-world test revealed the LLM extractor (gpt-4o-mini) was using type labels too loosely: a customer's QUESTION about parking ("tem estacionamento?") was classified as 'reclamacao'. Similarly, generic courtesy ("obrigado") was becoming 'feedback_positivo', and transactional events (CPF informed, reservation made) were becoming memories when they should be ignored. Rewrote build_prompt with: - Per-type strict definition (what it IS) - YES/NO examples for each of the 9 types, with the most common pitfalls explicitly shown as NO - 7 absolute rules, including: questions are never complaints, generic courtesy is never feedback, agent actions are never customer memory, transactional events are not long-term facts - Confidence threshold guidance (>=0.9 only if totally explicit, 0.7-0.89 for strong inference, <0.7 discard) - "If in doubt, discard — quality > quantity. Most transactional conversations should return empty facts list" Existing 9 specs still pass (stub call_llm, so prompt changes don't affect unit test assertions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 07:44:26 -03:00
Rodribm10	6ecafd30c6	feat(captain-memory): redesign Contact Memories UI with type badges + relative time + fix i18n keys	2026-04-19 07:38:50 -03:00
Rodribm10	b07486c430	feat(captain-memory): wire Contact Memories section into conversation sidebar	2026-04-19 07:30:30 -03:00
Rodribm10	5874029a03	fix(captain-memory): raise RecallService timeout 0.5s -> 2.0s Real-world observation: the OpenAI embedding API typically takes 200-400ms; with pgvector query overhead on top, the 500ms budget was frequently exceeded, silently dropping memory recall. Agent typing delay is already 2-15s humanized, so a 2s recall budget is well within UX tolerance and gives ~4-5x margin over typical embedding latency. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 07:25:19 -03:00
Rodribm10	1ce07cc78c	docs(captain-memory): add operator guide for enabling Contact Memory flags (UI toggles deferred) Documents the Rails console procedure to toggle captain_contact_memory_extraction_enabled and captain_contact_memory_recall_enabled on Account#custom_attributes, including rollout phasing (extraction-first, then recall), rollback, bulk enablement, and post-activation verification queries. The UI toggles in Captain Settings are deferred: the existing FeatureToggle component is coupled to the captain_features hash and cannot be reused for custom_attributes-backed flags without a new component and a new account-update store action. Scope and implementation notes for that follow-up are included at the end of the document. Task 5.4 of Captain Semantic Memory epic (Phase 5).	2026-04-19 01:52:13 -03:00
Rodribm10	2f7d8edd92	feat(captain-memory): add Contact Memory UI component + API client + i18n Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 01:47:56 -03:00
Rodribm10	8444209952	fix(captain-memory): always authorize index even when list is empty	2026-04-19 01:43:57 -03:00
Rodribm10	f7d4c41d07	feat(captain-memory): add MemoriesController with index/update/destroy/bulk_destroy	2026-04-19 01:41:09 -03:00
Rodribm10	638e84752d	feat(captain-memory): add ContactMemoryPolicy (Pundit)	2026-04-19 01:37:13 -03:00
Rodribm10	9c035722de	test(captain-memory): end-to-end learning and recall integration test	2026-04-19 01:35:09 -03:00
Rodribm10	1cf9531741	fix(captain-memory): use Agent#clone instead of ivar mutation + unify test path with runtime	2026-04-19 01:32:56 -03:00
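
The two guards described in commit bcf41ad15f (a per-message cache plus a top-level rescue that falls back to the base prompt) can be sketched roughly as follows. This is a minimal illustration, not the repository's MemoryPromptInjector: the class name `CachedMemoryInjector` and the injected `recall` callable are hypothetical stand-ins for the real recall pipeline.

```ruby
# Hypothetical sketch; names are illustrative, not the actual implementation.
class CachedMemoryInjector
  # recall: any callable that maps message_text to a memory block (String or nil).
  # It stands in for the embedding + pgvector lookup mentioned in the commit.
  def initialize(recall)
    @recall = recall
    @cache = {}
  end

  # Returns base_prompt with the memory block appended, or base_prompt alone
  # when there is no block or when anything in the recall path raises.
  def append_memory_block(base_prompt, message_text)
    # Recall runs once per distinct message_text; later calls reuse the result.
    block = @cache.fetch(message_text) do
      @cache[message_text] = @recall.call(message_text)
    end
    block.to_s.empty? ? base_prompt : "#{base_prompt}\n\n#{block}"
  rescue StandardError
    # Memory must never block a response: fall back to the base prompt.
    base_prompt
  end
end
```

A failed recall is deliberately not cached here, so a later call for the same message may retry. Note that a rescue of this kind only covers exceptions that are actually raised; it does not by itself address a call that never returns.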

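The dual threshold described in commit 6330bec857 can be sketched as a small routing function. This is an assumption-laden illustration, not the repository's ContradictionCheckerService: it assumes "cosine" in the commit message means cosine distance (0 for identical direction), treats a distance of exactly 0.6 as part of the middle band, and uses hypothetical names (`classify_pair`, `cosine_distance`, the two constants).

```ruby
# Hypothetical sketch; names and boundary handling are assumptions.
DUPLICATE_MAX = 0.15 # below this distance: treated as a pure duplicate
RELATED_MAX   = 0.6  # above this distance: treated as unrelated

# Cosine distance between two non-zero numeric vectors of equal length.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

# Decides what to do with an existing fact given its distance to a new fact.
def classify_pair(distance)
  if distance < DUPLICATE_MAX
    :auto_supersede # near-duplicate: supersede without calling the LLM
  elsif distance <= RELATED_MAX
    :ask_llm        # related: ask the LLM whether the facts contradict
  else
    :ignore         # unrelated facts: leave both in place
  end
end
```

Keeping the band boundaries as named constants makes it straightforward to tune them later without touching the routing logic.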

6264 Commits