iachat/enterprise/app/models
Rodribm10 fa758e4848 feat(captain): hierarchical model routing + conversation-level memory cache
Two orthogonal cost optimizations to the Captain agent pipeline:

1. Hierarchical model routing (optimization A)

Captain::Scenario now overrides agent_model to read a dedicated
InstallationConfig CAPTAIN_OPEN_AI_MODEL_SCENARIO, falling back to the
global CAPTAIN_OPEN_AI_MODEL used by the orchestrator (Assistant).

Rationale: the orchestrator (Jasmine) does cheap triage (is this a
reservation intent? a greeting? escalate to human?), which a smaller model
handles this well. Scenarios (Daniela — reserva) run complex flows with
tool calling, strict taxonomies, and JSON schema output — they benefit
from a stronger model.

Config in this install: CAPTAIN_OPEN_AI_MODEL=gpt-4o-mini (orchestrator)
and CAPTAIN_OPEN_AI_MODEL_SCENARIO=gpt-4o (scenarios). Estimated ~60%
cost reduction vs everything on gpt-4o, preserving quality where it
matters for the business flow.
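
A minimal sketch of the fallback described above, not the actual
Captain::Scenario code. The plain hash stands in for InstallationConfig
lookups, and the helper names (config_value, orchestrator_model,
scenario_model) are hypothetical; only the two config keys come from this
commit message.

```ruby
# Hypothetical stand-in for InstallationConfig: a plain hash keyed by the
# config names mentioned in the commit message.
INSTALL_CONFIG = {
  'CAPTAIN_OPEN_AI_MODEL' => 'gpt-4o-mini',
  'CAPTAIN_OPEN_AI_MODEL_SCENARIO' => 'gpt-4o'
}.freeze

# Returns the configured value, or nil when the key is missing or blank.
def config_value(config, key)
  value = config[key].to_s.strip
  value.empty? ? nil : value
end

# Model used by the orchestrator (Assistant): the global setting.
def orchestrator_model(config)
  config_value(config, 'CAPTAIN_OPEN_AI_MODEL')
end

# Model used by scenarios: the dedicated setting, falling back to the
# global one when the dedicated key is unset or blank.
def scenario_model(config)
  config_value(config, 'CAPTAIN_OPEN_AI_MODEL_SCENARIO') || orchestrator_model(config)
end
```

With both keys set, scenarios resolve to the dedicated model. Removing or
blanking CAPTAIN_OPEN_AI_MODEL_SCENARIO makes them follow the orchestrator
model again, so the override is opt-in per install.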

2. Conversation-level memory cache (optimization B)

MemoryPromptInjector now persists the computed memory block on
conversation.custom_attributes['captain_cached_memory_block']. The first
turn computes the block once (embedding + pgvector query + XML formatting);
subsequent turns reuse it. The customer's profile does not change during an
open conversation, so re-running the pipeline on every turn was pure waste.

Graceful fallbacks:
- Cache write failure → per-service-instance in-memory fallback still
  applies.
- Cache read failure → fresh recall runs (no regression).
- Contact mismatch → invalidates cache, fresh recall runs.

When a new conversation starts, custom_attributes is empty → fresh
recall populates the cache for that conversation's lifetime.

Estimated ~80% reduction in embedding + pgvector calls during
multi-turn conversations.
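
The caching behavior and its fallbacks can be sketched as follows. This is
an illustration under stated assumptions, not MemoryPromptInjector itself:
the Conversation struct, the MemoryBlockCache class, and the stored entry
shape ('contact_id' plus 'block') are hypothetical. Only the
captain_cached_memory_block key and the three fallback rules come from this
commit message.

```ruby
CACHE_KEY = 'captain_cached_memory_block'

# Hypothetical stand-in for the Conversation record.
Conversation = Struct.new(:id, :contact_id, :custom_attributes)

class MemoryBlockCache
  attr_reader :recalls

  # The block passed in represents the expensive recall pipeline
  # (embedding + pgvector query + XML formatting).
  def initialize(&recall)
    @recall = recall
    @local = {}   # per-instance fallback, used when persistence fails
    @recalls = 0
  end

  def fetch(conversation)
    contact_id = conversation.contact_id
    local_key = [conversation.id, contact_id]
    cached = read(conversation, contact_id) || @local[local_key]
    return cached if cached

    @recalls += 1
    block = @recall.call(contact_id)
    @local[local_key] = block
    write(conversation, contact_id, block)
    block
  end

  private

  # Read failure or contact mismatch both yield nil, forcing a fresh recall.
  def read(conversation, contact_id)
    entry = conversation.custom_attributes[CACHE_KEY]
    return nil unless entry.is_a?(Hash) && entry['contact_id'] == contact_id

    entry['block']
  rescue StandardError
    nil
  end

  # Write failure is swallowed; the in-memory copy still serves this instance.
  def write(conversation, contact_id, block)
    conversation.custom_attributes[CACHE_KEY] = { 'contact_id' => contact_id, 'block' => block }
  rescue StandardError
    nil
  end
end
```

Storing the contact id next to the block is one way to detect a contact
mismatch on read. A new conversation starts with empty custom_attributes, so
its first fetch always runs the recall.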

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 09:47:15 -03:00
captain feat(captain): hierarchical model routing + conversation-level memory cache 2026-04-19 09:47:15 -03:00
channel feat(enterprise): add voice conference API (#13064) 2025-12-15 15:11:59 -08:00
concerns feat(lifecycle): inject concierge context into Captain orchestrator prompt 2026-04-15 09:25:16 -03:00
enterprise feat: bypass user limit validation to allow unlimited agents 2026-02-25 21:40:18 -03:00
account_saml_settings.rb feat: update users on SAML setup and destroy [CW-2958][CW-5612] (#12346) 2025-09-15 21:20:22 +05:30
agent_capacity_policy.rb feat: Add agent capacity controllers (#12200) 2025-08-26 19:12:58 -07:00
applied_sla.rb Chore/merge upstream 4.8.0 (#150) 2025-11-19 16:25:58 -03:00
article_embedding.rb feat: legacy features to ruby llm (#12994) 2025-12-11 14:17:28 +05:30
captain_inbox.rb chore(style): fix rubocop offenses and update typing indicators 2026-02-25 15:06:58 -03:00
company.rb chore(style): fix rubocop offenses and update typing indicators 2026-02-25 15:06:58 -03:00
copilot_message.rb feat: Update UI for Copilot (#11561) 2025-06-02 22:02:03 -05:00
copilot_thread.rb feat: Add support for more tool, standardize copilot chat service (#11560) 2025-05-23 01:07:07 -07:00
custom_role.rb feat: Add APIs to manage custom roles in Chatwoot (#9995) 2024-08-23 17:18:28 +05:30
inbox_capacity_limit.rb feat: Add agent capacity controllers (#12200) 2025-08-26 19:12:58 -07:00
sla_event.rb feat: Conversation API to return applied_sla and sla_events (#9174) 2024-04-01 23:30:07 +05:30
sla_policy.rb fix: Prevent SLA deletion timeouts by moving to async job (#12944) 2025-12-10 12:28:47 +05:30