Phase 3 — Goals
Goal 3.1: Memory Schema and Data Foundation
What: The complete Postgres schema for the memory system. Every subsequent Phase 3 milestone depends on this schema. Getting it right on the first pass — correct types, indexes, RLS, constraints — prevents expensive migrations later.
Acceptance criteria:
- `ibex_core.memories` table with pgvector `VECTOR(384)` column, RLS enabled
- `ibex_core.memory_relationships` graph table (typed edges between memory nodes)
- `ibex_core.memory_versions` immutable history (append-only, never updated)
- `ibex_core.memory_tags` normalised tagging table
- IVFFlat vector index (lists=100) for cosine similarity search
- GIN full-text search index on content
- All tables have org_id RLS policies
- Base Python SQLAlchemy models (async) generated from schema
- Migration runner works: `make db-migrate` applies cleanly from zero
Milestones: 3.1.1 → 3.1.2
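The index and RLS criteria for this goal translate into a handful of DDL statements. The sketch below assembles them in Python, as a migration step might. The index names, policy names, and the `app.current_org_id` session setting are assumptions for illustration, not taken from the IBEX schema.

```python
# Hypothetical migration helper for Goal 3.1. Index/policy names and the
# 'app.current_org_id' setting are illustrative assumptions.

TABLES = [
    "ibex_core.memories",
    "ibex_core.memory_relationships",
    "ibex_core.memory_versions",
    "ibex_core.memory_tags",
]

# Two-argument current_setting() returns NULL when the setting is missing,
# so an unscoped session sees no rows instead of raising an error.
ORG_PREDICATE = "org_id = current_setting('app.current_org_id', true)::uuid"


def rls_statements(table: str) -> list[str]:
    """Enable RLS on a table and restrict rows to the org in the session setting."""
    policy = table.split(".")[-1] + "_org_isolation"
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # FORCE makes the table owner subject to the policy as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        f"CREATE POLICY {policy} ON {table} "
        f"USING ({ORG_PREDICATE}) WITH CHECK ({ORG_PREDICATE});",
    ]


def memory_index_statements() -> list[str]:
    """IVFFlat cosine index (lists=100) and GIN full-text index on content."""
    return [
        "CREATE INDEX memories_embedding_ivfflat ON ibex_core.memories "
        "USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);",
        "CREATE INDEX memories_content_fts ON ibex_core.memories "
        "USING gin (to_tsvector('english', content));",
    ]


def build_migration() -> str:
    statements = memory_index_statements()
    for table in TABLES:
        statements.extend(rls_statements(table))
    return "\n".join(statements)


if __name__ == "__main__":
    print(build_migration())
```

With this style of policy, the application would scope each transaction with something like `SET LOCAL app.current_org_id = '<uuid>'`, which also behaves correctly with pooled connections. pgvector's documentation recommends creating IVFFlat indexes after the table holds data, because the list centroids are computed from existing rows.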
Goal 3.2: Embedding Service
What: A dedicated microservice that loads all-MiniLM-L6-v2 at startup and exposes a batched embedding API. It is the only component that calls the sentence-transformers library. All other services call it via HTTP.
Acceptance criteria:
- Model loaded and warmed up before the service accepts traffic (liveness vs readiness distinction)
- Batch endpoint: accept up to 512 texts, return 384-dim vectors for each
- Single-text endpoint: accept one text, return 384-dim vector within 20ms (CPU)
- Redis content-hash cache: if SHA-256(text) is cached, return stored vector without inference
- Cache hit rate > 80% under realistic production load (memory content repeats in extraction)
- Service is stateless and horizontally scalable
Milestones: 3.2.1 → 3.2.2 → 3.2.3 → 3.2.4
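A minimal sketch of the content-hash cache, assuming a plain `dict` in place of Redis and an injected `embed_batch` callable in place of the sentence-transformers model. The 512-text limit and 384 dimensions come from the criteria; the `CachedEmbedder` name and the `emb:` key prefix are hypothetical.

```python
import hashlib
from typing import Callable, Sequence

MAX_BATCH = 512  # batch endpoint limit from the acceptance criteria
DIM = 384        # all-MiniLM-L6-v2 output dimension

Vector = list[float]


class CachedEmbedder:
    """SHA-256 content-hash cache in front of a batch embedding function.

    `cache` is a dict standing in for Redis; `embed_batch` stands in for
    the model call. Both are assumptions for this sketch.
    """

    def __init__(self, embed_batch: Callable[[list[str]], list[Vector]]):
        self.embed_batch = embed_batch
        self.cache: dict[str, Vector] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(text: str) -> str:
        return "emb:" + hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed(self, texts: Sequence[str]) -> list[Vector]:
        if len(texts) > MAX_BATCH:
            raise ValueError(f"batch of {len(texts)} exceeds limit of {MAX_BATCH}")
        keys = [self.key(t) for t in texts]
        # Collect distinct misses so each uncached text is embedded only once.
        missing: dict[str, str] = {}
        for k, t in zip(keys, texts):
            if k in self.cache:
                self.hits += 1
            else:
                self.misses += 1
                missing.setdefault(k, t)
        if missing:
            vectors = self.embed_batch(list(missing.values()))
            for k, v in zip(missing.keys(), vectors):
                if len(v) != DIM:
                    raise ValueError(f"expected {DIM}-dim vector, got {len(v)}")
                self.cache[k] = v
        return [self.cache[k] for k in keys]
```

Deduplicating misses inside a batch means repeated texts in one request cost a single inference. Against real Redis, the per-key lookups would naturally become one `MGET` and the fills a pipelined set of writes.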
Goal 3.3: Memory Service
What: The Python microservice that owns all memory data. Writes go through a 9-step pipeline (validation → PII → dedup → embedding → near-duplicate check → conflict trigger → DB → hot cache → index). Reads go through semantic search + ranking.
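Step 2 of the write pipeline (PII) can be illustrated with regular expressions. The patterns here are simplified, US-centric assumptions covering the email, phone, and SSN cases named in the criteria; a production detector would need broader coverage, and the `[REDACTED_<KIND>]` placeholder format is hypothetical.

```python
import re

# Simplified, illustrative patterns; not the production rule set.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"(?:\+?1[\s.-]?)?\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")),
]


def redact_pii(text: str) -> tuple[str, list[str]]:
    """Replace each match with [REDACTED_<KIND>] and report which kinds were found."""
    found: list[str] = []
    for kind, pattern in PII_PATTERNS:
        text, count = pattern.subn(f"[REDACTED_{kind}]", text)
        if count:
            found.append(kind)
    return text, found
```

Returning the detected kinds alongside the redacted text lets later steps log or audit what was removed without ever storing the raw values.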
Acceptance criteria:
- Memory write pipeline executes all 9 steps in order
- PII detection redacts email, phone, SSN patterns before storage
- Content dedup: identical content_hash = no insert, return existing memory
- Near-duplicate (cosine similarity > 0.92): trigger merge/supersession workflow
- Semantic search returns top-K memories ranked by composite score (formula from ARCHITECTURE.md)
- Hot cache (Redis sorted set per agent): top-50 memories by composite score
- Write p95 < 200ms, search p95 < 100ms
- All data scoped by org_id; RLS + application-layer double enforcement
Milestones: 3.3.1 → 3.3.2 → 3.3.3 → 3.3.4 → 3.3.5 → 3.3.6
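Steps 3 (dedup) and 5 (near-duplicate check) of the write pipeline can be sketched as two separate lookups. This assumes the content hash is SHA-256 over the exact text and that near-duplicate candidates have already been fetched, for example by a pgvector query ordered by the `<=>` cosine-distance operator. `StoredMemory` and the function names are hypothetical.

```python
import hashlib
import math
from dataclasses import dataclass
from typing import Optional, Sequence

NEAR_DUPLICATE_THRESHOLD = 0.92  # cosine similarity, from the acceptance criteria


@dataclass
class StoredMemory:
    id: str
    content: str
    embedding: list[float]


def content_hash(text: str) -> str:
    # Assumption: hash of the exact UTF-8 text; real normalisation rules may differ.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_exact_duplicate(
    content: str, existing: Sequence[StoredMemory]
) -> Optional[StoredMemory]:
    """Step 3: runs before embedding, so it only needs the text."""
    h = content_hash(content)
    return next((m for m in existing if content_hash(m.content) == h), None)


def find_near_duplicate(
    embedding: Sequence[float], candidates: Sequence[StoredMemory]
) -> Optional[StoredMemory]:
    """Step 5: the most similar candidate, if it exceeds the threshold."""
    best = max(
        candidates,
        key=lambda m: cosine_similarity(embedding, m.embedding),
        default=None,
    )
    if best is not None and cosine_similarity(embedding, best.embedding) > NEAR_DUPLICATE_THRESHOLD:
        return best
    return None
```

Keeping the checks separate preserves the pipeline order: the cheap hash lookup can short-circuit before any embedding call is made, and only a new text proceeds to the similarity check that triggers merge/supersession.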
Goal 3.4: Memory Extraction Worker
What: The Celery background worker that reads completed session checkpoints and extracts structured memories using a secondary LLM call. This is the "learning" part of the system.
Acceptance criteria:
- Worker processes sessions incrementally: reads turns > `last_extracted_turn`, updates pointer after extraction
- Extraction prompt produces structured JSON: category, content, confidence
- Idempotent: processing the same checkpoint twice produces no duplicates
- Memory embedding and DB write happen in the same task (not split)
- Conflict detection triggered when a new memory has similarity > 0.85 to an existing memory of the same category
- Failed tasks retry 3 times with exponential backoff before being dead-lettered
- Worker processes a 10-turn session within 10 seconds under normal load
- Session `status = 'completed'` is the trigger; `status = 'active'` is NOT processed
Milestones: 3.4.1 → 3.4.2 → 3.4.3 → 3.4.4 → 3.4.5 → 3.4.6
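The pointer and idempotency criteria can be sketched as follows. It assumes turns are numbered from 1, so slicing from `last_extracted_turn` yields exactly the turns after the pointer, and it uses an in-memory store keyed by a hash of category and content as a hypothetical stand-in for a database uniqueness constraint. `extract` stands in for the secondary LLM call; all names are illustrative.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Session:
    status: str
    turns: list[str]
    last_extracted_turn: int = 0  # turns numbered from 1; 0 = nothing extracted yet


@dataclass
class InMemoryStore:
    """Stand-in for the memory table, keyed by a hash of (category, content)."""
    rows: dict[str, dict] = field(default_factory=dict)

    def insert_if_absent(self, item: dict) -> bool:
        raw = f"{item['category']}\x00{item['content']}".encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()
        if key in self.rows:
            return False
        self.rows[key] = item
        return True


Extractor = Callable[[list[str]], Iterable[dict]]


def extract_session(session: Session, store: InMemoryStore, extract: Extractor) -> int:
    """Process turns after the pointer; return the number of new memories stored."""
    if session.status != "completed":  # 'active' sessions are never processed
        return 0
    new_turns = session.turns[session.last_extracted_turn:]
    if not new_turns:
        return 0
    inserted = sum(store.insert_if_absent(item) for item in extract(new_turns))
    # Advance the pointer only after every extracted memory has been written.
    session.last_extracted_turn = len(session.turns)
    return inserted


def backoff_delays(max_retries: int = 3, base_s: float = 1.0) -> list[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base_s * 2**attempt for attempt in range(max_retries)]
```

The pointer makes a second run over the same checkpoint a no-op, and the hash key covers a retry that reprocesses turns after a crash between the write and the pointer update, provided the extractor returns the same content. In Celery, the retry schedule would more typically be declared through task options such as `autoretry_for`, `retry_backoff`, and `max_retries` than computed by hand.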
Goal 3.5: Context Assembly Engine
What: The Python gRPC service that assembles the enriched context for every LLM request. Called by the proxy in the hot path with a 40ms deadline on the retrieval operations.
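The 40ms retrieval deadline and the directive-only fallback can be sketched with `asyncio.wait`, which returns whatever finished before the timeout. The three fetch callables are hypothetical stand-ins for the directive lookup, the hot-cache read, and the cold semantic search; treating any missing memory source as a degradation is an interpretation of this goal's directive-only criterion.

```python
import asyncio
from typing import Any, Awaitable, Callable

RETRIEVAL_DEADLINE_S = 0.040  # 40 ms retrieval budget from Goal 3.5

Fetch = Callable[[], Awaitable[Any]]


async def retrieve_context(
    fetch_directive: Fetch,
    fetch_hot: Fetch,
    fetch_cold: Fetch,
    deadline: float = RETRIEVAL_DEADLINE_S,
) -> dict:
    """Run the three lookups concurrently; degrade to directive-only on timeout or error."""
    tasks = {
        "directive": asyncio.create_task(fetch_directive()),
        "hot": asyncio.create_task(fetch_hot()),
        "cold": asyncio.create_task(fetch_cold()),
    }
    done, pending = await asyncio.wait(tasks.values(), timeout=deadline)
    # Cancel anything still running and let the cancellations settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    results = {
        name: task.result()
        for name, task in tasks.items()
        if task in done and task.exception() is None
    }
    if "hot" in results and "cold" in results:
        return {
            "directive": results.get("directive"),
            "memories": results["hot"] + results["cold"],
            "degraded": False,
        }
    # Graceful degradation: directive only (None if it also missed the deadline).
    return {"directive": results.get("directive"), "memories": [], "degraded": True}
```

Cancelling and then gathering the pending tasks keeps a slow semantic search from running on after the context has already been returned to the proxy.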
Acceptance criteria:
- gRPC server starts, registers the service defined in the proto, and handles the `AssembleContext` RPC
- Token budget calculated correctly per model (GPT-4o = 128K, GPT-4o-mini = 128K, etc.)
- Parallel retrieval: directive + hot memories + cold semantic search run concurrently, all within 40ms
- Composite scoring formula matches ARCHITECTURE.md exactly: 0.40×relevance + 0.25×recency + 0.20×usefulness + 0.10×confidence + 0.05×access_frequency
- Greedy knapsack correctly packs memories by score until token budget exhausted
- Context assembly service timeout returns directive-only context (graceful degradation)
- Output format: directive → procedural → declarative → episodic → conversation history
- Proxy wired to call context assembly before every LLM forward
Milestones: 3.5.1 → 3.5.2 → 3.5.3 → 3.5.4 → 3.5.5 → 3.5.6 → 3.5.7
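The composite score and greedy packing can be sketched as below, assuming each signal is already normalised to [0, 1] and each memory carries a precomputed token count. `Candidate` and `pack_memories` are hypothetical names. This variant skips a memory that does not fit and keeps trying lower-scored ones, rather than stopping at the first miss.

```python
from dataclasses import dataclass

# Weights of the composite scoring formula quoted in the criteria.
WEIGHTS = {
    "relevance": 0.40,
    "recency": 0.25,
    "usefulness": 0.20,
    "confidence": 0.10,
    "access_frequency": 0.05,
}


@dataclass
class Candidate:
    id: str
    tokens: int  # precomputed token count (assumption)
    relevance: float
    recency: float
    usefulness: float
    confidence: float
    access_frequency: float


def composite_score(c: Candidate) -> float:
    return sum(weight * getattr(c, name) for name, weight in WEIGHTS.items())


def pack_memories(candidates: list[Candidate], token_budget: int) -> list[Candidate]:
    """Greedy knapsack: take memories in descending score order while they fit."""
    chosen: list[Candidate] = []
    remaining = token_budget
    for c in sorted(candidates, key=composite_score, reverse=True):
        if c.tokens <= remaining:
            chosen.append(c)
            remaining -= c.tokens
        if remaining == 0:
            break
    return chosen
```

Greedy-by-score is not an optimal knapsack solution, but it is linear after sorting and keeps the highest-ranked memories first, which suits a hot-path deadline.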
Goal 3.6: Management API Server
What: The REST API that operators use to manage every aspect of the platform: orgs, users, agents, tokens, directives, memories, sessions, and usage analytics.
Acceptance criteria:
- OpenAPI spec auto-generated from FastAPI; accessible at `/docs`
- Auth middleware: validates IBEX PAT via call to auth gRPC service
- All endpoints return the stable IBEX error envelope
- Pagination: all list endpoints use cursor-based pagination
- Org management: create, update, suspend, delete
- Agent management: full CRUD + status transitions
- Token management: create, revoke, list (with scoping)
- Directive management: CRUD with full version history
- Memory management: list, get, delete, export (CSV/JSON)
- Session management: list, get, replay link
- Analytics endpoints: usage by time range, token spend, model breakdown
Milestones: 3.6.1 → 3.6.2 → 3.6.3 → 3.6.4 → 3.6.5 → 3.6.6 → 3.6.7 → 3.6.8
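Cursor-based pagination is commonly implemented as keyset pagination: the cursor encodes the sort key of the last row returned, and the next page starts strictly after it. A sketch, assuming lists are ordered by `(created_at, id)` descending and cursors are opaque URL-safe base64 JSON; the field names are hypothetical. The in-memory filter mirrors a SQL predicate such as `WHERE (created_at, id) < (:t, :id) ORDER BY created_at DESC, id DESC`.

```python
import base64
import json
from datetime import datetime
from typing import Optional


def encode_cursor(created_at: datetime, row_id: str) -> str:
    """Opaque cursor: URL-safe base64 of the last row's sort key."""
    raw = json.dumps({"t": created_at.isoformat(), "id": row_id}, separators=(",", ":"))
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii").rstrip("=")


def decode_cursor(cursor: str) -> tuple[datetime, str]:
    padded = cursor + "=" * (-len(cursor) % 4)  # restore stripped padding
    data = json.loads(base64.urlsafe_b64decode(padded))
    return datetime.fromisoformat(data["t"]), data["id"]


def paginate(
    rows: list[dict], limit: int, cursor: Optional[str] = None
) -> tuple[list[dict], Optional[str]]:
    """Keyset pagination over rows already sorted by (created_at, id) descending."""
    if cursor is not None:
        after = decode_cursor(cursor)
        rows = [r for r in rows if (r["created_at"], r["id"]) < after]
    page = rows[:limit]
    # In SQL you would fetch limit + 1 rows; an extra row means another page exists.
    has_more = len(rows) > limit
    next_cursor = encode_cursor(page[-1]["created_at"], page[-1]["id"]) if has_more else None
    return page, next_cursor
```

Unlike offset pagination, a keyset cursor stays stable when new rows are inserted between requests, and the query cost does not grow with page depth.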
Goal 3.7: MinIO Session Content Archives
What: After a session is completed, its full message content (the conversation) is archived to MinIO (S3-compatible). Postgres stores only metadata; MinIO stores content. This enables session replay, GDPR data exports, and cheaper long-term storage.
Acceptance criteria:
- MinIO bucket `ibex-sessions` created at service init
- Session archive written after `sessions.status = 'completed'`
- Archive format: newline-delimited JSON (one line per checkpoint)
- Archive path: `{org_id}/{agent_id}/{session_id}/{session_id}.ndjson`
- Archive write is async (does not block session completion)
- Archive readable via pre-signed URL (15-minute expiry)
- GDPR deletion: archive deleted when org or session is deleted
Milestones: 3.7.1 → 3.7.2 → 3.7.3
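The archive key and NDJSON encoding can be sketched with the standard library. `json.dumps` escapes embedded newlines, so each checkpoint is guaranteed to occupy exactly one line. The bucket name and path layout come from the criteria; the checkpoint dictionaries and function names are hypothetical.

```python
import json
from typing import Iterable

BUCKET = "ibex-sessions"


def archive_key(org_id: str, agent_id: str, session_id: str) -> str:
    """Object key following the documented archive path layout."""
    return f"{org_id}/{agent_id}/{session_id}/{session_id}.ndjson"


def to_ndjson(checkpoints: Iterable[dict]) -> bytes:
    """One compact JSON document per line; newlines inside strings are escaped."""
    lines = (json.dumps(cp, ensure_ascii=False, separators=(",", ":")) for cp in checkpoints)
    return "".join(line + "\n" for line in lines).encode("utf-8")


def from_ndjson(data: bytes) -> list[dict]:
    """Parse an archive back into checkpoints (e.g. for the replay viewer)."""
    return [json.loads(line) for line in data.decode("utf-8").splitlines() if line]
```

Uploading and sharing would go through the MinIO client, for example `put_object` for the write and `presigned_get_object(..., expires=timedelta(minutes=15))` for the replay link. Because every key starts with `{org_id}/`, org-level GDPR deletion can list and remove all objects under that prefix.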
Goal 3.8: Operator Dashboard
What: The Next.js 14 web application that gives operators visibility and control over the entire platform: agents, memories, usage analytics, and session history.
Acceptance criteria:
- Authenticates with a PAT (no OAuth in Phase 3)
- Agent list, detail, create, edit, delete
- Memory browser: search, filter by category/date/agent, delete
- Analytics: token spend by time range, request count, latency histogram, model breakdown
- Session list with replay viewer (read from session archive)
- All pages load in < 2 seconds (LCP) with real data
Milestones: 3.8.1 → 3.8.2 → 3.8.3 → 3.8.4 → 3.8.5 → 3.8.6
Goal 3.9: Phase 3 Quality Gate
Milestones: 3.9.1 → 3.9.2 → 3.9.3