Phase 3 — Memory Engine and Operator Platform — Decision Log

Python stack choices and ADR decisions for the Memory Engine and Operator Platform phase.

This log records quick decisions made during Phase 3. Promote durable choices to docs/adr/ and the public ADR index when they affect multiple phases.

Python stack (non-negotiable)

| Concern | Choice | Reason |
| --- | --- | --- |
| Python version | 3.11 | Performance over 3.10; 3.12 not yet stable across ML libs |
| Web framework | FastAPI 0.110+ | Native async, automatic OpenAPI, DI |
| ORM | SQLAlchemy 2.0 (async) | Type-safe async queries, pgvector support |
| Migrations | Alembic | Standard SQLAlchemy companion |
| Config | pydantic-settings v2 | Typed, validated .env support |
| Testing | pytest + pytest-asyncio | asyncio_mode="auto" |
| Linting | ruff + mypy --strict | Single tool, fast |
| Task queue | Celery 5 + Redis broker | Industry standard |
| gRPC | grpcio + betterproto | Dataclass codegen, mypy-friendly |
| HTTP client | httpx (async) | Better typing than aiohttp |
| Embeddings | sentence-transformers 2.x | HuggingFace model flexibility |
| Packaging | pyproject.toml + uv | Fast installs |

ADR register (Phase 3)

| ADR | Topic | Milestone | Status |
| --- | --- | --- | --- |
| ADR-0032 | Memory data model | 3.1.1 | Pending |
| ADR-0033 | Embedding service design | 3.2.1 | Pending |
| ADR-0034 | Memory write pipeline + PII | 3.3.2 | Pending |
| ADR-0035 | Vector search / IVFFlat tuning | 3.3.4 | Pending |
| ADR-0036 | Memory extraction strategy | 3.4.3 | Pending |
| ADR-0037 | Conflict detection + resolution | 3.4.5 | Pending |
| ADR-0038 | Context assembly gRPC contract | 3.5.1 | Pending |
| ADR-0039 | Token budget calculator | 3.5.2 | Pending |
| ADR-0040 | Management API auth middleware | 3.6.1 | Pending |
| ADR-0041 | MinIO session archive format | 3.7.1 | Pending |

Log pivots in findings. When an ADR merges, add it under docs/adr.

Architectural decisions

| Decision | Rationale |
| --- | --- |
| Separate Python context assembly service (not inline in Go proxy) | Memory ranking needs NumPy, tiktoken, complex caching; wrong language for hot-path Go binary |
| gRPC for context assembly (not HTTP) | Typed contract, lower overhead than JSON for <50ms target; matches auth pattern |
| Graceful degradation on context timeout | If assembly exceeds ~45ms, proxy continues with directive-only context |
| betterproto for Python gRPC | Dataclass stubs vs protobuf message objects |
| Celery + Redis for workers | At-least-once delivery; idempotent tasks required |
| MinIO for session archives | Postgres metadata only; bulk conversation content in object storage |
| PAT auth for dashboard (Phase 3) | OAuth deferred; operators use same token model as SDK |
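
The graceful-degradation decision can be sketched as a timeout wrapped around the assembly call. The real proxy is a Go binary, so the Python below only illustrates the pattern and is not the proxy's implementation. `assemble_context`, `context_with_fallback`, the `degraded` flag, and the dictionary shape are all hypothetical; the 45 ms budget is the approximate figure from the table.

```python
import asyncio

ASSEMBLY_TIMEOUT_S = 0.045  # ~45 ms budget taken from the decision table


async def assemble_context(session_id: str, simulated_delay_s: float) -> list[str]:
    """Hypothetical stand-in for the gRPC call to the context assembly service."""
    await asyncio.sleep(simulated_delay_s)
    return [f"memory for {session_id}"]


async def context_with_fallback(
    session_id: str, directives: list[str], simulated_delay_s: float
) -> dict:
    """Return full context when assembly is fast, directive-only context otherwise."""
    try:
        memories = await asyncio.wait_for(
            assemble_context(session_id, simulated_delay_s),
            timeout=ASSEMBLY_TIMEOUT_S,
        )
        return {"directives": directives, "memories": memories, "degraded": False}
    except asyncio.TimeoutError:
        # Assembly overran its budget: continue with directives only.
        return {"directives": directives, "memories": [], "degraded": True}


fast = asyncio.run(context_with_fallback("s1", ["be concise"], 0.0))
slow = asyncio.run(context_with_fallback("s1", ["be concise"], 0.5))
```

Here `fast` carries the assembled memories, while `slow` keeps the directives and drops the memories. The request still proceeds in both cases, which is the point of the decision.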

Pending decisions (resolve during milestones)

  1. IVFFlat probes vs recall — default lists=100, probes=10; tune in 3.3.4 load tests.
  2. Embedding GPU in dev compose — CPU default; document optional GPU profile in ENVIRONMENT_VARIABLES.md.
  3. Dashboard chart library — Recharts vs Tremor; lock in 3.8.5 before analytics pages multiply.
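
For the first pending item, the two IVFFlat knobs live in different places in pgvector: `lists` is fixed when the index is built, while `probes` is a per-session setting that can be changed between load-test runs. The sketch below only builds the SQL strings. The table name `memories`, the column name `embedding`, and the `vector_cosine_ops` operator class are assumptions, since the document does not state the schema or the distance metric.

```python
def ivfflat_statements(
    table: str = "memories",      # hypothetical table name
    column: str = "embedding",    # hypothetical column name
    lists: int = 100,             # project default from the pending-decisions list
    probes: int = 10,             # project default from the pending-decisions list
) -> tuple[str, str]:
    """Build the pgvector index DDL and the session-level probes setting."""
    create_index = (
        f"CREATE INDEX ON {table} USING ivfflat "
        f"({column} vector_cosine_ops) WITH (lists = {lists});"
    )
    set_probes = f"SET ivfflat.probes = {probes};"
    return create_index, set_probes


def probe_fraction(lists: int, probes: int) -> float:
    """Fraction of inverted lists scanned per query; higher favors recall over speed."""
    return probes / lists


create_index_sql, set_probes_sql = ivfflat_statements()
fraction = probe_fraction(lists=100, probes=10)
```

With the stated defaults, each query scans 10 of the 100 lists. Raising `probes` generally improves recall at the cost of latency, which is the trade-off the 3.3.4 load tests are meant to settle.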