Milestone 3.6.7 — Analytics and Session History API

Status: Planned
Goal: 3.6 — Management API server
Phase: 3 — Memory Engine and Operator Platform
Estimated effort: 3–4 days
ADR required: ADR-0043 — ClickHouse query patterns and caching strategy

Why This Milestone Exists

The dashboard needs aggregate usage data: how many tokens were consumed this week, what was the p99 latency yesterday, which agents are most active. These queries are aggregations over the llm_traces ClickHouse table — exactly what ClickHouse is designed for.

The challenge is that ClickHouse queries (even simple ones) take 50–500ms. These endpoints cannot be called on every dashboard load without caching. This milestone implements both the analytics endpoints and a 60-second Redis result cache.

Endpoints

GET /v1/analytics/usage
  ?start=ISO8601&end=ISO8601&agent_id=UUID&granularity=hour|day|week
  → UsageSeries (token counts, request counts by time bucket)

GET /v1/analytics/latency
  ?start=ISO8601&end=ISO8601&agent_id=UUID&percentile=50|95|99
  → LatencySeries (proxy overhead + TTFB by time bucket)

GET /v1/analytics/models
  ?start=ISO8601&end=ISO8601
  → ModelBreakdown (request count and token spend by model)

GET /v1/analytics/errors
  ?start=ISO8601&end=ISO8601&error_code=...
  → ErrorSeries (error counts by code and time)

ClickHouse query example (usage by day)

SQL

SELECT
    toDate(requested_at)                AS day,
    sum(input_tokens)                   AS input_tokens,
    sum(output_tokens)                  AS output_tokens,
    count()                             AS request_count
FROM ibex.llm_traces
WHERE
    org_id = {org_id:UUID}
    AND requested_at BETWEEN {start:DateTime64} AND {end:DateTime64}
    AND status_code = 200
GROUP BY day
ORDER BY day ASC
-- clickhouse-driver parameterised queries; org_id always included

Caching strategy

Python

# Cache key: SHA-256(org_id + endpoint + params)
# TTL: 60 seconds for hourly/daily data; 5 minutes for weekly data
# Redis cache-aside; ClickHouse on miss; result serialised to JSON

Acceptance Criteria

Usage endpoint returns correct token totals verified against known test data in ClickHouse
All analytics queries include org_id in WHERE clause
Redis cache hit serves result without ClickHouse query
Cache TTL: 60s for granularity=hour/day; 300s for granularity=week
ADR-0043 written with ClickHouse query pattern documentation