Request lifecycle
End-to-end flow for a protected proxy request — auth, agent verify, rate limits, and Phase 2 forwarding.
A protected LLM request enters the proxy as an OpenAI-compatible HTTP call. Before any provider handoff, the proxy validates credentials, confirms agent identity, and enforces rate limits. In Phase 1 the pipeline stops after body normalization and returns 501 PROVIDER_NOT_CONFIGURED, so a 501 confirms that authentication, agent verification, rate limiting, and body normalization all passed.
POST /v1/orgs/{org_id}/chat/completions
OpenAI-compatible chat completions. Phase 1 returns 501 once all pre-provider checks pass; Phase 2 forwards to a registered provider adapter.
Request headers
| Header | Required | Notes |
|---|---|---|
| Authorization | Yes | Bearer followed by a PAT (ibex_pat_...) |
| X-IBEX-Agent-ID | Yes | UUID; must belong to {org_id} in the path |
| Content-Type | Yes (POST) | application/json |
| X-Request-ID | No | UUID v7; generated if absent |
Lifecycle steps
Request ID assigned
Middleware assigns or validates X-Request-ID (UUID v7) and injects it into the request context for logs, metrics, and gRPC metadata propagation.
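A minimal Python sketch of the assign-or-validate rule, assuming that an inbound X-Request-ID which is missing or not a valid UUID v7 is replaced with a fresh one (this page does not say whether an invalid ID is rejected instead). The helper names new_uuid7 and assign_request_id are hypothetical; the bit layout follows the UUID v7 format (48-bit Unix-millisecond timestamp, version 7, RFC 4122 variant, random remainder).

```python
import os
import time
import uuid
from typing import Optional


def new_uuid7() -> uuid.UUID:
    """Build a UUID v7: 48-bit Unix-ms timestamp, version nibble 7, RFC 4122 variant, random bits."""
    ts_ms = int(time.time() * 1000) & ((1 << 48) - 1)
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits below the timestamp
    value = (ts_ms << 80) | rand
    value = (value & ~(0xF << 76)) | (0x7 << 76)  # version field (bits 76-79) = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)  # variant field (bits 62-63) = 0b10
    return uuid.UUID(int=value)


def assign_request_id(header_value: Optional[str]) -> str:
    """Keep a valid inbound UUID v7; otherwise mint a new one (hypothetical policy)."""
    if header_value:
        try:
            parsed = uuid.UUID(header_value)
        except ValueError:
            parsed = None
        if parsed is not None and parsed.version == 7:
            return str(parsed)
    return str(new_uuid7())
```

Because the timestamp occupies the most significant bits, IDs minted later sort after earlier ones, which keeps log and trace lookups by request ID roughly time-ordered.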
Bearer token validated
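Proxy calls auth ValidateToken over gRPC with a 50ms deadline. On success, org_id, permissions, and token_id are attached to the request context. Missing token → 401 MISSING_TOKEN; invalid or revoked PAT → 401 INVALID_TOKEN; auth timeout or unavailable → 503 SERVICE_DEGRADED (fail-closed per ADR-0011).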
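A Python sketch of the fail-closed call, assuming a hypothetical async validate(token) client that raises InvalidToken for unknown or revoked PATs and ConnectionError when the auth service is unreachable. The 50ms deadline and the status/code pairs come from this page; the names, and treating a non-Bearer Authorization header as MISSING_TOKEN, are assumptions.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, Optional


class InvalidToken(Exception):
    """Raised by the hypothetical validator for an unknown or revoked PAT."""


@dataclass
class AuthResult:
    ok: bool
    status: Optional[int] = None          # HTTP status when ok is False
    code: Optional[str] = None            # error code from the error-mapping table
    context: Dict[str, object] = field(default_factory=dict)  # org_id, permissions, token_id


Validator = Callable[[str], Awaitable[Dict[str, object]]]


async def authenticate(authorization: Optional[str], validate: Validator,
                       deadline_s: float = 0.050) -> AuthResult:
    # Assumption: a header without the "Bearer " scheme is treated like a missing token.
    if not authorization or not authorization.startswith("Bearer "):
        return AuthResult(False, 401, "MISSING_TOKEN")
    token = authorization[len("Bearer "):]
    try:
        claims = await asyncio.wait_for(validate(token), timeout=deadline_s)
    except InvalidToken:
        return AuthResult(False, 401, "INVALID_TOKEN")
    except (asyncio.TimeoutError, ConnectionError):
        # Fail closed: a slow or unreachable auth service never lets the request through.
        return AuthResult(False, 503, "SERVICE_DEGRADED")
    return AuthResult(True, context={k: claims[k] for k in ("org_id", "permissions", "token_id")})
```

The key design choice is that a timeout and an outage map to the same 503; neither falls through to "allow".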
Agent identity verified
Proxy requires X-IBEX-Agent-ID and confirms the agent belongs to the org in the URL. Cross-org or unknown agent → 403 INSUFFICIENT_PERMISSIONS before the body is read.
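A Python sketch of the ownership check, assuming a hypothetical agent_orgs mapping (agent UUID to owning org) in place of whatever registry the proxy actually queries. Reading "path mismatch" as the token's org_id differing from the org in the URL, and answering a missing or malformed X-IBEX-Agent-ID with the same 403, are both assumptions.

```python
import uuid
from typing import Mapping, Optional, Tuple

FORBIDDEN = (403, "INSUFFICIENT_PERMISSIONS")


def verify_agent(path_org_id: str, token_org_id: str, agent_header: Optional[str],
                 agent_orgs: Mapping[str, str]) -> Optional[Tuple[int, str]]:
    """Return None when the request may proceed, else an (HTTP status, code) pair."""
    if token_org_id != path_org_id:
        return FORBIDDEN  # token was issued for a different org than the URL names
    if not agent_header:
        return FORBIDDEN  # assumption: missing header answered with the same 403
    try:
        agent_id = str(uuid.UUID(agent_header))  # canonical lowercase form
    except ValueError:
        return FORBIDDEN  # assumption: malformed UUID answered with the same 403
    if agent_orgs.get(agent_id) != path_org_id:
        return FORBIDDEN  # unknown agent, or agent owned by another org
    return None
```

Everything here uses headers, the path, and the token context only, which is what allows the 403 to be returned before the body is read.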
Rate limit checked
Redis sliding-window counter keyed by org_id. Exceeded → 429 RATE_LIMIT_EXCEEDED with a Retry-After header. Redis unavailable → fail-open with conservative local limits and an audit warning.
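One common way to build a sliding-window counter is to weight the previous fixed window's count by how much of it still overlaps the sliding window. This Python sketch keeps counts in an in-memory dict standing in for Redis; the class and function names, the Retry-After heuristic, and the shape of the local fallback are assumptions, not the proxy's actual code.

```python
import math
import time
from typing import Callable, Dict, Optional, Tuple

Decision = Tuple[bool, Optional[int]]  # (allowed, Retry-After seconds when denied)


class SlidingWindowLimiter:
    """Sliding-window counter keyed by org_id; a dict stands in for Redis."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.counts: Dict[Tuple[str, int], int] = {}  # (org_id, window index) -> count

    def check(self, org_id: str) -> Decision:
        now = self.clock()
        window = int(now // self.window_s)
        into_window = now % self.window_s
        prev = self.counts.get((org_id, window - 1), 0)
        curr = self.counts.get((org_id, window), 0)
        # Weight the previous window by the share of it still inside the sliding window.
        estimate = prev * (1 - into_window / self.window_s) + curr
        if estimate >= self.limit:
            # Heuristic: suggest retrying once the current fixed window rolls over.
            return False, max(1, math.ceil(self.window_s - into_window))
        self.counts[(org_id, window)] = curr + 1  # Redis would INCR with a TTL here
        return True, None


def check_with_fallback(primary: Callable[[str], Decision], local: SlidingWindowLimiter,
                        org_id: str, warn: Callable[[str], None]) -> Decision:
    """Fail open: if the shared store is down, apply a stricter local limiter and warn."""
    try:
        return primary(org_id)
    except ConnectionError:
        warn(f"rate-limit store unavailable; local limit applied for org {org_id}")
        return local.check(org_id)
```

To match "conservative local limits", the local limiter would be configured with a lower limit than the shared one, since each proxy instance counts only its own traffic during a Redis outage.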
Body normalized
JSON parsed and validated against the OpenAI chat schema. Malformed input → 400 with stable error envelope including request_id.
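A Python sketch of the normalization gate, assuming a deliberately small subset of the OpenAI chat schema (a non-empty model string and a non-empty messages array whose entries carry a known role). The error envelope's field names are hypothetical; only the fact that it includes request_id comes from this page, and no 400 error code is invented here.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Assumption: an illustrative role set; the real schema check may accept others.
ALLOWED_ROLES = {"system", "developer", "user", "assistant", "tool"}

Result = Tuple[Optional[Dict[str, Any]], Optional[Dict[str, Any]]]


def normalize_body(raw: bytes, request_id: str) -> Result:
    """Return (body, None) when the body passes, or (None, error) for a 400."""
    def reject(message: str) -> Result:
        # Hypothetical envelope shape; the real one is documented under API errors.
        return None, {"status": 400, "error": {"message": message, "request_id": request_id}}

    try:
        body = json.loads(raw)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return reject("body is not valid JSON")
    if not isinstance(body, dict):
        return reject("body must be a JSON object")
    if not isinstance(body.get("model"), str) or not body["model"]:
        return reject("model must be a non-empty string")
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        return reject("messages must be a non-empty array")
    for i, msg in enumerate(messages):
        role = msg.get("role") if isinstance(msg, dict) else None
        if not isinstance(role, str) or role not in ALLOWED_ROLES:
            return reject(f"messages[{i}].role is missing or unsupported")
    return body, None
```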
Provider handoff (Phase 2+)
Context assembly, memory injection, and streaming forward to the LLM provider. Phase 1 stops here with 501 PROVIDER_NOT_CONFIGURED.
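The short-circuit behaviour of these steps can be sketched in Python as an ordered list of stage functions, where the first non-None outcome ends the request. The stage names and the run_phase1 helper are hypothetical; the point is that reaching 501 PROVIDER_NOT_CONFIGURED implies every earlier stage returned None.

```python
from typing import Callable, List, Optional, Tuple

Outcome = Optional[Tuple[int, str]]          # None means "continue to the next stage"
Stage = Callable[[dict], Outcome]


def run_phase1(request: dict, stages: List[Tuple[str, Stage]]) -> Tuple[int, str, List[str]]:
    """Run stages in order; the first non-None outcome ends the request."""
    passed: List[str] = []
    for name, stage in stages:
        outcome = stage(request)
        if outcome is not None:
            status, code = outcome
            return status, code, passed
        passed.append(name)
    # No provider adapter is registered in Phase 1, so a fully checked request ends here.
    return 501, "PROVIDER_NOT_CONFIGURED", passed
```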
Sequence diagram
Dashed Phase 2 steps (context retrieval, provider streaming, async jobs) are specified in engineering docs but not executed in the current release.
Phase 1 probe
```bash
curl -s -w "\nHTTP %{http_code}\n" \
  -X POST "http://localhost:8080/v1/orgs/${IBEX_DEV_ORG_ID}/chat/completions" \
  -H "Authorization: Bearer ${IBEX_DEV_TOKEN}" \
  -H "X-IBEX-Agent-ID: ${IBEX_DEV_AGENT_ID}" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"ping"}]}'
```

Expected: HTTP 501 with PROVIDER_NOT_CONFIGURED, confirming that token validation, agent verification, rate limiting, and normalization all succeeded.
Error mapping
| Condition | HTTP | Code |
|---|---|---|
| Missing Authorization | 401 | MISSING_TOKEN |
| Invalid or revoked PAT | 401 | INVALID_TOKEN |
| Agent not in org / path mismatch | 403 | INSUFFICIENT_PERMISSIONS |
| Rate limit exceeded | 429 | RATE_LIMIT_EXCEEDED |
| Auth gRPC timeout or unavailable | 503 | SERVICE_DEGRADED |
| No provider configured | 501 | PROVIDER_NOT_CONFIGURED |
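The error-mapping table above can be kept as a single lookup so every stage emits consistent statuses. In this Python sketch the status codes come from the table; the error_response helper and the JSON body shape are hypothetical, and Retry-After is attached only to 429 responses as described under the rate-limit step.

```python
import json
from typing import Dict, Optional, Tuple

# Status codes taken from the error-mapping table.
ERROR_STATUS: Dict[str, int] = {
    "MISSING_TOKEN": 401,
    "INVALID_TOKEN": 401,
    "INSUFFICIENT_PERMISSIONS": 403,
    "RATE_LIMIT_EXCEEDED": 429,
    "SERVICE_DEGRADED": 503,
    "PROVIDER_NOT_CONFIGURED": 501,
}


def error_response(code: str, request_id: str,
                   retry_after: Optional[int] = None) -> Tuple[int, Dict[str, str], str]:
    """Build (status, headers, body) for a proxy error; the body shape is hypothetical."""
    status = ERROR_STATUS[code]  # KeyError for codes not in the table
    headers = {"Content-Type": "application/json"}
    if status == 429 and retry_after is not None:
        headers["Retry-After"] = str(retry_after)  # integer seconds
    body = json.dumps({"error": {"code": code, "request_id": request_id}})
    return status, headers, body
```

A plain dict is used rather than an Enum keyed on status, because two codes share 401 and an Enum would silently turn INVALID_TOKEN into an alias of MISSING_TOKEN.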
Full envelope: API errors.
Target path (Phase 2+)
Once provider adapters and context assembly ship, the synchronous path extends:
- Parallel context retrieval (40ms deadline) — directive from Redis, hot memories, recent session history
- Context assembly gRPC — rank and pack memories within model token budget
- Provider forward — augment messages, stream response to client while accumulating for async extraction
- Async side effects — ClickHouse trace, memory extraction job, session heartbeat update
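The parallel retrieval step can be sketched in Python with asyncio: start all fetches at once, wait up to the 40ms deadline, and cancel anything still running. What the proxy does with a source that misses the deadline is not stated here; this sketch assumes it simply assembles context without that source. The fetcher names in the test (directive, memories, history) mirror the list above but are hypothetical stubs.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Fetcher = Callable[[], Awaitable[Any]]


async def retrieve_context(fetchers: Dict[str, Fetcher],
                           deadline_s: float = 0.040) -> Dict[str, Any]:
    """Run all fetchers concurrently; anything unfinished at the deadline becomes None."""
    if not fetchers:
        return {}
    tasks = {name: asyncio.create_task(fetch()) for name, fetch in fetchers.items()}
    done, pending = await asyncio.wait(tasks.values(), timeout=deadline_s)
    for task in pending:
        task.cancel()
    # Wait for cancellations to settle so no task outlives the request.
    await asyncio.gather(*pending, return_exceptions=True)
    results: Dict[str, Any] = {}
    for name, task in tasks.items():
        if task in done and not task.cancelled() and task.exception() is None:
            results[name] = task.result()
        else:
            results[name] = None  # missed the deadline or raised; assemble without it
    return results
```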
Auth validation in Phase 2 may add an optional bloom filter + LRU cache (ADR-0011 deferral record); Phase 1 always calls gRPC.
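For the LRU half of that deferred option, a Python sketch of a bounded, expiring cache in front of the gRPC validator might look like this. The class, the TTL, and keying entries by a SHA-256 of the token are all assumptions; the bloom filter half is not sketched because this page does not say what it would hold.

```python
import hashlib
import time
from collections import OrderedDict
from typing import Any, Callable, Dict, Optional


class TTLLRUCache:
    """Bounded LRU map whose entries also expire after ttl_s seconds."""

    def __init__(self, capacity: int, ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.ttl_s = ttl_s
        self.clock = clock
        self._data: "OrderedDict[str, Any]" = OrderedDict()  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]  # expired: force a fresh validation
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl_s, value)
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def cached_validate(token: str, cache: TTLLRUCache,
                    validate: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Check the cache before calling the (hypothetical) synchronous validator."""
    key = hashlib.sha256(token.encode()).hexdigest()  # avoid keeping raw PATs in memory
    claims = cache.get(key)
    if claims is None:
        claims = validate(token)  # exceptions (invalid token) propagate and are not cached
        cache.put(key, claims)
    return claims
```

Any cache of this kind lets a revoked PAT keep working until its entry expires, so the TTL bounds revocation latency; how the deferred design addresses revocation is not described on this page.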
Architecture decisions
| Topic | ADR |
|---|---|
| Proxy → auth gRPC client | ADR-0011 |
| Token validation contract | ADR-0007 |
| Permission bitmap | ADR-0009 |
| Rate limit skeleton | ADR-0015 |
| Agent identity verification | ADR-0016 |
| Request ID propagation | ADR-0017 |