ADR-0011: Proxy auth gRPC client and middleware
Architecture decision record 0011.
- Status: Accepted
- Date: 2026-06-04
- Authors: IBEX Harness team
Context
Milestone 1.1.3 delivered auth ValidateToken (ADR-0007). The proxy skeleton (services/proxy) exposes health/metrics only. Milestone 1.2.1 connects the proxy to auth so protected routes receive org_id and permission context before LLM normalization (1.2.2).
ARCHITECTURE.md describes a future bloom filter + LRU cache pipeline. Phase 2 optional milestone 2.2.1-auth-cache-bloom owns that work; v1 uses remote validation only (SECURITY.md §15: fail closed when validation cannot complete).
Decision
1) Transport and connection
- gRPC to ibex.auth.v1.AuthService/ValidateToken (ADR-0006)
- Single shared *grpc.ClientConn per proxy process; dial at startup; close on shutdown
- Development: insecure credentials; production: mTLS (documented follow-up, not in v1)
2) Timeouts
- Default per-validate timeout: 50ms (IBEX_AUTH_VALIDATE_TIMEOUT)
- Use context.WithTimeout derived from the HTTP request context
- Exceeded deadline → HTTP 503 SERVICE_DEGRADED (fail closed)
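The timeout rule above can be sketched with the standard library alone. This is a minimal illustration, not the proxy's implementation: validateFunc, statusForValidate, slowValidator, and demoStatus are hypothetical names, and the stand-in validator returns the context error directly, whereas a real gRPC client would report the deadline as a status with code DeadlineExceeded.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// validateFunc stands in for the gRPC ValidateToken call (hypothetical signature).
type validateFunc func(ctx context.Context, token string) error

// statusForValidate bounds one validation with a deadline derived from the
// parent (request) context and maps the outcome to an HTTP status.
func statusForValidate(parent context.Context, validate validateFunc, token string, timeout time.Duration) int {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	err := validate(ctx, token)
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, context.DeadlineExceeded):
		return http.StatusServiceUnavailable // SERVICE_DEGRADED: fail closed
	default:
		// The 401 mapping is out of scope for this sketch; any other
		// failure is treated as fail closed as well.
		return http.StatusServiceUnavailable
	}
}

// slowValidator simulates an auth service that answers after delay.
func slowValidator(delay time.Duration) validateFunc {
	return func(ctx context.Context, _ string) error {
		select {
		case <-time.After(delay):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demoStatus runs the sketch with millisecond arguments.
func demoStatus(delayMs, timeoutMs int) int {
	return statusForValidate(
		context.Background(),
		slowValidator(time.Duration(delayMs)*time.Millisecond),
		"ibex_pat_example",
		time.Duration(timeoutMs)*time.Millisecond,
	)
}

func main() {
	fmt.Println(demoStatus(1, 50))   // auth answers within the budget
	fmt.Println(demoStatus(200, 50)) // auth answers after the budget
}
```

Because the deadline is derived from the request context, a client disconnect also cancels the in-flight validation.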
3) Bearer parsing
- Read Authorization: Bearer <token>
- Strip the Bearer prefix and following space; pass PAT wire string (ibex_pat_...) as ValidateTokenRequest.access_token
- Missing header → HTTP 401 MISSING_TOKEN
- Invalid/revoked → HTTP 401 INVALID_TOKEN (maps gRPC Unauthenticated)
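A minimal sketch of the header parsing described above. ParseBearer and ErrMissingToken are hypothetical names. The sketch also assumes that a header without the Bearer prefix, or with an empty token, is treated like a missing header, and that the prefix match is case-sensitive; the ADR does not state either point.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrMissingToken is a hypothetical sentinel for the MISSING_TOKEN condition.
var ErrMissingToken = errors.New("MISSING_TOKEN")

// ParseBearer extracts the PAT wire string from an Authorization header value.
// Assumption: anything that is not "Bearer <non-empty token>" counts as missing.
func ParseBearer(header string) (string, error) {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return "", ErrMissingToken
	}
	token := strings.TrimSpace(header[len(prefix):])
	if token == "" {
		return "", ErrMissingToken
	}
	return token, nil
}

func main() {
	token, err := ParseBearer("Bearer ibex_pat_example")
	fmt.Println(token, err)

	_, err = ParseBearer("")
	fmt.Println(err)
}
```

The returned string is what would be passed as ValidateTokenRequest.access_token; whether the token is valid is decided by auth, not by this parser.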
4) Request context
After successful validation, attach to context.Context:
org_id, permissions (int64), optional agent_id, user_id, token_id
Handlers read via auth.FromContext(ctx).
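One way to carry these values is an unexported context key with a typed accessor. The sketch below is an assumption about shape only: the Claims struct, its Go field names and string types, WithClaims, and the (Claims, bool) return of FromContext are not specified by this ADR.

```go
package main

import (
	"context"
	"fmt"
)

// Claims mirrors the fields listed above. Field names and the string types
// are assumptions; only permissions is stated to be int64.
type Claims struct {
	OrgID       string
	Permissions int64
	AgentID     string
	UserID      string
	TokenID     string
}

// claimsKey is unexported so other packages cannot overwrite the value.
type claimsKey struct{}

// WithClaims returns a child context carrying the validated claims.
func WithClaims(ctx context.Context, c Claims) context.Context {
	return context.WithValue(ctx, claimsKey{}, c)
}

// FromContext returns the claims and whether any were attached.
func FromContext(ctx context.Context) (Claims, bool) {
	c, ok := ctx.Value(claimsKey{}).(Claims)
	return c, ok
}

// roundTrip attaches claims to a fresh context and reads them back.
func roundTrip(c Claims) (Claims, bool) {
	return FromContext(WithClaims(context.Background(), c))
}

func main() {
	c, ok := roundTrip(Claims{OrgID: "org_1", Permissions: 1})
	fmt.Println(c.OrgID, c.Permissions, ok)

	_, ok = FromContext(context.Background())
	fmt.Println(ok)
}
```

Returning a boolean lets a handler distinguish "no auth middleware ran" from "claims with zero values", which matters for fail-closed behavior.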
5) Permission and tenant checks
- Chat routes require permissions.ProxyChatCompletion (ADR-0009)
- Path-scoped routes (e.g. /v1/orgs/{org_id}/...) compare path org_id to token org → 403 on mismatch
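The two checks above can be sketched as pure functions. This assumes the int64 permissions value is a bitmask, which the ADR does not state; the bit value given to ProxyChatCompletion is a placeholder, and HasPermission and AuthorizeStatus are hypothetical names.

```go
package main

import "fmt"

// ProxyChatCompletion is a placeholder bit; the real constant is defined
// by the permissions package (ADR-0009).
const ProxyChatCompletion int64 = 1 << 0

// HasPermission reports whether every required bit is set in granted.
// Assumption: permissions are encoded as a bitmask.
func HasPermission(granted, required int64) bool {
	return granted&required == required
}

// AuthorizeStatus returns 200 when both checks pass and 403 otherwise.
// An empty pathOrgID means the route is not path-scoped.
func AuthorizeStatus(granted, required int64, pathOrgID, tokenOrgID string) int {
	if !HasPermission(granted, required) {
		return 403
	}
	if pathOrgID != "" && pathOrgID != tokenOrgID {
		return 403
	}
	return 200
}

func main() {
	fmt.Println(AuthorizeStatus(ProxyChatCompletion, ProxyChatCompletion, "", "org_1"))
	fmt.Println(AuthorizeStatus(0, ProxyChatCompletion, "", "org_1"))
	fmt.Println(AuthorizeStatus(ProxyChatCompletion, ProxyChatCompletion, "org_2", "org_1"))
}
```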
6) HTTP error mapping
Minimal stable JSON envelope in services/proxy/internal/errors/ (extended by milestone 1.2.3):
| Condition | HTTP | code |
|---|---|---|
| Missing Authorization | 401 | MISSING_TOKEN |
| Invalid token | 401 | INVALID_TOKEN |
| Insufficient permissions / org mismatch | 403 | INSUFFICIENT_PERMISSIONS |
| Auth unreachable / timeout / internal | 503 | SERVICE_DEGRADED |
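The table can be read as a lookup from code to status. The sketch below is illustrative only: the envelope fields (code, message) and the names StatusFor, WriteError, and render are hypothetical, since this ADR fixes the codes but not the body shape. Mapping unknown codes to 503 is also an assumption, chosen to stay fail closed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusByCode mirrors the table above.
var statusByCode = map[string]int{
	"MISSING_TOKEN":            http.StatusUnauthorized,
	"INVALID_TOKEN":            http.StatusUnauthorized,
	"INSUFFICIENT_PERMISSIONS": http.StatusForbidden,
	"SERVICE_DEGRADED":         http.StatusServiceUnavailable,
}

// envelope is a hypothetical body shape; the ADR only fixes the codes.
type envelope struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

// StatusFor returns the HTTP status for a code.
// Assumption: unknown codes fail closed as 503.
func StatusFor(code string) int {
	if s, ok := statusByCode[code]; ok {
		return s
	}
	return http.StatusServiceUnavailable
}

// WriteError writes the JSON envelope with the mapped status.
func WriteError(w http.ResponseWriter, code, message string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(StatusFor(code))
	_ = json.NewEncoder(w).Encode(envelope{Code: code, Message: message})
}

// render captures what WriteError produces, for inspection.
func render(code, message string) (int, string) {
	rec := httptest.NewRecorder()
	WriteError(rec, code, message)
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := render("INVALID_TOKEN", "token rejected")
	fmt.Print(status, " ", body)
}
```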
7) Auth validation cache (deferred — deferral record)
What is deferred: Redis bloom filter for fast rejection, in-process LRU cache for validated claims, Redis validated-token cache.
Why Phase 1 skips it:
- Phase 1 exit prioritizes correctness and fail-closed behavior before latency optimization.
- SECURITY.md §15: deny access when validation cannot complete and no safe cached claims exist.
- Revocation propagation and cache invalidation are non-trivial (see TESTING_STRATEGY auth cache cases).
- Proxy has no Redis auth-cache wiring in Phase 1.
- Provider forwarding is not live yet; per-request gRPC validation is acceptable for integration.
Why not a partial cache: A negative cache without bloom risks false rejects; serving stale claims after revoke violates fail-closed unless a full invalidation story exists.
Extension point: auth.TokenValidator interface; GRPCValidator today; Phase 2 optional 2.2.1-auth-cache-bloom adds a CachingValidator decorator.
When implemented: Phase 2 optional milestone 2.2.1-auth-cache-bloom (after Goal 1.2, before/at provider scale).
Risk accepted: Every protected request hits auth gRPC (~50ms budget, see section 2 of this ADR) until 2.2.1.
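The extension point can be pictured as an interface with a wrapping implementation. The sketch below is a hedged illustration: the Validate method name and signature, the trimmed Claims type, fakeRemote, and decorator are all hypothetical. The decorator deliberately caches nothing and always delegates, which matches the v1 decision; it only marks where a Phase 2 CachingValidator would sit.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Claims is trimmed to one field for the sketch.
type Claims struct{ OrgID string }

// TokenValidator is the extension point; the method name and signature
// are assumptions.
type TokenValidator interface {
	Validate(ctx context.Context, token string) (Claims, error)
}

// fakeRemote stands in for GRPCValidator and counts remote calls.
type fakeRemote struct{ calls int }

func (f *fakeRemote) Validate(_ context.Context, token string) (Claims, error) {
	f.calls++
	if token != "ibex_pat_valid" {
		return Claims{}, errors.New("INVALID_TOKEN")
	}
	return Claims{OrgID: "org_1"}, nil
}

// decorator wraps another validator. It performs no caching in this sketch.
type decorator struct{ next TokenValidator }

func (d decorator) Validate(ctx context.Context, token string) (Claims, error) {
	// A Phase 2 cache lookup would go here; v1 always validates remotely.
	return d.next.Validate(ctx, token)
}

// remoteCalls validates the same token n times through the decorator and
// reports how many calls reached the remote validator.
func remoteCalls(n int) int {
	remote := &fakeRemote{}
	var v TokenValidator = decorator{next: remote}
	for i := 0; i < n; i++ {
		_, _ = v.Validate(context.Background(), "ibex_pat_valid")
	}
	return remote.calls
}

func main() {
	fmt.Println(remoteCalls(3))
}
```

Because callers depend only on the interface, swapping the wrapper for a caching one later would not change the middleware.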
8) Observability
- Metrics:
- Metrics: ibex_proxy_auth_validate_total, ibex_proxy_auth_validate_duration_seconds
- Label: result only (ok, unauthenticated, error); no org_id
- Logs: may include org_id, token_id after success; never log bearer or access_token
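Keeping the result label to three values can be enforced by routing every outcome through one classifier. This is a sketch under assumptions: ResultLabel, ErrUnauthenticated, and ErrUnavailable are hypothetical, and a real client would inspect gRPC status codes rather than sentinel errors.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinels standing in for gRPC status codes.
var (
	ErrUnauthenticated = errors.New("unauthenticated")
	ErrUnavailable     = errors.New("auth unavailable")
)

// ResultLabel maps a validation outcome to one of the three allowed
// values of the result label, keeping metric cardinality fixed.
func ResultLabel(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrUnauthenticated):
		return "unauthenticated"
	default:
		return "error"
	}
}

func main() {
	wrapped := fmt.Errorf("validate: %w", ErrUnauthenticated)
	fmt.Println(ResultLabel(nil), ResultLabel(wrapped), ResultLabel(ErrUnavailable))
}
```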
9) Middleware order
metrics → logging → auth → handler (future: body limit, rate limit before handler)
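The ordering above can be expressed with a small chaining helper over net/http handlers. Middleware, Chain, trace, and callOrder are hypothetical names, the request path is a placeholder, and the trace middlewares only record their names rather than doing real metrics, logging, or auth work.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Middleware wraps a handler with extra behavior.
type Middleware func(http.Handler) http.Handler

// Chain wraps h so that the first middleware listed runs first.
func Chain(h http.Handler, mws ...Middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// trace returns a placeholder middleware that records its name when invoked.
func trace(name string, order *[]string) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*order = append(*order, name)
			next.ServeHTTP(w, r)
		})
	}
}

// callOrder sends one request through the chain and reports what ran, in order.
func callOrder() string {
	var order []string
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		order = append(order, "handler")
	})
	h := Chain(handler,
		trace("metrics", &order),
		trace("logging", &order),
		trace("auth", &order),
	)
	req := httptest.NewRequest(http.MethodGet, "/example", nil)
	h.ServeHTTP(httptest.NewRecorder(), req)
	return strings.Join(order, " > ")
}

func main() {
	fmt.Println(callOrder())
}
```

Placing metrics and logging outside auth means rejected requests are still counted and logged.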
10) Public routes (no auth)
/health, /ready, /metrics remain unauthenticated.
Consequences
Positive
- Phase 1 exit criterion: proxy rejects unauthenticated traffic
- Clean extension point for auth cache in Phase 2
Negative
- Every protected request hits auth gRPC (latency until cache milestone)
- 50ms budget may require tuning under load