Milestone 2.6.1 — Proxy Overhead Latency Benchmark and Load Test
Status: Planned
Goal: 2.6 — Phase 2 quality gate
Phase: 2 — Single Provider End-to-End
Estimated effort: 2–3 days
ADR required: ADR-0031 — Performance measurement methodology
Why This Milestone Exists
ARCHITECTURE.md commits to a <20ms proxy overhead target at p99. This commitment is meaningless without a measurement methodology and a benchmark that CI can run to detect regressions. This milestone defines what "proxy overhead" means precisely, how it is measured, and what the Phase 2 baseline is.
Definition of proxy overhead: The time between receiving the first byte of the client request and sending the first byte of the response header — with the LLM provider call replaced by a mock that returns immediately. This isolates the IBEX-specific latency from LLM latency (which is not in our control).
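As a rough illustration of this definition, the following Go sketch measures time to first response byte against a mock provider that replies immediately. It is a client-side approximation over loopback rather than the server-side measurement the definition describes. The names `newMockProvider` and `measureTTFB` are hypothetical, and the plain reverse proxy is only a stand-in for the real proxy and its middleware.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"net/http/httputil"
	"net/url"
	"time"
)

// newMockProvider stands in for the LLM provider (hypothetical helper):
// it replies immediately with an empty JSON object.
func newMockProvider() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{}`)
	}))
}

// measureTTFB issues one GET request and returns the time from just before
// the request is sent until the first byte of the response arrives.
func measureTTFB(client *http.Client, target string) (time.Duration, error) {
	var firstByte time.Time
	trace := &httptrace.ClientTrace{
		GotFirstResponseByte: func() { firstByte = time.Now() },
	}
	ctx := httptrace.WithClientTrace(context.Background(), trace)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, target, nil)
	if err != nil {
		return 0, err
	}
	start := time.Now()
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if firstByte.IsZero() {
		return 0, errors.New("first response byte was never observed")
	}
	return firstByte.Sub(start), nil
}

func main() {
	provider := newMockProvider()
	defer provider.Close()

	// Assumption: a bare reverse proxy with no middleware stands in for the
	// real proxy, so the number printed here is a floor, not a baseline.
	providerURL, err := url.Parse(provider.URL)
	if err != nil {
		log.Fatal(err)
	}
	proxy := httptest.NewServer(httputil.NewSingleHostReverseProxy(providerURL))
	defer proxy.Close()

	overhead, err := measureTTFB(proxy.Client(), proxy.URL)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("time to first response byte: %v\n", overhead)
}
```

Because the mock returns immediately, whatever time remains is attributable to the proxy path plus loopback networking, which is the quantity the definition tries to isolate.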
Branch
test/m2-6-1-latency-benchmark
PR Title
test(perf): proxy overhead benchmark and load test baseline (m2.6.1)
Deliverables
1. Go benchmark
// benchmarks/proxy_overhead_test.go
//go:build benchmark

// BenchmarkProxyOverhead measures proxy overhead with all Phase 2 middleware:
// auth (LRU cache hit), agent verification (cache hit), rate limit (Redis),
// directive resolve (Redis cache hit), and a mock provider that returns immediately.
//
// Run (the file is behind the "benchmark" build tag, so -tags is required;
// -count=5 gives a variance estimate):
//   go test -tags=benchmark -bench=BenchmarkProxyOverhead -benchmem -benchtime=10s -count=5 ./benchmarks/
//
// Target: p99 < 20ms at 100 concurrent goroutines.
// Note: go test reports mean ns/op by default, so the benchmark has to compute
// p99 from recorded samples itself (for example, reporting it with b.ReportMetric).
func BenchmarkProxyOverhead(b *testing.B) { ... }
2. Load test with k6
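// benchmarks/k6/proxy_load.js
// Runs a 2-minute load test at 100 concurrent virtual users.
// Reports: p50, p95, p99, p99.9 latency; request rate; error rate.
// CI fails if p99 > 20ms. http_req_duration is the client-observed request time,
// so it approximates proxy overhead only while the mock provider returns immediately.
export const options = {
  vus: 100,
  duration: '2m',
  // Extend the default summary so the percentiles listed above are printed.
  summaryTrendStats: ['med', 'p(95)', 'p(99)', 'p(99.9)'],
  thresholds: {
    http_req_duration: ['p(99)<20'], // 99th percentile < 20ms
    http_req_failed: ['rate<0.001'], // error rate < 0.1%
  },
};
3. Latency stage breakdown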
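The benchmark reports per-stage metrics through Prometheus. Going by their names, the metrics ending in _duration_seconds are latency histograms, while those ending in _total are counters that record how often a stage ran or hit its cache: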
| Stage | Metric |
|---|---|
| Auth (total) | ibex_proxy_auth_duration_seconds |
| Auth cache hit | ibex_auth_cache_hits_total{tier="lru"} |
| Rate limit | ibex_proxy_rate_limit_checked_total |
| Directive resolve | ibex_proxy_directive_resolve_duration_seconds |
| Provider (mock) | ibex_proxy_provider_duration_seconds |
| Total overhead | ibex_proxy_request_duration_seconds |
Acceptance Criteria
- Benchmark runs with a mock provider (no real OpenAI calls)
- p99 proxy overhead < 20ms at 100 concurrent requests
- Stage breakdown metrics exported and visible
- Regression baseline committed to benchmarks/BASELINE.md
- CI fails if p99 exceeds baseline by more than 20%
- ADR-0031 written defining "proxy overhead" precisely