Milestone 2.6.1 — Proxy Overhead Latency Benchmark and Load Test

Status: Planned
Goal: 2.6 — Phase 2 quality gate
Phase: 2 — Single Provider End-to-End
Estimated effort: 2–3 days
ADR required: ADR-0031 — Performance measurement methodology

Why This Milestone Exists

ARCHITECTURE.md commits to a <20ms proxy overhead target at p99. This commitment is meaningless without a measurement methodology and a benchmark that CI can run to detect regressions. This milestone defines what "proxy overhead" means precisely, how it is measured, and what the Phase 2 baseline is.

Definition of proxy overhead: the time between receiving the first byte of the client request and sending the first byte of the response header, measured with the LLM provider call replaced by a mock that returns immediately. This isolates IBEX-specific latency from LLM provider latency, which is outside our control.

Deliverables

1. Go benchmark

// benchmarks/proxy_overhead_test.go
//go:build benchmark

package benchmarks // package name assumed from the directory

import "testing"

// BenchmarkProxyOverhead measures proxy overhead with all Phase 2 middleware:
// auth (LRU cache hit), agent verification (cache hit), rate limit (Redis),
// directive resolve (Redis cache hit), and a mock provider that returns immediately.
//
// Run (the benchmark build tag is required, otherwise this file is skipped;
// -count=5 repeats the run to give a variance estimate):
//   go test -tags=benchmark -bench=BenchmarkProxyOverhead -benchmem -benchtime=10s -count=5 ./benchmarks/
//
// Target: p99 < 20ms at 100 concurrent goroutines.
func BenchmarkProxyOverhead(b *testing.B) { ... }

2. Load test with `k6`

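// benchmarks/k6/proxy_load.js
// Runs a 2-minute load test at 100 concurrent virtual users.
// Reports: p50, p95, p99, p999 latency; request rate; error rate.
// CI fails if p99 > 20ms. http_req_duration is the end-to-end request time seen
// by k6, so it includes the mock provider's near-zero response time.
import http from 'k6/http';

export const options = {
    vus: 100,
    duration: '2m',
    // The default summary omits p(99) and p(99.9), so request them explicitly.
    summaryTrendStats: ['med', 'p(95)', 'p(99)', 'p(99.9)'],
    thresholds: {
        http_req_duration: ['p(99)<20'],  // 99th percentile < 20ms
        http_req_failed:   ['rate<0.001'], // error rate < 0.1%
    },
};

// Placeholder request: PROXY_URL is a hypothetical environment variable; replace
// with the real proxy endpoint, method, and payload.
export default function () {
    http.get(__ENV.PROXY_URL);
}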

3. Latency stage breakdown

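The benchmark reports a per-stage breakdown as Prometheus metrics: histograms for the `_duration_seconds` stages and counters for the `_total` metrics (cache hits and rate-limit checks).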

Stage	Metric
Auth (total)	`ibex_proxy_auth_duration_seconds`
Auth cache hit	`ibex_auth_cache_hits_total{tier="lru"}`
Rate limit	`ibex_proxy_rate_limit_checked_total`
Directive resolve	`ibex_proxy_directive_resolve_duration_seconds`
Provider (mock)	`ibex_proxy_provider_duration_seconds`
Total overhead	`ibex_proxy_request_duration_seconds`

Acceptance Criteria

Benchmark runs with a mock provider (no real OpenAI calls)
p99 proxy overhead < 20ms at 100 concurrent requests
Stage breakdown metrics exported and visible
Regression baseline committed to benchmarks/BASELINE.md
CI fails if p99 exceeds baseline by more than 20%
ADR-0031 written defining "proxy overhead" precisely
