Milestone 2.6.1 — Proxy Overhead Latency Benchmark and Load Test

Status: Planned
Goal: 2.6 — Phase 2 quality gate
Phase: 2 — Single Provider End-to-End
Estimated effort: 2–3 days
ADR required: ADR-0031 — Performance measurement methodology


Why This Milestone Exists

ARCHITECTURE.md commits to a <20ms proxy overhead target at p99. This commitment is meaningless without a measurement methodology and a benchmark that CI can run to detect regressions. This milestone defines what "proxy overhead" means precisely, how it is measured, and what the Phase 2 baseline is.

Definition of proxy overhead: the time between receiving the first byte of the client request and sending the first byte of the response headers, measured with the LLM provider call replaced by a mock that returns immediately. This isolates IBEX-specific latency from LLM latency, which is outside our control.


Branch

test/m2-6-1-latency-benchmark

PR Title

test(perf): proxy overhead benchmark and load test baseline (m2.6.1)


Deliverables

1. Go benchmark

Go
// benchmarks/proxy_overhead_test.go
//go:build benchmark
 
// BenchmarkProxyOverhead measures proxy overhead with all Phase 2 middleware:
// auth (LRU cache hit), agent verification (cache hit), rate limit (Redis),
// directive resolve (Redis cache hit), and a mock provider that returns immediately.
//
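// Run (the benchmark build tag is required because of the constraint above):
//   go test -tags=benchmark -bench=BenchmarkProxyOverhead -benchmem -benchtime=10s -count=5 ./benchmarks/
// -count=5 repeats the run so that variance can be estimated.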
//
// Target: p99 < 20ms at 100 concurrent goroutines.
func BenchmarkProxyOverhead(b *testing.B) { ... }

2. Load test with k6

JavaScript
// benchmarks/k6/proxy_load.js
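// Runs a 2-minute load test at 100 concurrent virtual users.
// Reports: p50, p95, p99, p999 latency; request rate; error rate.
// CI fails if p99 >= 20ms. http_req_duration is measured by the k6 client, so it
// includes network and mock provider time; the mock must return immediately for
// this to approximate proxy overhead.
export const options = {
    vus: 100,
    duration: '2m',
    // The default summary omits p(99) and p(99.9), so request them explicitly.
    summaryTrendStats: ['med', 'p(95)', 'p(99)', 'p(99.9)'],
    thresholds: {
        http_req_duration: ['p(99)<20'],  // 99th percentile < 20ms
        http_req_failed:   ['rate<0.001'], // error rate < 0.1%
    },
};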

3. Latency stage breakdown

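The benchmark reports per-stage metrics to Prometheus. The _duration_seconds metrics are latency histograms; the _total metrics follow the Prometheus naming convention for counters, so they give event counts rather than latency.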

| Stage             | Metric                                        |
|-------------------|-----------------------------------------------|
| Auth (total)      | ibex_proxy_auth_duration_seconds              |
| Auth cache hit    | ibex_auth_cache_hits_total{tier="lru"}        |
| Rate limit        | ibex_proxy_rate_limit_checked_total           |
| Directive resolve | ibex_proxy_directive_resolve_duration_seconds |
| Provider (mock)   | ibex_proxy_provider_duration_seconds          |
| Total overhead    | ibex_proxy_request_duration_seconds           |
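Per-stage durations like those in the table could be captured with a small timing wrapper. The sketch below is standard-library only and uses a hypothetical timeStage helper with an observe callback; in the real proxy the callback would presumably forward to the corresponding Prometheus histogram, but that wiring is an assumption and is not shown.

```go
package main

import "time"

// timeStage runs fn and reports its wall-clock duration, in seconds, under the
// given stage name. Seconds are used because the metric names in the table end
// in _duration_seconds. The error from fn is returned unchanged.
func timeStage(stage string, observe func(stage string, seconds float64), fn func() error) error {
	start := time.Now()
	err := fn()
	observe(stage, time.Since(start).Seconds())
	return err
}
```

A middleware would call it once per stage, for example timeStage("auth", observe, func() error { return nil }) with the real auth check in place of the empty function.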

Acceptance Criteria

  • Benchmark runs with a mock provider (no real OpenAI calls)
  • p99 proxy overhead < 20ms at 100 concurrent requests
  • Stage breakdown metrics exported and visible
  • Regression baseline committed to benchmarks/BASELINE.md
  • CI fails if p99 exceeds baseline by more than 20%
  • ADR-0031 written defining "proxy overhead" precisely
