Milestone 1.2.4: Proxy Rate Limit Skeleton

Status: Complete
Goal: 1.2 Proxy platform integration
Phase: 1 — Core Platform

Why This Milestone Exists

Phase 4 implements full hierarchical Redis Lua rate limiting (agent/org/global). Phase 2 mentions "basic org-level rate limit in Redis (optional milestone)." Phase 1 has nothing.

The problem: without any rate limiting, Phase 2 testing against a real LLM provider has no cost protection. A runaway test loop can exhaust API credits. More importantly, the rate limit middleware must be in the proxy's request pipeline before Phase 2 adds provider calls — retrofitting it later requires touching the critical path.

This milestone implements a minimal but real rate limiter:

Org-level limiter in Redis (Phase 1 uses a per-minute fixed-window counter rather than a true token bucket)
Configurable limits per org (from config, not DB yet)
Returns 429 with Retry-After header
Designed to be extended in Phase 4 without rewriting

Dependencies

1.2.1 merged (org_id available in request context)

Tasks

1. Design the rate limiter interface

// RateLimiter checks and enforces rate limits.
// The interface is designed to support the full Phase 4 implementation
// (Redis Lua scripts, hierarchical limits) without changing callers.
type RateLimiter interface {
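    // Check checks the rate limit for the given org and agent.
    // Returns (allowed bool, retryAfter time.Duration, err error).
    // If allowed=false, retryAfter is how long the caller should wait
    // before retrying. agentID is part of the signature so Phase 4 can add
    // agent-level limits; the Phase 1 implementation is org-level only.
    // err is non-nil only for infrastructure failures (Redis down, etc.).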
    Check(ctx context.Context, orgID, agentID uuid.UUID) (bool, time.Duration, error)
}

2. Implement Redis token bucket (Phase 1 version)

// RedisTokenBucket implements a simple token bucket rate limiter using Redis.
// This is the Phase 1 implementation: org-level only, no Lua scripts.
// Phase 4 will replace this with a Lua-based hierarchical implementation
// without changing the RateLimiter interface.
type RedisTokenBucket struct {
    client       *redis.Client
    orgLimits    map[string]OrgLimit // keyed by org ID; loaded from config
    defaultLimit OrgLimit            // used when an org has no entry in orgLimits
}

type OrgLimit struct {
    RequestsPerMinute int
    BurstSize         int
}
 
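// Check uses Redis INCR + EXPIRE to implement a fixed-window counter
// (one counter per org per calendar minute). Despite the type name, the
// Phase 1 algorithm is not a true token bucket or sliding window.
// Key: {org_id}:ratelimit:minute:{unix_minute}
// TTL: 2 minutes (allows for clock skew)
//
// NOTE: The INCR/EXPIRE sequence is NOT atomic (each command is atomic
// on its own, but the pair is not). Phase 4 will replace it with Lua
// scripts for atomic check-and-decrement. This is acceptable for Phase 1
// because the limit is a soft limit (not a billing hard cap).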
func (r *RedisTokenBucket) Check(ctx context.Context,
    orgID, agentID uuid.UUID) (bool, time.Duration, error)
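
A minimal sketch of the per-minute counting logic described above, under stated assumptions. To run without Redis it talks to a hypothetical counterStore interface (standing in for INCR and EXPIRE) backed by an in-memory map, takes the org ID as a plain string instead of uuid.UUID, and treats the braces in the key pattern as placeholders. The names fixedWindow, memStore, and demoWindow are illustrative and not part of the milestone.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// counterStore is a hypothetical stand-in for the two Redis commands the
// Phase 1 limiter needs (INCR and EXPIRE).
type counterStore interface {
	Incr(ctx context.Context, key string) (int64, error)
	Expire(ctx context.Context, key string, ttl time.Duration) error
}

// memStore is an in-memory counterStore for illustration; it ignores TTLs.
type memStore struct {
	mu     sync.Mutex
	counts map[string]int64
}

func (m *memStore) Incr(_ context.Context, key string) (int64, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.counts[key]++
	return m.counts[key], nil
}

func (m *memStore) Expire(context.Context, string, time.Duration) error { return nil }

// fixedWindow counts requests per org per calendar minute.
type fixedWindow struct {
	store counterStore
	limit int64            // requests allowed per minute
	now   func() time.Time // injectable clock so tests are deterministic
}

// Check increments the current minute's counter and compares it to the limit.
// When over the limit, retryAfter is the time left until the next minute starts.
func (f *fixedWindow) Check(ctx context.Context, orgID string) (bool, time.Duration, error) {
	now := f.now()
	minute := now.Unix() / 60
	key := fmt.Sprintf("%s:ratelimit:minute:%d", orgID, minute)

	n, err := f.store.Incr(ctx, key)
	if err != nil {
		return false, 0, err
	}
	if n == 1 {
		// First hit in this window: set the TTL. This is a second command
		// after INCR, which is why the sequence is not atomic.
		if err := f.store.Expire(ctx, key, 2*time.Minute); err != nil {
			return false, 0, err
		}
	}
	if n > f.limit {
		next := time.Unix((minute+1)*60, 0)
		return false, next.Sub(now), nil
	}
	return true, 0, nil
}

// demoWindow sends three requests with a limit of two at a fixed clock
// (18 seconds into a minute) and reports how many were allowed and the
// retry delay, in seconds, returned for the rejected one.
func demoWindow() (allowed int, retrySeconds int) {
	fw := &fixedWindow{
		store: &memStore{counts: map[string]int64{}},
		limit: 2,
		now:   func() time.Time { return time.Unix(1710000018, 0) },
	}
	for i := 0; i < 3; i++ {
		ok, retry, _ := fw.Check(context.Background(), "org-a")
		if ok {
			allowed++
		} else {
			retrySeconds = int(retry / time.Second)
		}
	}
	return allowed, retrySeconds
}
```

Because the window is fixed, a client can send up to twice the limit across a minute boundary. That is consistent with treating the Phase 1 limit as a soft limit.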

3. Implement rate limit middleware

// RateLimitMiddleware enforces rate limits on all protected routes.
// Returns 429 with Retry-After header when limit exceeded.
// On Redis failure: fail OPEN (allow request) with warning log.
// Rationale: rate limiting is a quality control, not a security control.
// Auth, the security control, runs earlier and fails closed. Failing open
// here is preferable to blocking all traffic when Redis is down.
func RateLimitMiddleware(limiter RateLimiter) func(http.Handler) http.Handler
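
One way the fail-open behavior could look, as a sketch using only the standard library. It assumes a simplified limiter interface keyed by a string org ID, a hypothetical orgIDKey context key (the real key comes from milestone 1.2.1), and a plain-text 429 body in place of the JSON envelope. The stub limiters and the probe helper exist only to exercise the three outcomes listed under Tests.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"net/http/httptest"
	"strconv"
	"time"
)

// limiter is a simplified stand-in for RateLimiter (string org ID, no agent ID).
type limiter interface {
	Check(ctx context.Context, orgID string) (bool, time.Duration, error)
}

// orgIDKey is a hypothetical context key for the org ID set by auth middleware.
type orgIDKey struct{}

// rateLimit rejects over-limit requests with 429 and fails open on limiter errors.
func rateLimit(l limiter) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// A missing org ID yields "" here; a real implementation
			// would need to decide how to treat that case.
			orgID, _ := r.Context().Value(orgIDKey{}).(string)
			allowed, retryAfter, err := l.Check(r.Context(), orgID)
			if err != nil {
				// Fail open: log a warning and let the request through.
				log.Printf("WARN rate limiter unavailable, allowing request: %v", err)
				next.ServeHTTP(w, r)
				return
			}
			if !allowed {
				// Retry-After is in whole seconds, rounded up.
				secs := int((retryAfter + time.Second - 1) / time.Second)
				w.Header().Set("Retry-After", strconv.Itoa(secs))
				http.Error(w, "rate limited", http.StatusTooManyRequests)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// stubLimiter returns fixed results so each outcome can be exercised.
type stubLimiter struct {
	allowed bool
	retry   time.Duration
	err     error
}

func (s stubLimiter) Check(context.Context, string) (bool, time.Duration, error) {
	return s.allowed, s.retry, s.err
}

var (
	underLimit = stubLimiter{allowed: true}
	overLimit  = stubLimiter{retry: 42 * time.Second}
	redisDown  = stubLimiter{err: errors.New("redis down")}
)

// probe sends one request through the middleware and returns the status
// code and the Retry-After header value.
func probe(l limiter) (int, string) {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/v1/anything", nil)
	req = req.WithContext(context.WithValue(req.Context(), orgIDKey{}, "org-a"))
	rateLimit(l)(ok).ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Retry-After")
}
```

Calling probe with underLimit, overLimit, and redisDown exercises the allowed, rejected, and fail-open paths respectively.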

429 response format:

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded for this organization",
    "detail": "You have exceeded the request rate limit. Please retry after the indicated time.",
    "request_id": "req_7f3k2m9x"
  }
}

Response headers on 429:

Retry-After: 42
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1710000000
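
The envelope and headers above could be produced by a helper along these lines. This is a sketch, not the milestone's implementation: writeRateLimited, errorEnvelope, and demo429 are hypothetical names, and it assumes X-RateLimit-Reset carries Unix seconds (the example value looks like a Unix timestamp) and that Retry-After is the whole number of seconds until that reset.

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"strconv"
	"time"
)

// errorBody and errorEnvelope mirror the JSON error envelope shown above.
type errorBody struct {
	Code      string `json:"code"`
	Message   string `json:"message"`
	Detail    string `json:"detail"`
	RequestID string `json:"request_id"`
}

type errorEnvelope struct {
	Error errorBody `json:"error"`
}

// writeRateLimited writes a 429 with the rate limit headers and envelope.
// limit is the org's per-window limit; reset is when the current window ends.
func writeRateLimited(w http.ResponseWriter, requestID string, limit int, now, reset time.Time) {
	// Round the wait up to whole seconds; never report a negative delay.
	secs := int((reset.Sub(now) + time.Second - 1) / time.Second)
	if secs < 0 {
		secs = 0
	}
	h := w.Header()
	h.Set("Content-Type", "application/json")
	h.Set("Retry-After", strconv.Itoa(secs))
	h.Set("X-RateLimit-Limit", strconv.Itoa(limit))
	h.Set("X-RateLimit-Remaining", "0")
	h.Set("X-RateLimit-Reset", strconv.FormatInt(reset.Unix(), 10))
	w.WriteHeader(http.StatusTooManyRequests)
	_ = json.NewEncoder(w).Encode(errorEnvelope{Error: errorBody{
		Code:      "RATE_LIMITED",
		Message:   "Rate limit exceeded for this organization",
		Detail:    "You have exceeded the request rate limit. Please retry after the indicated time.",
		RequestID: requestID,
	}})
}

// demo429 writes one response 42 seconds before the window resets and
// returns the status, two header values, and the decoded error code.
func demo429() (status int, retryAfter, reset, code string) {
	rec := httptest.NewRecorder()
	now := time.Unix(1709999958, 0)
	writeRateLimited(rec, "req_7f3k2m9x", 1000, now, time.Unix(1710000000, 0))
	var env errorEnvelope
	_ = json.Unmarshal(rec.Body.Bytes(), &env)
	return rec.Code, rec.Header().Get("Retry-After"), rec.Header().Get("X-RateLimit-Reset"), env.Error.Code
}
```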

4. Wire middleware into proxy router

Place RateLimitMiddleware after auth middleware and before request handlers.
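
The ordering can be illustrated with a small chaining helper. This sketch assumes a plain net/http setup; the proxy's actual router is not named in this document, and chain, tag, and demoOrder are hypothetical. The placeholder middlewares only record when they run, to show that auth executes before the rate limiter and both before the handler.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"strings"
)

type middleware func(http.Handler) http.Handler

// chain wraps h so that the first middleware listed runs first.
func chain(h http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// tag returns a placeholder middleware that records its name when it runs.
func tag(name string, trace *[]string) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*trace = append(*trace, name)
			next.ServeHTTP(w, r)
		})
	}
}

// demoOrder sends one request through auth -> ratelimit -> handler and
// returns the order in which each stage ran.
func demoOrder() string {
	var trace []string
	handler := http.HandlerFunc(func(_ http.ResponseWriter, _ *http.Request) {
		trace = append(trace, "handler")
	})
	h := chain(handler, tag("auth", &trace), tag("ratelimit", &trace))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	return strings.Join(trace, " -> ")
}
```

Running auth first matters because the limiter needs the org_id that auth places in the request context.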

5. Configuration

Load org limits from environment or config file (no DB in Phase 1):

default:
  requests_per_minute: 60
  burst_size: 10
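
Once the config is loaded, resolving an org's limit might look like the lookup below: use a per-org entry if one exists, otherwise fall back to the default. This is a sketch; the limits type, the forOrg method, and the "org-a" override are hypothetical, and config file parsing is left out because the document does not name a loader.

```go
package main

// OrgLimit mirrors the struct defined for the limiter.
type OrgLimit struct {
	RequestsPerMinute int
	BurstSize         int
}

// limits holds per-org overrides plus the default from config.
type limits struct {
	byOrg map[string]OrgLimit // keyed by org ID string (assumption)
	def   OrgLimit
}

// forOrg returns the org's configured limit, or the default if none is set.
func (l limits) forOrg(orgID string) OrgLimit {
	if v, ok := l.byOrg[orgID]; ok {
		return v
	}
	return l.def
}

// exampleLimits uses the default values from the config above plus one
// hypothetical override for illustration.
var exampleLimits = limits{
	byOrg: map[string]OrgLimit{
		"org-a": {RequestsPerMinute: 600, BurstSize: 50},
	},
	def: OrgLimit{RequestsPerMinute: 60, BurstSize: 10},
}
```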

6. Tests

Under limit → 200
Over limit → 429 with Retry-After and stable error envelope
Redis unavailable → request allowed (fail open) with warning log

Acceptance criteria

Org-level rate limit enforced via Redis
429 responses match stable error envelope
Middleware ordering documented (auth → rate limit → handler)
Unit tests for limiter; integration test with Redis optional

Risks

Risk	Mitigation
Non-atomic INCR window	Document as soft limit; Phase 4 Lua scripts
Redis outage blocks traffic	Fail open by design