Milestone 1.2.4: Proxy Rate Limit Skeleton
Status: Complete
Goal: 1.2 Proxy platform integration
Phase: 1 — Core Platform
Why This Milestone Exists
Phase 4 implements full hierarchical Redis Lua rate limiting (agent/org/global). Phase 2 mentions "basic org-level rate limit in Redis (optional milestone)." Phase 1 has nothing.
The problem: without any rate limiting, Phase 2 testing against a real LLM provider has no cost protection. A runaway test loop can exhaust API credits. More importantly, the rate limit middleware must be in the proxy's request pipeline before Phase 2 adds provider calls — retrofitting it later requires touching the critical path.
This milestone implements a minimal but real rate limiter:
- Org-level token bucket in Redis
- Configurable limits per org (from config, not DB yet)
- Returns 429 with Retry-After header
- Designed to be extended in Phase 4 without rewriting
Branch
feature/m1-2-4-rate-limit-skeleton
PR Title
feat(proxy): rate limit skeleton (m1.2.4)
Prerequisites
- 1.2.1 merged (org_id available in request context)
Tasks
1. Design the rate limiter interface
// RateLimiter checks and enforces rate limits.
// The interface is designed to support the full Phase 4 implementation
// (Redis Lua scripts, hierarchical limits) without changing callers.
type RateLimiter interface {
	// Check checks the rate limit for the given org and agent.
	// Returns (allowed bool, retryAfter time.Duration, err error).
	// If allowed=false, retryAfter indicates when to retry.
	// err is non-nil only for infrastructure failures (Redis down, etc.).
	Check(ctx context.Context, orgID, agentID uuid.UUID) (bool, time.Duration, error)
}
2. Implement Redis token bucket (Phase 1 version)
// RedisTokenBucket implements a simple token bucket rate limiter using Redis.
// This is the Phase 1 implementation: org-level only, no Lua scripts.
// Phase 4 will replace this with a Lua-based hierarchical implementation
// without changing the RateLimiter interface.
type RedisTokenBucket struct {
	client       *redis.Client       // shared client; held by pointer, not copied
	orgLimits    map[string]OrgLimit // loaded from config
	defaultLimit OrgLimit
}
type OrgLimit struct {
	RequestsPerMinute int
	BurstSize         int
}
// Check uses Redis INCR + EXPIRE as a simple fixed-window counter
// (one key per Unix minute), not a true sliding window.
// Key: {org_id}:ratelimit:minute:{unix_minute}
// TTL: 2 minutes (allows for clock skew)
//
// NOTE: This is NOT atomic. Phase 4 will replace with Lua scripts
// for atomic check-and-decrement. This is acceptable for Phase 1
// because the limit is a soft limit (not a billing hard cap).
func (r *RedisTokenBucket) Check(ctx context.Context,
	orgID, agentID uuid.UUID) (bool, time.Duration, error)
3. Implement rate limit middleware
// RateLimitMiddleware enforces rate limits on all protected routes.
// Returns 429 with Retry-After header when limit exceeded.
// On Redis failure: fail OPEN (allow request) with warning log.
// Rationale: rate limiting is a quality control, not a security control.
// Security (auth) already fails closed. Rate limiting failing open
// is preferable to blocking all traffic when Redis is down.
func RateLimitMiddleware(limiter RateLimiter) func(http.Handler) http.Handler
429 response format:
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded for this organization",
    "detail": "You have exceeded the request rate limit. Please retry after the indicated time.",
    "request_id": "req_7f3k2m9x"
  }
}
Response headers on 429:
Retry-After: 42
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1710000000
4. Wire middleware into proxy router
Place RateLimitMiddleware after auth middleware and before request handlers.
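One way to make the ordering explicit is a small chaining helper, sketched below. `Chain`, `trace`, and `Order` are hypothetical names used only for illustration, the `trace` middlewares stand in for the real auth and rate limit middleware, and the `/v1/proxy` path is an assumption.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// Middleware has the same shape as RateLimitMiddleware's return value.
type Middleware func(http.Handler) http.Handler

// Chain wraps h so that the first middleware listed sees the request first.
// Hypothetical helper; the real router may offer its own equivalent.
func Chain(h http.Handler, mws ...Middleware) http.Handler {
	// Wrap from the last middleware inward so mws[0] ends up outermost.
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// trace is a stand-in middleware that records its name when a request
// passes through it, then calls the next handler.
func trace(name string, visited *[]string) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*visited = append(*visited, name)
			next.ServeHTTP(w, r)
		})
	}
}

// Order sends one request through auth, rate limit, and the handler,
// and returns the names in the order they ran.
func Order() []string {
	var visited []string
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		visited = append(visited, "handler")
	})
	h := Chain(handler, trace("auth", &visited), trace("ratelimit", &visited))
	req := httptest.NewRequest(http.MethodPost, "/v1/proxy", nil)
	h.ServeHTTP(httptest.NewRecorder(), req)
	return visited
}
```

Listing auth first keeps unauthenticated requests from consuming an org's quota, and it guarantees org_id is in the request context before the limiter reads it.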
5. Configuration
Load org limits from environment or config file (no DB in Phase 1):
default:
  requests_per_minute: 60
  burst_size: 10
6. Tests
- Under limit → 200
- Over limit → 429 with Retry-After and stable error envelope
- Redis unavailable → request allowed (fail open) with warning log
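The three cases above can be exercised without Redis by stubbing the RateLimiter interface. The following is a minimal sketch under several assumptions: string IDs stand in for uuid.UUID to keep it stdlib-only, the org and agent IDs are hard-coded rather than read from the request context, the envelope omits `detail` and `request_id`, and `stubLimiter` and `Run` are hypothetical helpers.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"log"
	"math"
	"net/http"
	"net/http/httptest"
	"strconv"
	"time"
)

// RateLimiter mirrors the task 1 interface, with string IDs as a stand-in
// for uuid.UUID (assumption made to avoid a third-party import).
type RateLimiter interface {
	Check(ctx context.Context, orgID, agentID string) (bool, time.Duration, error)
}

// stubLimiter returns canned results so no Redis is needed.
type stubLimiter struct {
	allowed bool
	retry   time.Duration
	err     error
}

func (s stubLimiter) Check(context.Context, string, string) (bool, time.Duration, error) {
	return s.allowed, s.retry, s.err
}

// One stub per test case in the list above.
var (
	underLimit = stubLimiter{allowed: true}
	overLimit  = stubLimiter{retry: 42 * time.Second}
	redisDown  = stubLimiter{err: errors.New("redis: connection refused")}
)

// RateLimitMiddleware is a simplified sketch of the middleware in task 3.
func RateLimitMiddleware(limiter RateLimiter) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// Placeholder IDs; real code would read them from the request context.
			allowed, retry, err := limiter.Check(r.Context(), "org", "agent")
			if err != nil {
				// Fail open: log a warning and let the request through.
				log.Printf("warning: rate limiter unavailable, failing open: %v", err)
				next.ServeHTTP(w, r)
				return
			}
			if allowed {
				next.ServeHTTP(w, r)
				return
			}
			// Round up so clients never retry before the window resets.
			secs := int(math.Ceil(retry.Seconds()))
			w.Header().Set("Retry-After", strconv.Itoa(secs))
			w.Header().Set("Content-Type", "application/json")
			w.WriteHeader(http.StatusTooManyRequests)
			_ = json.NewEncoder(w).Encode(map[string]map[string]string{
				"error": {
					"code":    "RATE_LIMITED",
					"message": "Rate limit exceeded for this organization",
				},
			})
		})
	}
}

// Run sends one request through the middleware and returns the status code
// and the Retry-After header value.
func Run(l RateLimiter) (int, string) {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	RateLimitMiddleware(l)(ok).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code, rec.Header().Get("Retry-After")
}
```

In a real test file each `Run` call would become a table-driven subtest, and the fail-open case would additionally capture log output to assert that the warning was written.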
Acceptance criteria
- Org-level rate limit enforced via Redis
- 429 responses match stable error envelope
- Middleware ordering documented (auth → rate limit → handler)
- Unit tests for limiter; integration test with Redis optional
Risks
| Risk | Mitigation |
|---|---|
| Non-atomic INCR window | Document as soft limit; Phase 4 Lua scripts |
| Redis outage blocks traffic | Fail open by design |
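The first risk is easier to see in code. Below is a minimal sketch of the INCR + EXPIRE check from task 2, assuming a hypothetical `Counter` interface that abstracts the two Redis commands so the logic can be unit tested in memory. `FixedWindow`, `memCounter`, and `Demo` are illustrative names, string org IDs replace uuid.UUID, calling EXPIRE only on the first increment is an assumption, and BurstSize is left unused.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Counter abstracts the two Redis commands the Phase 1 limiter relies on.
// Hypothetical interface; a Redis-backed version would call INCR and EXPIRE.
type Counter interface {
	Incr(ctx context.Context, key string) (int64, error)
	Expire(ctx context.Context, key string, ttl time.Duration) error
}

type OrgLimit struct {
	RequestsPerMinute int
	BurstSize         int // not used by this fixed-window sketch
}

type FixedWindow struct {
	counter      Counter
	orgLimits    map[string]OrgLimit
	defaultLimit OrgLimit
	now          func() time.Time // injectable clock for tests
}

// limitFor returns the org's configured limit, or the default.
func (f *FixedWindow) limitFor(orgID string) OrgLimit {
	if l, ok := f.orgLimits[orgID]; ok {
		return l
	}
	return f.defaultLimit
}

// Check counts the request against the current one-minute window.
func (f *FixedWindow) Check(ctx context.Context, orgID string) (bool, time.Duration, error) {
	now := f.now()
	minute := now.Unix() / 60
	key := fmt.Sprintf("%s:ratelimit:minute:%d", orgID, minute)

	n, err := f.counter.Incr(ctx, key)
	if err != nil {
		return false, 0, err
	}
	if n == 1 {
		// Not atomic with the INCR above: a crash between the two calls
		// leaves the key without a TTL. A Lua script would close this gap.
		if err := f.counter.Expire(ctx, key, 2*time.Minute); err != nil {
			return false, 0, err
		}
	}
	if n <= int64(f.limitFor(orgID).RequestsPerMinute) {
		return true, 0, nil
	}
	// Over the limit: retry once the next minute window starts.
	windowEnd := time.Unix((minute+1)*60, 0)
	return false, windowEnd.Sub(now), nil
}

// memCounter is an in-memory Counter for unit tests (not concurrency-safe).
type memCounter struct{ counts map[string]int64 }

func (m *memCounter) Incr(_ context.Context, key string) (int64, error) {
	m.counts[key]++
	return m.counts[key], nil
}

func (m *memCounter) Expire(context.Context, string, time.Duration) error { return nil }

// Demo sends 5 requests against a limit of 3 with the clock fixed 42 seconds
// before a minute boundary. It returns how many were allowed and the last
// retryAfter value.
func Demo() (allowed int, retry time.Duration) {
	fw := &FixedWindow{
		counter:      &memCounter{counts: map[string]int64{}},
		defaultLimit: OrgLimit{RequestsPerMinute: 3},
		now:          func() time.Time { return time.Unix(1709999958, 0) },
	}
	for i := 0; i < 5; i++ {
		ok, wait, _ := fw.Check(context.Background(), "org-a")
		if ok {
			allowed++
		} else {
			retry = wait
		}
	}
	return allowed, retry
}
```

Because every request increments the counter before the comparison, concurrent callers cannot both slip under the limit on the same count. The remaining non-atomic step is the INCR-to-EXPIRE gap, which affects key cleanup rather than enforcement, consistent with treating the limit as soft.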