Rate limiting

The proxy enforces organization-level requests-per-minute (RPM) limits using a Redis sliding window (ADR-0015). Limits apply after authentication and agent verification succeed and before request normalization hands off to a provider adapter.

Rate limiting is a cost control, not a security boundary. Auth and tenant isolation remain fail-closed; the limiter intentionally fails open when Redis is down.

How it works

+------------------------------+                                         
|                              |                                         
|    Authenticated request     |                                         
|                              |                                         
+------------------------------+                                         
                |                                                        
                |                                                        
                |                                                        
                |                                                        
                v                                                        
<------------------------------>                                         
|                              |                                         
|       Redis reachable?       |-----------------------+                 
|                              |                       |                 
<------------------------------>                      Yes                
                |                                      |                 
               No                                      |                 
                |                                      |                 
                |                                      |                 
                v                                      v                 
+------------------------------+     +----------------------------------+
|                              |     |                                  |
|  Allow request — fail-open   |     | INCR ratelimit:org_id:rpm:minute |
|                              |     |                                  |
+------------------------------+     +----------------------------------+
                                                       |                 
                                                       |                 
                                                       |                 
                                                       |                 
                                                       |                 
<------------------------------>                       |                 
|                              |                       |                 
|         count ≤ RPM?         |<----------------------+                 
|                              |                       |                 
<------------------------------>                      No                 
                |                                      |                 
               Yes                                     |                 
                |                                      |                 
                |                                      |                 
                v                                      v                 
+------------------------------+     +----------------------------------+
|                              |     |                                  |
| Allow + set response headers |     |  429 RATE_LIMITED + Retry-After  |
|                              |     |                                  |
+------------------------------+     +----------------------------------+

Implementation lives in packages/ratelimit — services must use the Limiter interface, not ad-hoc Redis calls. Key format: ratelimit:{org_id}:rpm:{unix_minute} with a TTL on first increment.

Configuration

Parameter	Type	Description
`IBEX_RATE_LIMIT_DEFAULT_RPM`	`integer`	Default requests per minute for all orgs without an override. Default: 60
`IBEX_RATE_LIMIT_ORG_OVERRIDES`	`string`	Comma-separated org_uuid=rpm pairs. Example: 550e8400-e29b-41d4-a716-446655440000=120
`REDIS_URL` (required)	`string (URL)`	Redis connection used by the limiter and /ready probe.

Example overrides:

IBEX_RATE_LIMIT_DEFAULT_RPM=60
IBEX_RATE_LIMIT_ORG_OVERRIDES=00000000-0000-0000-0000-000000000001=120,00000000-0000-0000-0000-000000000002=30

Org IDs in overrides must be valid UUIDs. The limiter resolves the org from the authenticated token, never from the request body.
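
As an illustration of the override format, the sketch below parses a comma-separated list of org_uuid=rpm pairs and resolves the limit for an org. ParseOverrides and LimitFor are hypothetical names, not the proxy's actual functions, and rejecting non-positive RPM values is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// uuidRe matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
var uuidRe = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// ParseOverrides turns "uuid=rpm,uuid=rpm" into a map keyed by lowercase UUID.
func ParseOverrides(raw string) (map[string]int, error) {
	out := map[string]int{}
	if strings.TrimSpace(raw) == "" {
		return out, nil
	}
	for _, pair := range strings.Split(raw, ",") {
		id, val, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok {
			return nil, fmt.Errorf("override %q: want org_uuid=rpm", pair)
		}
		if !uuidRe.MatchString(id) {
			return nil, fmt.Errorf("override %q: org ID is not a valid UUID", pair)
		}
		rpm, err := strconv.Atoi(val)
		if err != nil || rpm <= 0 { // assumption: RPM must be a positive integer
			return nil, fmt.Errorf("override %q: rpm must be a positive integer", pair)
		}
		out[strings.ToLower(id)] = rpm
	}
	return out, nil
}

// LimitFor returns the org's override if present, otherwise the default RPM.
func LimitFor(orgID string, overrides map[string]int, defaultRPM int) int {
	if rpm, ok := overrides[strings.ToLower(orgID)]; ok {
		return rpm
	}
	return defaultRPM
}

func main() {
	overrides, err := ParseOverrides("00000000-0000-0000-0000-000000000001=120,00000000-0000-0000-0000-000000000002=30")
	if err != nil {
		fmt.Println("invalid overrides:", err)
		return
	}
	fmt.Println(LimitFor("00000000-0000-0000-0000-000000000001", overrides, 60)) // 120
	fmt.Println(LimitFor("550e8400-e29b-41d4-a716-446655440000", overrides, 60)) // 60
}
```

Failing at startup on a malformed pair, rather than skipping it silently, keeps a typo in an org ID from quietly leaving a tenant on the default limit.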

Response headers

When rate limiting is active and Redis is healthy, protected responses include:

Header	Meaning
`X-RateLimit-Limit`	Configured RPM for this org
`X-RateLimit-Remaining`	Requests left in the current window
`X-RateLimit-Reset`	Unix timestamp when the window resets

On 429, the proxy also sets Retry-After (seconds) so clients can back off without guessing.

Rate-limited error envelope

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded",
    "request_id": "0192a3b4-c5d6-7890-abcd-ef1234567890",
    "timestamp": "2026-06-14T12:00:00Z"
  }
}

The request_id matches X-Request-ID on the response and is propagated to auth gRPC metadata for log correlation (ADR-0017).

Routes subject to limiting

All protected /v1/* routes share the same org-level bucket in Phase 1:

GET /v1/internal/auth-probe
GET /v1/orgs/{org_id}/auth-probe
POST /v1/chat/completions

There is no per-agent or per-IP tier yet. Hierarchical limits (agent → org → global) are planned for Phase 2+.

Verify the limiter

Start test stack

Run make compose-test-up. Postgres listens on port 5433; Redis is provided by the compose stack.

Run security tests

go test -tags=integration -run TestSecurity_SEC4 ./services/proxy/...

Optional smoke burst

make dev-smoke sends 65 rapid probe requests and may print a WARN if none of them returns 429. This is acceptable on fast hardware.

CI runs the full security-integration job including Redis fail-open cases. See Testing strategy and current state.

Operational guidance

Staging: Keep IBEX_RATE_LIMIT_DEFAULT_RPM low enough to catch runaway integration tests.
Production: Set per-org overrides for high-traffic tenants; never embed org IDs in metric labels (cardinality explosion).
Security: Pair rate limits with auth — unauthenticated traffic should not reach the limiter on protected routes.

Configuration — full env var reference
Authentication — limiter runs after auth succeeds
Tenant isolation — Redis key namespacing by org_id
ADR-0015 — design rationale and race window
