Milestone 2.2.2 — Token Revocation Propagation via Redis Pub/Sub
Status: Planned
Goal: 2.2 — Auth performance cache
Phase: 2 — Single Provider End-to-End
Estimated effort: 2 days
ADR required: ADR-0026 — Revocation propagation design
Why This Milestone Exists
Milestone 2.2.1 introduces an LRU cache with a 30-second maximum TTL. This means a revoked token can be used for up to 30 seconds after revocation. For Phase 2, the SLA is "revoked tokens rejected within 5 seconds." To meet this, the auth service must notify all proxy instances immediately when a token is revoked, so they can invalidate their LRU entries.
Redis pub/sub is the correct mechanism: it is already a required dependency (rate limiting), it supports fan-out to multiple proxy instances, and it does not require any new infrastructure.
The auth service publishes a revocation event when RevokeToken is called. Every proxy instance subscribes and calls CachingValidator.Invalidate when it receives the event.
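The flow described above can be sketched without Redis. The example below is a minimal in-memory illustration, not the project's code: `broker` stands in for the Redis channel and `tokenCache` for each proxy's LRU cache, and both names (and `runProxy`, `demo`) are hypothetical. It only shows the fan-out property the design relies on: one published revocation reaches every subscribed proxy, and each proxy invalidates its own entry.

```go
package main

import "sync"

// tokenCache is a hypothetical stand-in for one proxy instance's LRU cache.
type tokenCache struct {
	mu      sync.Mutex
	entries map[string]bool // token hash -> cached "valid" result
}

func newTokenCache(hashes ...string) *tokenCache {
	c := &tokenCache{entries: make(map[string]bool)}
	for _, h := range hashes {
		c.entries[h] = true
	}
	return c
}

// Invalidate drops the cached entry so the next request must re-validate.
func (c *tokenCache) Invalidate(hash string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, hash)
}

// Has reports whether the hash is still cached.
func (c *tokenCache) Has(hash string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.entries[hash]
}

// broker is a hypothetical in-memory stand-in for the Redis channel.
type broker struct {
	mu   sync.Mutex
	subs []chan string
}

// Subscribe registers a subscriber and returns its receive channel.
func (b *broker) Subscribe() <-chan string {
	ch := make(chan string, 16)
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs = append(b.subs, ch)
	return ch
}

// Publish fans the hash out to every current subscriber. This sketch drops
// the message when a subscriber's buffer is full; nothing is stored for
// later delivery.
func (b *broker) Publish(hash string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		select {
		case ch <- hash:
		default:
		}
	}
}

// Close closes every subscriber channel so consumers can drain and exit.
func (b *broker) Close() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		close(ch)
	}
	b.subs = nil
}

// runProxy invalidates cache entries until the event channel is closed.
func runProxy(events <-chan string, cache *tokenCache, wg *sync.WaitGroup) {
	defer wg.Done()
	for hash := range events {
		cache.Invalidate(hash)
	}
}

// demo revokes "tok-1" and reports whether each of two proxies still caches
// it, plus whether the unrelated "tok-2" is still cached on the first proxy.
func demo() (bool, bool, bool) {
	b := &broker{}
	p1 := newTokenCache("tok-1", "tok-2")
	p2 := newTokenCache("tok-1")
	var wg sync.WaitGroup
	wg.Add(2)
	go runProxy(b.Subscribe(), p1, &wg)
	go runProxy(b.Subscribe(), p2, &wg)
	b.Publish("tok-1")
	b.Close()
	wg.Wait()
	return p1.Has("tok-1"), p2.Has("tok-1"), p1.Has("tok-2")
}
```

Calling `demo()` from a `main` function or a test exercises the whole path. Redis pub/sub is likewise at-most-once, which is why the 30-second TTL from Milestone 2.2.1 remains the fallback when an event is missed.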
Branch
feature/m2-2-2-revocation-propagation
PR Title
feat(proxy,auth): token revocation propagation via Redis pub/sub (m2.2.2)
Deliverables
1. Revocation event schema
// RevocationEvent is published to the Redis channel on token revocation.
// Channel: ibex:token:revocations (global, not org-scoped: proxy doesn't know org_id from token alone)
type RevocationEvent struct {
	Version   int       `json:"v"`          // schema version, currently 1
	TokenHash string    `json:"token_hash"` // SHA-256 hex of the raw token
	RevokedAt time.Time `json:"revoked_at"`
	OrgID     uuid.UUID `json:"org_id"`     // for audit logging only
}
2. Auth service — publisher
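Building the payload the publisher sends needs only the standard library. The sketch below is illustrative: `revocationEvent` mirrors the schema but uses a plain string for `org_id` (an assumption, to avoid the UUID dependency), and `hashToken`, `buildPayload`, and `examplePayload` are hypothetical helper names. It shows the raw token being reduced to a SHA-256 hex digest before anything leaves the auth service, so the raw token never appears on the channel.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"time"
)

// revocationEvent mirrors the event schema. OrgID is a string here
// (assumption) so the sketch stays within the standard library.
type revocationEvent struct {
	Version   int       `json:"v"`
	TokenHash string    `json:"token_hash"`
	RevokedAt time.Time `json:"revoked_at"`
	OrgID     string    `json:"org_id"`
}

// hashToken returns the lowercase hex SHA-256 digest of the raw token.
func hashToken(rawToken string) string {
	sum := sha256.Sum256([]byte(rawToken))
	return hex.EncodeToString(sum[:])
}

// buildPayload serializes a version-1 revocation event to JSON.
func buildPayload(rawToken, orgID string, revokedAt time.Time) ([]byte, error) {
	return json.Marshal(revocationEvent{
		Version:   1,
		TokenHash: hashToken(rawToken),
		RevokedAt: revokedAt.UTC(),
		OrgID:     orgID,
	})
}

// examplePayload builds a payload for a fixed token and timestamp.
func examplePayload() (string, error) {
	p, err := buildPayload("abc", "org-1", time.Unix(0, 0))
	return string(p), err
}
```

Because the proxy keys its cache by the same digest, the subscriber can invalidate an entry from `token_hash` alone.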
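// In services/auth/internal/service/token_service.go,
// after successfully marking the token revoked in Postgres:
hash := sha256.Sum256([]byte(rawToken))
event := RevocationEvent{
	Version:   1,
	TokenHash: hex.EncodeToString(hash[:]),
	RevokedAt: time.Now().UTC(),
	OrgID:     orgID,
}
payload, err := json.Marshal(event)
if err != nil {
	log.WarnCtx(ctx, "revocation event marshal failed; LRU will expire naturally", "error", err)
} else {
	// Publish non-blocking; revocation is durable in Postgres even if Redis is down.
	// Detach from the request context, which may be cancelled as soon as the RPC
	// returns and would abort the publish. context.WithoutCancel needs Go 1.21+;
	// the 2-second timeout is an assumed value.
	pubCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), 2*time.Second)
	go func() {
		defer cancel()
		if err := redis.Publish(pubCtx, "ibex:token:revocations", payload).Err(); err != nil {
			log.WarnCtx(pubCtx, "revocation event publish failed; LRU will expire naturally", "error", err)
		}
	}()
}
3. Proxy service — subscriber goroutine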
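The per-message work of the subscriber can be separated from Redis and tested on its own. The following is a sketch under assumptions, not the project's implementation: `handlePayload`, `invalidator`, and `recordingCache` are hypothetical names, and rejecting unknown schema versions or hashes that are not 64 hex characters long is one possible policy the source does not specify.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// supportedVersion is the only schema version this sketch accepts.
const supportedVersion = 1

// revocationEvent holds the fields the proxy needs; other JSON fields
// (revoked_at, org_id) are ignored by json.Unmarshal.
type revocationEvent struct {
	Version   int    `json:"v"`
	TokenHash string `json:"token_hash"`
}

// invalidator is the only behaviour the handler needs from the cache.
type invalidator interface {
	Invalidate(tokenHash string)
}

var errMalformed = errors.New("malformed revocation event")

// handlePayload decodes one message and invalidates the matching entry.
// It returns an error, without touching the cache, for unusable messages.
func handlePayload(payload string, cache invalidator) error {
	var ev revocationEvent
	if err := json.Unmarshal([]byte(payload), &ev); err != nil {
		return fmt.Errorf("%w: %v", errMalformed, err)
	}
	if ev.Version != supportedVersion {
		return fmt.Errorf("%w: unsupported version %d", errMalformed, ev.Version)
	}
	if len(ev.TokenHash) != 64 { // a SHA-256 hex digest is 64 characters
		return fmt.Errorf("%w: token_hash has length %d", errMalformed, len(ev.TokenHash))
	}
	cache.Invalidate(ev.TokenHash)
	return nil
}

// recordingCache is a test double that records invalidated hashes.
type recordingCache struct {
	invalidated []string
}

func (r *recordingCache) Invalidate(h string) {
	r.invalidated = append(r.invalidated, h)
}
```

A subscriber loop could call a function like this for each message and log the returned error, which keeps the Redis-specific code limited to receiving messages.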
// In services/proxy/cmd/proxy/main.go, start the subscriber after CachingValidator is created:
revSub := revocation.NewSubscriber(redisClient, cachingValidator, log)
sd.Register(func(ctx context.Context) error {
	revSub.Stop()
	return nil
})
go revSub.Run(ctx) // Run blocks inside this goroutine until Stop is called or ctx is cancelled
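// packages/revocation/subscriber.go
type Subscriber struct {
	redis  *redis.Client
	cache  *authcache.CachingValidator
	log    *logger.Logger
	stopCh chan struct{}
}

// NewSubscriber wires the subscriber to the Redis client and the cache it invalidates.
func NewSubscriber(rdb *redis.Client, cache *authcache.CachingValidator, log *logger.Logger) *Subscriber {
	return &Subscriber{redis: rdb, cache: cache, log: log, stopCh: make(chan struct{})}
}

// Stop signals Run to return. It must be called at most once.
func (s *Subscriber) Stop() { close(s.stopCh) }

// Run consumes revocation events until Stop is called or ctx is cancelled.
func (s *Subscriber) Run(ctx context.Context) {
	pubsub := s.redis.Subscribe(ctx, "ibex:token:revocations")
	defer pubsub.Close()
	ch := pubsub.Channel()
	for {
		select {
		case msg, ok := <-ch:
			if !ok {
				return // channel closed; do not dereference a nil message
			}
			var event RevocationEvent
			if err := json.Unmarshal([]byte(msg.Payload), &event); err != nil {
				s.log.WarnCtx(ctx, "malformed revocation event", "error", err)
				continue
			}
			s.cache.Invalidate(event.TokenHash)
		case <-s.stopCh:
			return
		case <-ctx.Done():
			return
		}
	}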
}
Acceptance Criteria
- Token revoked via RevokeToken gRPC → pub/sub event published within 100ms
- All proxy instances subscribed to channel receive the event within 500ms
- LRU entry invalidated within 1 second of revocation (end-to-end SLA)
- Redis pub/sub failure does not fail RevokeToken (Postgres write is durable)
- Subscriber goroutine terminates cleanly on SIGTERM (via packages/shutdown)
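One way to check the timing criteria in an integration test is to poll the proxy cache until the entry disappears or the budget runs out. The helper below is a generic sketch: `waitUntil`, `invalidationBudget`, and `pollInterval` are hypothetical names, the 1-second budget is taken from the criteria above, and the 10 ms poll interval is an assumed value.

```go
package main

import "time"

const (
	// invalidationBudget is the end-to-end limit from the acceptance criteria.
	invalidationBudget = time.Second
	// pollInterval is an assumed polling granularity for tests.
	pollInterval = 10 * time.Millisecond
)

// waitUntil polls cond every interval and returns true as soon as it holds.
// It returns false if cond is still false once the budget has elapsed.
func waitUntil(cond func() bool, budget, interval time.Duration) bool {
	deadline := time.Now().Add(budget)
	for {
		if cond() {
			return true
		}
		if !time.Now().Before(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}
```

A test would revoke a token and then assert that `waitUntil` returns true for a condition such as "the cache no longer holds this token hash", using `invalidationBudget` as the limit.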