ADR-0018: Graceful shutdown contract (Phase 1)

Architecture decision record 0018.

  • Status: Accepted
  • Date: 2026-06-07
  • Authors: IBEX Harness team

Context

Auth and proxy services previously used ad-hoc signal.Notify handlers with a hardcoded 10-second drain timeout. Kubernetes rolling updates send SIGTERM; without configurable, ordered shutdown, in-flight HTTP and gRPC work can be dropped during deploys.

M1.3.1 will register OTel provider shutdown on the same coordinator. Phase 2 long-lived streams may need longer drain windows.

Decision

1) Shared packages/shutdown.Coordinator

Both services/auth and services/proxy use shutdown.Coordinator:

  • Register(fn) — handlers run in registration order on shutdown signal
  • Wait() — blocks until SIGTERM or SIGINT, runs handlers with a shared drain context

Ad-hoc signal handling in main.go is forbidden (29-ibex-packages.mdc).

2) Signal semantics

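| Signal  | Behavior                                                          |
|---------|-------------------------------------------------------------------|
| SIGTERM | Graceful drain within IBEX_SHUTDOWN_TIMEOUT (default 30s)         |
| SIGINT  | Immediate shutdown with zero drain timeout (development convenience) |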

3) Environment variable

IBEX_SHUTDOWN_TIMEOUT — Go duration string (e.g. 30s, 60s). Default: 30s.

Canonical name per 24-config-management.mdc. Documented in service .env.example files and ENVIRONMENT_VARIABLES.md.

4) Shutdown sequences

Proxy:

SIGTERM/SIGINT → http.Server.Shutdown → auth gRPC conn Close → Redis Close → exit

Auth:

SIGTERM/SIGINT → gRPC GracefulStop (with Stop fallback on timeout) → http.Server.Shutdown → db.Close → exit

gRPC GracefulStop runs in a goroutine; if the drain context expires, grpc.Server.Stop() forces termination.

5) Exit codes

| Outcome                                    | Exit code |
|--------------------------------------------|-----------|
| All handlers completed within drain window | 0         |
| Drain timeout exceeded                     | 1         |

Handler errors are logged but do not affect the exit code; only exceeding the drain deadline yields exit code 1.

6) Deferred

  • OTel Providers.Shutdown registration (M1.3.1)
  • ClickHouse writer flush (Phase 2)
  • WebSocket / hijacked connection drain (Phase 2 — note in risks)

Consequences

Positive

  • Single coordinator for all Phase 1 services and future shutdown hooks
  • K8s-friendly SIGTERM drain with configurable timeout
  • Auth DB is closed after the HTTP/gRPC drain completes, rather than by a defer registered at process start

Negative

  • SIGINT immediate shutdown may drop in-flight requests in local dev (acceptable trade-off)

References

  • Milestone 1.2.7
  • Milestone 1.3.1
  • 29-ibex-packages.mdc
