Milestone 1.4.2 — Shared Infrastructure Packages: packages/config and packages/apierror
Status: Planned
Goal: 1.4 — Developer Experience Baseline
Phase: 1 — Core Platform
Estimated effort: 2–3 days
ADR required: ADR-0020 — Shared package boundaries
Why This Milestone Exists
The config problem: Both services/auth and services/proxy currently call os.Getenv() directly, scattered across multiple source files. There is no validation that required variables are set. A service that starts with POSTGRES_DSN="" will crash at runtime when handling the first request, not at startup with a clear diagnostic message. There is no single source of truth for what environment variables a service needs — instead, developers discover them one by one from crash logs.
The error code problem: M1.2.3 introduced the stable error envelope with codes like INVALID_JSON, MISSING_AGENT_ID, and RATE_LIMITED. These codes are currently string literals scattered across handler files. There is nothing preventing a developer from typing "INVLAID_JSON" (typo) in a new handler and shipping an inconsistent error code to clients. There is no canonical registry that documents all valid codes, their HTTP status mappings, and their intended semantics.
Both problems compound as more services are added. Phase 2 adds a Python API service. Without a canonical error code registry, Python and Go services will independently invent error codes that diverge. Without shared config patterns, each new service will reinvent env loading.
Non-Goals
- Feature flags or runtime config hot-reload (Phase 4)
- A secrets manager integration (Vault, AWS SSM — Phase 3)
- Centralised config server (Consul, etcd — not planned)
- Python config package (Python uses `pydantic-settings`; the pattern is documented in `07-python-code-quality.mdc`)
Branch
chore/m1-4-2-shared-config-and-error-packages
PR Title
chore(infra): packages/config and packages/apierror shared infrastructure (m1.4.2)
Prerequisites
- 1.2.3 merged
Deliverables
1. packages/config — typed, validated environment loading
// Package config provides typed, validated environment variable loading
// for IBEX Harness Go services.
//
// Usage:
//
//     type MyServiceConfig struct {
//         DatabaseURL string        `env:"POSTGRES_DSN" required:"true"`
//         ListenAddr  string        `env:"IBEX_PROXY_ADDR" envDefault:":8080"`
//         MaxConns    int           `env:"IBEX_DB_MAX_CONNS" envDefault:"20"`
//         Timeout     time.Duration `env:"IBEX_SHUTDOWN_TIMEOUT" envDefault:"30s"`
//     }
//     cfg, err := config.Load[MyServiceConfig]()
//     // err describes every missing/invalid variable, not just the first one
//
// The Load function:
// 1. Reads all struct fields tagged with `env:"VAR_NAME"`
// 2. Parses according to field type (string, int, bool, time.Duration, url.URL, uuid.UUID)
// 3. Applies defaults from `envDefault` tags
// 4. Returns all validation errors at once (not just the first)
// 5. Logs the resolved config at DEBUG level (redacting values for fields tagged `secret:"true"`)
package config
// Load parses environment variables into T and validates all required fields.
// Returns a descriptive error listing ALL missing/invalid variables — not just the first.
// Call in main() and Fatal on error; do not ignore the error.
func Load[T any]() (T, error)
// MustLoad is Load with a fatal log on error. Use in main() only.
func MustLoad[T any]() T
// Secret marks a field value as sensitive. Its value is redacted in debug logs.
// Apply to fields that hold tokens, passwords, or API keys.
//
//     APIKey string `env:"OPENAI_API_KEY" required:"true" secret:"true"`
type Secret string
Adopt github.com/caarlos0/env/v11 as the underlying parser: it is well-maintained, supports generics, and handles all required types. Do NOT use Viper (too heavy), godotenv (env files are not our pattern), or raw os.Getenv.
2. packages/apierror — canonical error code registry
// Package apierror defines all error codes used in IBEX Harness HTTP and
// gRPC APIs. It is the single source of truth for:
// - Error code string values (used in the JSON error envelope)
// - HTTP status code mappings
// - Human-readable default messages
// - gRPC status code mappings
//
// Rules:
// 1. New error codes are added ONLY in this package.
// 2. No service or package uses raw string literals for error codes.
// 3. Every code has a corresponding HTTP status and gRPC status.
// 4. Error messages are safe to return to API callers (no internal details).
package apierror
// Code is a canonical IBEX error code string.
// Values are UPPER_SNAKE_CASE and stable across API versions.
type Code string
// ═════════════════════════════════════════════════════════════
// Client error codes (4xx)
// ═════════════════════════════════════════════════════════════
const (
// CodeInvalidJSON is returned when the request body cannot be parsed as JSON.
// HTTP 400 | gRPC INVALID_ARGUMENT
CodeInvalidJSON Code = "INVALID_JSON"
// CodeInvalidRequest is returned when a request field fails validation.
// HTTP 400 | gRPC INVALID_ARGUMENT
CodeInvalidRequest Code = "INVALID_REQUEST"
// CodeMissingAgentID is returned when X-IBEX-Agent-ID header is absent.
// HTTP 400 | gRPC INVALID_ARGUMENT
CodeMissingAgentID Code = "MISSING_AGENT_ID"
// CodeProviderNotConfigured is returned when no LLM provider is configured
// for the requested model. HTTP 400 | gRPC FAILED_PRECONDITION
CodeProviderNotConfigured Code = "PROVIDER_NOT_CONFIGURED"
// CodeUnauthenticated is returned for missing, invalid, expired, or revoked tokens.
// HTTP 401 | gRPC UNAUTHENTICATED
CodeUnauthenticated Code = "UNAUTHENTICATED"
// CodePermissionDenied is returned for valid tokens with insufficient permissions,
// and for all cross-tenant resource access attempts.
// HTTP 403 | gRPC PERMISSION_DENIED
CodePermissionDenied Code = "PERMISSION_DENIED"
// CodeAgentNotAuthorized is returned when X-IBEX-Agent-ID refers to an agent
// that does not exist in the authenticated org, or belongs to another org.
// HTTP 403 | gRPC PERMISSION_DENIED
CodeAgentNotAuthorized Code = "AGENT_NOT_AUTHORIZED"
// CodeAgentSuspended is returned when the agent exists and belongs to the org
// but its status is paused, suspended, or archived.
// HTTP 403 | gRPC PERMISSION_DENIED
CodeAgentSuspended Code = "AGENT_SUSPENDED"
// CodeRateLimited is returned when the org-level rate limit is exceeded.
// HTTP 429 | gRPC RESOURCE_EXHAUSTED
CodeRateLimited Code = "RATE_LIMITED"
)
// ═════════════════════════════════════════════════════════════
// Server error codes (5xx)
// ═════════════════════════════════════════════════════════════
const (
// CodeInternalError is returned for unexpected server errors.
// Callers should not retry without backoff.
// HTTP 500 | gRPC INTERNAL
CodeInternalError Code = "INTERNAL_ERROR"
// CodeAuthUnavailable is returned when the auth gRPC service is unreachable.
// HTTP 503 | gRPC UNAVAILABLE
CodeAuthUnavailable Code = "AUTH_UNAVAILABLE"
// CodeProviderUnavailable is returned when the LLM provider is unreachable (Phase 2+).
// HTTP 503 | gRPC UNAVAILABLE
CodeProviderUnavailable Code = "PROVIDER_UNAVAILABLE"
// CodeProviderTimeout is returned when the LLM provider exceeds its deadline (Phase 2+).
// HTTP 504 | gRPC DEADLINE_EXCEEDED
CodeProviderTimeout Code = "PROVIDER_TIMEOUT"
)
// HTTPStatus returns the HTTP status code for a given error code.
// Returns 500 for unknown codes.
func HTTPStatus(code Code) int
// GRPCCode returns the gRPC status code for a given error code.
func GRPCCode(code Code) codes.Code
// ErrorResponse is the stable JSON error envelope returned by all IBEX HTTP endpoints.
// It matches the envelope introduced in M1.2.3.
type ErrorResponse struct {
Error ErrorDetail `json:"error"`
}
type ErrorDetail struct {
Code Code `json:"code"`
Message string `json:"message"`
RequestID string `json:"request_id"`
}
// New constructs an ErrorResponse with the given code and message.
func New(code Code, message, requestID string) ErrorResponse
// WriteHTTP writes an ErrorResponse as JSON to w with the appropriate HTTP status.
func WriteHTTP(w http.ResponseWriter, code Code, message, requestID string)
3. Adopt both packages in existing services
- Replace all `os.Getenv` calls in `services/auth` with a `packages/config` typed struct
- Replace all `os.Getenv` calls in `services/proxy` with a `packages/config` typed struct
- Replace all raw error code strings in both services with `apierror.Code*` constants
- Replace all inline `{"error":{"code":...}}` JSON construction with `apierror.WriteHTTP`
Files Affected
| Path | Action |
|---|---|
| `packages/config/config.go` | Add |
| `packages/config/config_test.go` | Add |
| `packages/apierror/apierror.go` | Add |
| `packages/apierror/apierror_test.go` | Add |
| `services/auth/cmd/auth/main.go` | Adopt `packages/config` |
| `services/proxy/cmd/proxy/main.go` | Adopt `packages/config` |
| `services/proxy/internal/handler/*.go` | Adopt `packages/apierror` |
| `services/auth/internal/grpc/*.go` | Adopt `packages/apierror` for gRPC codes |
| `docs/adr/ADR-0020-shared-package-boundaries.md` | Add |
Acceptance Criteria
- `packages/config.Load[T]()` reports ALL missing required vars in a single error, not just the first
- Service startup fails in `main()` with a clear message listing the missing env vars, not at the first request
- All `os.Getenv` calls removed from both services
- All error code strings replaced with `apierror.Code*` constants
- `apierror.WriteHTTP` is the only way to write error responses in handler code
- `packages/config` struct field tags are the single source of truth for env var names (docs generated from struct)
- ADR-0020 written and indexed