Milestone 1.3.1 — OTel Tracer and Meter Provider Initialization
Status: Complete
Goal: 1.3 — Observability baseline
Phase: 1 — Core Platform
Estimated effort: 2–3 days
ADR required: ADR-0019 — OpenTelemetry provider configuration
Why This Milestone Exists
The original M1.3.1 scope was too broad: OTel init, span middleware, Prometheus migration, and log propagation in one PR. This milestone is reduced to the correct atomic unit: initialize the OTel tracer and meter providers in both Go services, wire the HTTP request span middleware, and propagate trace context through gRPC calls to auth. The Prometheus metric catalog (M1.3.2) and shared logger (M1.3.3) are separate milestones.
Phase 1 does not require a running Jaeger or Tempo instance. The exporter targets the OTLP endpoint when OTEL_EXPORTER_OTLP_ENDPOINT is set and falls back to a no-op exporter otherwise. The CI test uses the in-process span recorder from the SDK's sdk/trace/tracetest package to assert that spans are created, so no external collector is required.
Non-Goals
- Prometheus metric migration (M1.3.2)
- Shared logger package (M1.3.3)
- ClickHouse trace ingestion (Phase 2)
- Sampling configuration beyond "100% errors, 1% of normal" (Phase 2)
- gRPC server interceptors on the auth service (added when auth gains a full test suite)
Branch
chore/m1-3-1-otel-providers
PR Title
chore(obs): OTel tracer and meter provider init with HTTP span middleware (m1.3.1)
Prerequisites
- 1.2.6 merged — request ID in context (needed by span middleware)
- 1.2.7 merged — graceful shutdown coordinator (OTel shutdown hooks needed)
Deliverables
1. ADR-0019 — OTel provider configuration
Document:
- SDK version pinned (go.opentelemetry.io/otel v1.x)
- Resource attributes required on every span: service.name, service.version, deployment.environment
- Exporter selection: OTLP gRPC if OTEL_EXPORTER_OTLP_ENDPOINT set; no-op otherwise
- Sampling: parentbased_traceidratio with ratio 0.01 for Phase 1; errors always sampled
- Propagator: W3C traceparent + tracestate (standard; compatible with all major backends)
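To make the propagator choice concrete, the sketch below shows the layout of a version-00 W3C traceparent header value: version, 32-hex trace ID, 16-hex span ID, and a flags byte whose lowest bit marks the trace as sampled. It is an illustration of the wire format only. The TraceParent type and both functions are hypothetical names, not part of the OTel SDK; in the services the SDK's TraceContext propagator would do this work.

```go
package main

import (
	"encoding/hex"
	"errors"
	"strings"
)

// TraceParent is a hypothetical holder for the fields of a W3C traceparent
// header. It is not a type from the OTel SDK.
type TraceParent struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters
	Sampled bool   // lowest bit of the trace-flags byte
}

// FormatTraceParent renders the version-00 header value.
func FormatTraceParent(tp TraceParent) string {
	flags := "00"
	if tp.Sampled {
		flags = "01"
	}
	return "00-" + tp.TraceID + "-" + tp.SpanID + "-" + flags
}

// ParseTraceParent validates and splits a version-00 header value.
func ParseTraceParent(h string) (TraceParent, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceParent{}, errors.New("traceparent: unsupported format")
	}
	traceID, spanID, flags := parts[1], parts[2], parts[3]
	if !isLowerHex(traceID, 32) || !isLowerHex(spanID, 16) || !isLowerHex(flags, 2) {
		return TraceParent{}, errors.New("traceparent: malformed field")
	}
	// All-zero trace or span IDs are invalid.
	if strings.Trim(traceID, "0") == "" || strings.Trim(spanID, "0") == "" {
		return TraceParent{}, errors.New("traceparent: all-zero id")
	}
	b, _ := hex.DecodeString(flags) // already validated above
	return TraceParent{TraceID: traceID, SpanID: spanID, Sampled: b[0]&0x01 == 1}, nil
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n || s != strings.ToLower(s) {
		return false
	}
	_, err := hex.DecodeString(s)
	return err == nil
}
```

With parent-based sampling, the sampled flag carried here is what lets the auth service follow the sampling decision already made by the proxy.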
2. packages/telemetry — provider initialization
// Package telemetry initialises OpenTelemetry providers for IBEX services.
// It is the single place where SDK, exporter, and resource are configured.
package telemetry
import (
	"context"
	"go.opentelemetry.io/otel"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.24.0"
)
// Config holds all OTel provider configuration.
// Fields are loaded from environment variables by the service's
// packages/config loader. Defaults are safe for development.
type Config struct {
	ServiceName    string  // OTEL_SERVICE_NAME (required)
	ServiceVersion string  // OTEL_SERVICE_VERSION (default: "dev")
	Environment    string  // OTEL_DEPLOYMENT_ENVIRONMENT (default: "development")
	OTLPEndpoint   string  // OTEL_EXPORTER_OTLP_ENDPOINT (optional; no-op if empty)
	SampleRatio    float64 // OTEL_SAMPLE_RATIO (default: 0.01)
}
// Providers holds initialized OTel providers and their shutdown functions.
type Providers struct {
	TracerProvider *sdktrace.TracerProvider
	MeterProvider  *sdkmetric.MeterProvider
	Shutdown       func(ctx context.Context) error
}
// Init initialises tracer and meter providers from cfg.
// Call Shutdown on service exit (wire into packages/shutdown Coordinator).
// If cfg.OTLPEndpoint is empty, uses no-op exporters suitable for
// development and CI.
func Init(ctx context.Context, cfg Config) (*Providers, error)
3. HTTP span middleware
// SpanMiddleware creates a server-side OTel span for every HTTP request.
// The span is named "{method} {route_template}" (e.g. "POST /v1/chat/completions").
// Do NOT use the raw URL path — it contains high-cardinality segments (UUIDs, IDs).
// The route template is extracted from the router's pattern, not the URL.
//
// Span attributes set on every request:
// http.method, http.route, http.status_code, http.request_content_length
//
// The request ID from reqid.FromContext is added as span attribute:
// ibex.request_id
//
// Required position: AFTER RequestIDMiddleware, BEFORE Auth.
// RequestID → Span → Auth → AgentVerification → RateLimit → [handler]
func SpanMiddleware(tracer trace.Tracer) func(http.Handler) http.Handler
4. gRPC client trace propagation
Inject W3C traceparent in outgoing gRPC calls from proxy to auth:
// In services/proxy/internal/grpc/interceptors.go,
// alongside RequestIDUnaryInterceptor from M1.2.6:
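func OTelUnaryInterceptor() grpc.UnaryClientInterceptor {
	// Import path: go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc
	// Newer otelgrpc releases deprecate this interceptor in favour of the
	// stats handler (otelgrpc.NewClientHandler); confirm against the pinned version.
	return otelgrpc.UnaryClientInterceptor()
}
Environment Variables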
| Variable | Default | Description |
|---|---|---|
| OTEL_SERVICE_NAME | (required) | Service identifier in traces |
| OTEL_SERVICE_VERSION | dev | Binary version tag |
| OTEL_DEPLOYMENT_ENVIRONMENT | development | development, staging, production |
| OTEL_EXPORTER_OTLP_ENDPOINT | (empty: no-op) | OTLP gRPC endpoint (e.g. localhost:4317) |
| OTEL_SAMPLE_RATIO | 0.01 | Fraction of normal requests sampled |
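The defaults in the table could be applied along the following lines. This is a minimal sketch, not the actual packages/config loader: LoadConfig and orDefault are hypothetical names, the Config struct is repeated only so the block stands alone, and rejecting a ratio outside [0, 1] is an assumption the milestone does not spell out.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Config mirrors the struct in packages/telemetry, repeated here so the
// sketch is self-contained.
type Config struct {
	ServiceName    string
	ServiceVersion string
	Environment    string
	OTLPEndpoint   string
	SampleRatio    float64
}

// LoadConfig is a hypothetical loader. It reads variables through getenv
// (pass os.Getenv in a real service) and applies the documented defaults.
func LoadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		ServiceName:    getenv("OTEL_SERVICE_NAME"),
		ServiceVersion: orDefault(getenv("OTEL_SERVICE_VERSION"), "dev"),
		Environment:    orDefault(getenv("OTEL_DEPLOYMENT_ENVIRONMENT"), "development"),
		OTLPEndpoint:   getenv("OTEL_EXPORTER_OTLP_ENDPOINT"), // empty means no-op exporter
		SampleRatio:    0.01,
	}
	if cfg.ServiceName == "" {
		return Config{}, errors.New("OTEL_SERVICE_NAME is required")
	}
	if raw := getenv("OTEL_SAMPLE_RATIO"); raw != "" {
		ratio, err := strconv.ParseFloat(raw, 64)
		// The negated range check also rejects NaN.
		if err != nil || !(ratio >= 0 && ratio <= 1) {
			return Config{}, fmt.Errorf("OTEL_SAMPLE_RATIO must be in [0, 1], got %q", raw)
		}
		cfg.SampleRatio = ratio
	}
	return cfg, nil
}

// orDefault returns fallback when value is empty.
func orDefault(value, fallback string) string {
	if value == "" {
		return fallback
	}
	return value
}
```

Taking getenv as a parameter keeps the loader testable without mutating the process environment.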
Testing Requirements
- TestSpanMiddleware_SpanCreated: uses sdktrace/tracetest exporter; assert span name "POST /v1/chat/completions", status code attribute, request_id attribute set
- TestSpanMiddleware_ErrorSpan: handler returns 500; assert span status is ERROR
- TestTelemetry_NoopOnEmptyEndpoint: OTLPEndpoint="" → providers initialized, no error, spans are no-ops
- TestTelemetry_Shutdown: Providers.Shutdown() completes within 5s context timeout
Acceptance Criteria
- packages/telemetry.Init() wired into both auth and proxy main.go
- Providers.Shutdown registered with packages/shutdown.Coordinator
- HTTP span middleware creates spans named by route template (not raw URL)
- Span has ibex.request_id, http.method, http.route, http.status_code attributes
- service.name, service.version, deployment.environment in OTel resource
- No-op exporter used when OTEL_EXPORTER_OTLP_ENDPOINT is unset
- Tests use in-process span recorder (no external collector required in CI)
- ADR-0019 written and indexed