
Milestone 2.1.5 — Provider Error Mapping and Stable Envelope

Status: Planned
Goal: 2.1 — LLM provider abstraction and OpenAI forwarding
Phase: 2 — Single Provider End-to-End
Estimated effort: 1–2 days
ADR required: None (extends ADR-0013 envelope and ADR-0023 client policy)


Why This Milestone Exists

Milestones 2.1.2 and 2.1.3 implement OpenAI HTTP calls, but provider failures must surface to clients as the stable IBEX error envelope defined in Phase 1 (M1.2.3), not raw OpenAI JSON or opaque 502s. Without a single mapping layer:

  • The IBEX response to an OpenAI 429 may drop the provider's Retry-After value.
  • An OpenAI 401 (invalid provider API key) could leak to the caller as a 401, wrongly implying that the client's PAT is invalid.
  • Transient 503s from OpenAI may be retried by the client even though the proxy has already exhausted its own retries.

This milestone centralises provider → IBEX error translation in one package used by both streaming and non-streaming paths.


Non-Goals

  • Mapping errors from non-OpenAI providers (Phase 4)
  • Custom per-org error messages (operator UI — Phase 3)
  • Retrying after the response has started streaming to the client

Branch

feature/m2-1-5-provider-error-mapping

PR Title

feat(proxy): centralise provider error mapping to stable envelope (m2.1.5)


Prerequisites

  • 2.1.2 merged (non-streaming client exists)
  • 2.1.3 merged (streaming path exists)
  • 1.2.3 merged (error envelope)

Deliverables

1. MapProviderError function

Go
// MapProviderError translates a provider HTTP response or transport error into
// the canonical IBEX apierror sent to clients. Never exposes provider API keys
// or raw provider response bodies in the returned error.
//
// providerName is included in logs only (structured field), never in client JSON.
func MapProviderError(
    providerName string,
    statusCode int,
    retryAfter time.Duration,
    transportErr error,
) *apierror.Error

2. Status code mapping table (canonical)

Provider status / condition   IBEX code              HTTP status   Notes
---------------------------   --------------------   -----------   -----
400                           INVALID_REQUEST        400           Pass through safe provider message subset if parseable
401                           PROVIDER_UNAVAILABLE   503           Invalid provider key, not the client PAT
429                           RATE_LIMITED           429           Forward Retry-After when present
500, 502, 503, 504            PROVIDER_UNAVAILABLE   503           After client retries exhausted
Client timeout                PROVIDER_TIMEOUT       504
Network / TLS error           PROVIDER_UNAVAILABLE   503

3. Streaming behaviour

  • If the provider returns a non-2xx status before the first byte reaches the client: map via MapProviderError and return the JSON envelope (no SSE).
  • If an error occurs after streaming has started: terminate the SSE stream with the documented data: [DONE] pattern and an optional terminal error event per ADR-0024.

4. Wire into handlers

  • Non-streaming handler uses MapProviderError exclusively — no inline status switches.
  • Streaming forwarder delegates to the same mapper for pre-stream failures.

Testing Requirements

  • TestMapProviderError_OpenAI429: 429 + Retry-After: 30 → RATE_LIMITED + header
  • TestMapProviderError_OpenAI401: 401 → PROVIDER_UNAVAILABLE / 503 (not 401)
  • TestMapProviderError_Timeout: context deadline → PROVIDER_TIMEOUT / 504
  • TestStreaming_PreStreamError: provider 503 before bytes → JSON envelope, not SSE
  • TestStreaming_MidStreamError: provider disconnect mid-stream → graceful SSE termination

Acceptance Criteria

  • All provider errors from OpenAI client paths use MapProviderError
  • No provider API key or raw provider response body in logs or client payloads
  • OpenAI 429 returns IBEX RATE_LIMITED with Retry-After when header present
  • OpenAI 401 returns 503 PROVIDER_UNAVAILABLE (client PAT semantics unchanged)
  • Unit tests cover full mapping table

Risks

Risk                                    Likelihood   Mitigation
-------------------------------------   ----------   ----------
OpenAI error JSON shape changes         Medium       Map by HTTP status first; parse message only when schema matches
Double-retry storms (proxy + client)    Medium       Document in API docs that proxy retries are exhaustive for 5xx
Mid-stream errors confuse SDK clients   Low          Follow OpenAI-compatible SSE error event pattern in ADR-0024