Milestone 2.5.2 — ClickHouse Client Package (packages/clickhouse)

Status: Planned
Goal: 2.5 — Async trace emission to ClickHouse
Phase: 2 — Single Provider End-to-End
Estimated effort: 2 days


Why This Milestone Exists

ClickHouse writes must be batched. Individual inserts carry high per-row overhead because every INSERT creates a new data part that ClickHouse must later merge in the background. The correct pattern is to buffer rows in memory and flush them as a single batch insert (over the ClickHouse native protocol or the HTTP interface) every N rows or every T seconds, whichever comes first. This milestone provides the typed, tested batch writer that milestone 2.5.3 uses as its flush mechanism.


Branch

chore/m2-5-2-clickhouse-client

PR Title

chore(infra): packages/clickhouse typed batch writer (m2.5.2)


Deliverables

Go
// Package clickhouse provides a typed batch writer for ibex.llm_traces.
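package clickhouse

import (
    "context"
    "time"

    "github.com/ClickHouse/clickhouse-go/v2/lib/driver"
    "github.com/google/uuid" // assumed UUID package; the spec does not name one
)
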
// TraceRecord is a single row in ibex.llm_traces.
// Field names and types match the ClickHouse schema exactly.
type TraceRecord struct {
    RequestID          string
    OrgID              uuid.UUID
    AgentID            uuid.UUID
    SessionID          *uuid.UUID
    CheckpointID       *uuid.UUID
    Model              string
    Provider           string
    IsStreaming        bool
    InputTokens        uint32
    OutputTokens       uint32
    TotalTokens        uint32
    AuthLatencyMs      uint16
    DirectiveLatencyMs uint16
    ProviderTTFBMs     uint32
    TotalLatencyMs     uint32
    StatusCode         uint16
    IsComplete         bool
    ErrorCode          string
    RequestedAt        time.Time
    CompletedAt        time.Time
}
 
// Writer batches TraceRecord inserts and flushes to ClickHouse.
// Flushes when batch reaches MaxBatchSize rows OR FlushInterval elapses.
// Safe for concurrent use; multiple goroutines can call Write concurrently.
type Writer struct {
    conn          driver.Conn  // clickhouse-go/v2
    maxBatchSize  int
    flushInterval time.Duration
    // ...
}
 
// Write enqueues a TraceRecord for the next batch flush.
// Returns immediately; does not wait for the flush.
// Returns ErrWriterClosed if the Writer has been shut down.
func (w *Writer) Write(record TraceRecord) error
 
// Flush forces an immediate batch insert of all queued records.
// Called by the shutdown coordinator on SIGTERM to drain the queue.
func (w *Writer) Flush(ctx context.Context) error
 
// Close stops the background flush goroutine and flushes remaining records.
// Implements io.Closer for use with packages/shutdown.
func (w *Writer) Close() error

Batch flush strategy:

  • Default batch size: 100 rows
  • Default flush interval: 5 seconds
  • On shutdown: flush remaining rows with a 10-second deadline
  • On flush error: log the error and discard the batch (trace loss is preferable to blocking)

Acceptance Criteria

  • Write is non-blocking; returns before the insert completes
  • Batch flush occurs when size reaches MaxBatchSize OR FlushInterval elapses
  • Close flushes remaining rows before returning
  • Flush error does NOT propagate to callers of Write (best-effort delivery)
  • Writer registered with packages/shutdown.Coordinator in proxy main.go
