writing go services that don't lie about latency
Most services lie about their latency. Not intentionally — they just measure the wrong thing.
The classic mistake: you log the time your handler takes, ship it to a dashboard, and call it p99. But that number excludes queue wait time, connection setup, middleware overhead, and the TLS handshake your client is sitting through. The number looks great. The user experience does not.
what to actually measure
Measure from the moment the first byte arrives on the wire to the moment the last byte leaves. In Go, the closest in-process approximation is wrapping your entire http.Handler chain, not timing individual business logic:
```go
func latencyMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		// Default to 200: handlers that never call WriteHeader get it implicitly.
		rw := &responseWriter{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rw, r)
		recordLatency(r.URL.Path, rw.status, time.Since(start))
	})
}
```
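The responseWriter type the middleware relies on isn't shown above. A self-contained sketch of the whole thing, with a hypothetical recordLatency stub that just prints (a real service would feed a histogram) and an httptest server standing in for production wiring:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// responseWriter captures the status code the handler writes.
type responseWriter struct {
	http.ResponseWriter
	status int
}

func (rw *responseWriter) WriteHeader(code int) {
	rw.status = code
	rw.ResponseWriter.WriteHeader(code)
}

func latencyMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rw := &responseWriter{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rw, r)
		recordLatency(r.URL.Path, rw.status, time.Since(start))
	})
}

// recordLatency is a hypothetical stub; duration is omitted from the
// output because it varies run to run.
func recordLatency(path string, status int, d time.Duration) {
	fmt.Printf("%s %d\n", path, status)
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "hi")
	})

	srv := httptest.NewServer(latencyMiddleware(mux))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/hello")
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
}
```

Note the embedded http.ResponseWriter: any method the wrapper doesn't override passes through unchanged, so only WriteHeader needs to be intercepted.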
histograms, not averages
Averages and medians hide the tail, which is exactly where latency problems live — p50 is nearly useless on its own. Record histograms instead, with exponential buckets so the high end has resolution where it matters.
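Exponential bucketing itself is small. A minimal sketch, assuming a fixed base width and growth factor (the names bucketIndex, base, and growth are illustrative, not from any particular metrics library):

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// bucketIndex maps a duration into one of n exponential buckets:
// bucket i covers [base·growth^i, base·growth^(i+1)), clamped at both ends.
func bucketIndex(d, base time.Duration, growth float64, n int) int {
	if d < base {
		return 0
	}
	i := int(math.Log(float64(d)/float64(base)) / math.Log(growth))
	if i > n-1 {
		return n - 1
	}
	return i
}

func main() {
	base := time.Millisecond
	fmt.Println(bucketIndex(500*time.Microsecond, base, 2, 16)) // below base: bucket 0
	fmt.Println(bucketIndex(5*time.Millisecond, base, 2, 16))   // floor(log2(5)) = 2
	fmt.Println(bucketIndex(time.Hour, base, 2, 16))            // clamped to the last bucket
}
```

With growth factor 2, sixteen buckets starting at 1ms span 1ms to ~65s — wide enough to see both a healthy p50 and a pathological p999 without storing every sample.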
If your p99 is 3× your p50, you have a problem. If your p999 is 10× your p99, you have a different, harder problem — usually GC pauses, lock contention, or a slow downstream with no timeout.
timeouts are not optional
Every outbound call needs a deadline. The Go idiom:
```go
// Derive a per-call deadline from the incoming request's context.
ctx, cancel := context.WithTimeout(ctx, 200*time.Millisecond)
defer cancel() // release the timer even when the call finishes early
resp, err := client.Do(req.WithContext(ctx))
```
Without this, one slow upstream turns your entire service into a slow upstream for the next layer.
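To see the deadline actually doing its job, a runnable sketch with a deliberately slow upstream (httptest stands in for the misbehaving dependency; the 50ms/500ms values are arbitrary):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

func main() {
	// A deliberately slow upstream, standing in for a dependency with no SLO.
	slow := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(500 * time.Millisecond)
	}))
	defer slow.Close()

	req, err := http.NewRequest(http.MethodGet, slow.URL, nil)
	if err != nil {
		panic(err)
	}

	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	// The client gives up at 50ms instead of waiting out the full 500ms.
	_, err = http.DefaultClient.Do(req.WithContext(ctx))
	fmt.Println(errors.Is(err, context.DeadlineExceeded))
}
```

The call returns after roughly 50ms, not 500ms — the caller's latency is bounded by its own deadline, not by the upstream's worst day.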