🗓️ 31102024 1524
Core Concept: Prometheus has four metric types (Counter, Gauge, Histogram, Summary) - each suited for different measurement patterns and queried with specific functions.
Why It Matters
Choosing the wrong metric type leads to incorrect queries and misleading visualizations. Understanding prometheus_time_series_basics helps you see how these types are stored as time series.
Counters
What: Cumulative count that only increases (resets to 0 on restart)
When to Use: Count events - requests, errors, bytes sent, tasks completed
Go Example:
totalRequests.Inc()
Exposition:
http_requests_total{status="200"} 1543
http_requests_total{status="500"} 12
Query with: rate(), increase(), irate() - see prometheus_range_function_calculations for formulas
Never use raw counter values in alerts/dashboards - always use rate() or increase() because counters reset on restart.
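To see why raw values mislead, here is a minimal stdlib-only sketch of reset-aware counting. It only illustrates the idea behind increase(); the real function also extrapolates to the window edges, which this sketch skips. The sample values and the name increaseWithResets are made up for illustration.

```go
package main

import "fmt"

// increaseWithResets sums the growth of a counter across samples.
// A sample lower than its predecessor is treated as a restart: the counter
// began again from 0, so the whole new value counts as growth.
func increaseWithResets(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		if samples[i] < samples[i-1] {
			total += samples[i] // reset detected
		} else {
			total += samples[i] - samples[i-1]
		}
	}
	return total
}

func main() {
	// Hypothetical scrapes of http_requests_total; the process restarts
	// after the third sample.
	samples := []float64{1500, 1520, 1543, 7, 30}

	// Naive difference goes negative because of the restart.
	fmt.Println("last - first:", samples[len(samples)-1]-samples[0])
	// Reset-aware sum: 20 + 23 + 7 + 23 = 73.
	fmt.Println("reset-aware increase:", increaseWithResets(samples))
}
```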
Gauges
What: Current measurement that can go up or down
When to Use: Measure current state - memory usage, queue length, temperature, active connections
Go Example:
queueLength.Set(42) // Set absolute value
queueLength.Inc() // Increment by 1
queueLength.Dec() // Decrement by 1
queueLength.Add(23) // Add amount
queueLength.Sub(10) // Subtract amount
Exposition:
queue_length 42
memory_usage_bytes 1073741824
Query with: avg_over_time(), max_over_time(), delta(), deriv() - see prometheus_range_function_calculations
Common Pattern:
# How long ago did something happen?
time() - process_start_time_seconds
Summaries
What: Pre-calculated percentiles computed client-side using prometheus_summary_streaming
When to Use: Single instance metrics where you know required percentiles upfront
Go Example:
// Assumes: import "github.com/prometheus/client_golang/prometheus"
requestDurations := prometheus.NewSummary(prometheus.SummaryOpts{
	Name: "http_request_duration_seconds",
	Help: "A summary of HTTP request durations in seconds",
	Objectives: map[float64]float64{
		0.5:  0.05,  // p50, allowed rank error ±0.05
		0.9:  0.01,  // p90, allowed rank error ±0.01
		0.99: 0.001, // p99, allowed rank error ±0.001
	},
})
requestDurations.Observe(2.3) // Record one request that took 2.3s
Exposition:
http_request_duration_seconds{quantile="0.5"} 0.052
http_request_duration_seconds{quantile="0.9"} 0.0564
http_request_duration_seconds{quantile="0.99"} 2.372
http_request_duration_seconds_sum 88364.234
http_request_duration_seconds_count 227420
Trade-offs:
- ✅ Accurate quantiles pre-calculated
- ✅ Low query cost (values already computed)
- ❌ Cannot aggregate across instances (each calculates independently)
- ❌ Cannot change quantiles after deployment
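The aggregation limit can be illustrated with a small stdlib-only sketch: averaging per-instance percentiles does not give the percentile of the combined traffic. The latencies and the percentile helper (nearest-rank method) are hypothetical and chosen only to make the gap visible.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the nearest-rank pct-th percentile (1..100) of values.
func percentile(values []float64, pct int) float64 {
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := (pct*len(sorted) + 99) / 100 // integer ceil(pct*n/100)
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	// Hypothetical request durations (seconds) seen by two instances.
	fast := []float64{0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.2}
	slow := []float64{1, 1, 1, 1, 1, 1, 1, 1, 1, 2}

	// What you get by averaging the two pre-computed p90 values.
	averaged := (percentile(fast, 90) + percentile(slow, 90)) / 2

	// The true p90 over all requests from both instances.
	merged := append(append([]float64(nil), fast...), slow...)
	actual := percentile(merged, 90)

	fmt.Println("average of per-instance p90:", averaged)
	fmt.Println("p90 of merged observations:", actual)
}
```

Here the averaged value lands well below the true p90 of 1 second, which is why summary quantiles should stay per instance.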
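Histograms
What: Distribution of numeric values, counted into a set of ranged buckets
When to Use: Track distributions - request durations, response sizes - especially when percentiles must be aggregated across instances or chosen at query time
Buckets are cumulative by default: each bucket counts every observation less than or equal to its upper bound (the le label), so only the upper bounds need to be defined.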
Go Example:
requestDurations := prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "http_request_duration_seconds",
	Help:    "A histogram of the HTTP request duration in seconds",
	Buckets: []float64{0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10}, // Upper bounds only; +Inf is added automatically
})
requestDurations.Observe(2.3) // Counted in le="2.5" and every larger bucket
Exposition:
http_request_duration_seconds_bucket{le="0.05"} 4599
http_request_duration_seconds_bucket{le="0.1"} 24128
http_request_duration_seconds_bucket{le="0.25"} 45311
http_request_duration_seconds_bucket{le="0.5"} 59983
http_request_duration_seconds_bucket{le="1"} 60345
http_request_duration_seconds_bucket{le="2.5"} 114003
http_request_duration_seconds_bucket{le="5"} 201325
http_request_duration_seconds_bucket{le="+Inf"} 227420
http_request_duration_seconds_sum 88364.234
http_request_duration_seconds_count 227420
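The cumulative counting behind the _bucket series can be sketched without the client library. This is a simplified illustration, not how client_golang stores buckets internally; cumulativeCounts and the observations are hypothetical.

```go
package main

import "fmt"

// cumulativeCounts returns, for each upper bound, how many observations are
// less than or equal to it, plus a final +Inf bucket holding the total.
func cumulativeCounts(bounds []float64, observations []float64) []uint64 {
	counts := make([]uint64, len(bounds)+1)
	for _, v := range observations {
		for i, le := range bounds {
			if v <= le {
				counts[i]++ // every bucket at or above v is incremented
			}
		}
		counts[len(bounds)]++ // +Inf bucket always matches
	}
	return counts
}

func main() {
	bounds := []float64{0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10}
	observations := []float64{0.03, 0.2, 0.2, 2.3, 12}

	// Prints [1 1 3 3 3 4 4 4 5]; the last entry is the +Inf bucket.
	fmt.Println(cumulativeCounts(bounds, observations))
}
```

Because counts are cumulative, the +Inf bucket always equals the _count series, and an observation above the largest finite bound (12 here) shows up only there.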
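Cost vs Resolution:
- More buckets > better quantile resolution
- Too many buckets > heavy load on the TSDB, because every bucket is its own time series for each label combination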
Read more at https://prometheus.io/docs/practices/histograms/
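Histogram Quantile
Calculates approximate percentiles from a histogram's buckets
# IMPORTANT: wrap the bucket counters in rate() over a range (here 5m) so the quantile reflects recent observations, not everything since process start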
histogram_quantile(
0.9,
rate(http_request_duration_seconds_bucket[5m])
)
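# Aggregated histogram quantiles: sum the bucket rates across instances first, keeping le (plus path and method to split by), then take the quantile of the merged histogram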
histogram_quantile(
0.9,
sum by(path, method, le) (
rate(http_request_duration_seconds_bucket[5m])
)
)
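Aggregation works because cumulative bucket counts with the same le can simply be added, which is what sum by(..., le) does before histogram_quantile runs. A minimal stdlib-only sketch of that merge, with hypothetical per-instance counts and a made-up mergeBuckets helper:

```go
package main

import "fmt"

// mergeBuckets adds cumulative bucket counts from several instances, keyed
// by the le label, mirroring what sum by (le) does to per-instance series.
func mergeBuckets(instances ...map[string]float64) map[string]float64 {
	merged := make(map[string]float64)
	for _, inst := range instances {
		for le, count := range inst {
			merged[le] += count
		}
	}
	return merged
}

func main() {
	// Hypothetical bucket counts from two instances of the same service.
	a := map[string]float64{"0.1": 90, "1": 99, "+Inf": 100}
	b := map[string]float64{"0.1": 10, "1": 60, "+Inf": 100}

	m := mergeBuckets(a, b)
	fmt.Println(m["0.1"], m["1"], m["+Inf"]) // 100 159 200
}
```

The merged result is still a valid cumulative histogram. Dropping le from the by clause would collapse all buckets into a single number and leave histogram_quantile nothing to interpolate over.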