🗓️ 31102024 1524
📎 #prometheus #observability
prometheus_data_types
How to use correctly
Gauges
- Represents a current measurement
- Can go up or down
- e.g. memory usage
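// queueLength (and myTimestamp) are assumed to be gauges declared like this
// and registered elsewhere, e.g. with prometheus.MustRegister:
var queueLength = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "queue_length",
	Help: "The number of items in the queue.",
})
// Use Set() when you know the absolute
// value from some other source.
queueLength.Set(0)
// Use these methods when your code directly observes
// the increase or decrease of something, such as
// adding an item to a queue.
queueLength.Inc()   // Increment by 1.
queueLength.Dec()   // Decrement by 1.
queueLength.Add(23) // Increment by 23.
queueLength.Sub(42) // Decrement by 42.
// Use SetToCurrentTime() to record when something happened
// (stores the current Unix time in seconds).
myTimestamp.SetToCurrentTime()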
Gauge methods
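To make "can go up or down" concrete, here is a minimal, stdlib-only sketch of a gauge-like type. It is not client_golang's implementation. The `gauge` type and its methods are hypothetical and only mirror the calls above. The value is a float64 stored as raw bits in an `atomic.Uint64`, and a compare-and-swap loop ensures that concurrent Add/Sub calls are never lost.

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
	"time"
)

// gauge is a hypothetical, minimal gauge: a float64 that can go up or down.
// The value is kept as IEEE-754 bits so it can be updated atomically.
type gauge struct {
	bits atomic.Uint64
}

func (g *gauge) Set(v float64) { g.bits.Store(math.Float64bits(v)) }

// Add applies a positive or negative delta with a CAS retry loop,
// so concurrent callers never overwrite each other's updates.
func (g *gauge) Add(delta float64) {
	for {
		old := g.bits.Load()
		next := math.Float64bits(math.Float64frombits(old) + delta)
		if g.bits.CompareAndSwap(old, next) {
			return
		}
	}
}

func (g *gauge) Inc()           { g.Add(1) }
func (g *gauge) Dec()           { g.Add(-1) }
func (g *gauge) Sub(v float64)  { g.Add(-v) }
func (g *gauge) Value() float64 { return math.Float64frombits(g.bits.Load()) }

// SetToCurrentTime stores the current Unix time in seconds (with fractions).
func (g *gauge) SetToCurrentTime() {
	g.Set(float64(time.Now().UnixNano()) / 1e9)
}

func main() {
	var queueLength gauge
	queueLength.Set(0)
	queueLength.Inc()
	queueLength.Dec()
	queueLength.Add(23)
	queueLength.Sub(42)
	fmt.Println(queueLength.Value()) // -19
}
```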
# No labels
queue_length 42
Exposition
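# Seconds since an event happened: process_start_time_seconds is a gauge
# holding a Unix timestamp, so this yields the process uptime in seconds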
time() - process_start_time_seconds
PromQL
Sample use cases
- REST API latency
- Database query performance
- SLA
Counters
- Cumulative count over time
- Only allowed to go up
NOTE
Counter resets - A counter resets to 0 when its process restarts, but PromQL's rate-style Functions handle this gracefully
totalRequests.Inc()
Instrumentation methods
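The "only allowed to go up" rule can be sketched as a counter whose Add rejects negative deltas. The `counter` type is hypothetical. To my knowledge, client_golang's `Counter.Add` likewise panics on negative values, and this sketch mirrors that behaviour.

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
)

// counter is a hypothetical, minimal monotonic counter.
type counter struct {
	bits atomic.Uint64 // float64 stored as IEEE-754 bits
}

// Add increases the counter. A negative delta is a programming error:
// a decreasing counter would be misread as a reset by rate() and friends.
func (c *counter) Add(delta float64) {
	if delta < 0 {
		panic("counter cannot decrease")
	}
	for {
		old := c.bits.Load()
		next := math.Float64bits(math.Float64frombits(old) + delta)
		if c.bits.CompareAndSwap(old, next) {
			return
		}
	}
}

func (c *counter) Inc()           { c.Add(1) }
func (c *counter) Value() float64 { return math.Float64frombits(c.bits.Load()) }

func main() {
	var totalRequests counter
	totalRequests.Inc()
	totalRequests.Add(4)
	fmt.Println(totalRequests.Value()) // 5
}
```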
Relevant functions
- You usually don't care about a counter's absolute value (it restarts from 0 whenever the process does)
- Instead ask things like: "What's the per-second rate of increase, averaged over the preceding time window?"
NOTE
These functions treat any decrease in value as a counter reset and compensate for it (assuming the counter restarted from 0), so results stay approximately correct across restarts
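Function | Description |
---|---|
rate() | Per-second average rate of increase over the whole range window; the usual choice for alerts and slow-moving counters |
irate() | Per-second rate from only the last two samples in the range window; reacts faster but is noisier, so mainly for graphing volatile counters |
increase() | Total increase over the range window; equivalent to rate() multiplied by the window length in seconds |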
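The following simplified Go sketch shows what these three functions compute from a list of scraped samples, including the reset handling from the NOTE above. The names `sample`, `increase`, `rate` and `irate` are hypothetical. This is not Prometheus's code: the real rate() and increase() also extrapolate to the edges of the range window, so their results differ slightly.

```go
package main

import "fmt"

// sample is one scraped counter value at a timestamp in seconds.
type sample struct {
	t, v float64
}

// increase sums the growth between consecutive samples. A drop in value is
// treated as a counter reset: the counter is assumed to have restarted from
// 0, so the post-reset value itself counts as growth.
func increase(samples []sample) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1].v, samples[i].v
		if cur < prev {
			total += cur // reset: counted from 0 up to cur
		} else {
			total += cur - prev
		}
	}
	return total
}

// rate is the per-second average of increase over the sampled interval.
func rate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	return increase(samples) / (samples[len(samples)-1].t - samples[0].t)
}

// irate only looks at the last two samples.
func irate(samples []sample) float64 {
	n := len(samples)
	if n < 2 {
		return 0
	}
	return increase(samples[n-2:]) / (samples[n-1].t - samples[n-2].t)
}

func main() {
	// Scraped every 15s; the process restarted between t=30 and t=45.
	s := []sample{{0, 100}, {15, 130}, {30, 160}, {45, 10}, {60, 40}}
	fmt.Println(increase(s)) // 30 + 30 + 10 + 30 = 100
	fmt.Println(rate(s))     // 100 / 60s, roughly 1.67 per second
	fmt.Println(irate(s))    // 30 / 15s = 2
}
```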
Summaries
For tracking distributions as a percentile / quantile
requestDurations := prometheus.NewSummary(prometheus.SummaryOpts{
	Name: "http_request_duration_seconds",
	Help: "A summary of HTTP request durations in seconds",
	Objectives: map[float64]float64{
		// 50th percentile with a max absolute error of 0.05
		// (the reported value lies between the 45th and 55th percentile)
		0.5: 0.05,
		// 90th percentile with a max absolute error of 0.01
		0.9: 0.01,
		// 99th percentile with a max absolute error of 0.001
		0.99: 0.001,
	},
})
requestDurations.Observe(2.3)
The summary computes the configured quantiles client-side with a streaming algorithm (see prometheus_summary_streaming) and exposes them together with _sum and _count:
http_request_duration_seconds{quantile="0.5"} 0.052
http_request_duration_seconds{quantile="0.9"} 0.0564
http_request_duration_seconds{quantile="0.99"} 2.372
http_request_duration_seconds_sum 88364.234
http_request_duration_seconds_count 227420
Exposition
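To see what an objective such as `0.99: 0.001` allows, here is a stdlib-only sketch that computes exact nearest-rank quantiles over the full set of observations. It is an illustration only: as far as I know, client_golang does not sort all raw observations but uses a streaming approximation (see prometheus_summary_streaming). `exactQuantile` and `acceptableRange` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// exactQuantile returns the phi-quantile of obs using the nearest-rank
// method: the smallest observation with at least phi*n observations <= it.
// obs must be non-empty; it is copied, not modified.
func exactQuantile(obs []float64, phi float64) float64 {
	s := append([]float64(nil), obs...)
	sort.Float64s(s)
	rank := int(math.Ceil(phi * float64(len(s))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(s) {
		rank = len(s)
	}
	return s[rank-1]
}

// acceptableRange returns the lowest and highest values a summary could
// report for objective phi with absolute error eps: the values whose rank
// lies between (phi-eps)*n and (phi+eps)*n.
func acceptableRange(obs []float64, phi, eps float64) (lo, hi float64) {
	return exactQuantile(obs, phi-eps), exactQuantile(obs, phi+eps)
}

func main() {
	// Hypothetical request durations: 1ms, 2ms, ..., 1000ms.
	obs := make([]float64, 1000)
	for i := range obs {
		obs[i] = float64(i+1) / 1000
	}
	lo, hi := acceptableRange(obs, 0.99, 0.001)
	fmt.Println("exact p99:", exactQuantile(obs, 0.99))
	fmt.Println("allowed for {0.99: 0.001}:", lo, "to", hi)
}
```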
Histograms
- Tracking distribution of numeric values
- Counts input values into a set of ranged buckets
- Cumulative: each bucket counts every observation less than or equal to its upper bound (le)
TIP
Only each bucket's upper bound needs to be defined; the +Inf bucket is added automatically
requestDurations := prometheus.NewHistogram(prometheus.HistogramOpts{
	Name: "http_request_duration_seconds",
	Help: "A histogram of the HTTP request duration in seconds",
	// Upper bounds only; the +Inf bucket is added implicitly.
	Buckets: []float64{0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10},
})
Constructing Histograms
requestDurations.Observe(2.3)
http_request_duration_seconds_bucket{le="0.05"} 4599
http_request_duration_seconds_bucket{le="0.1"} 24128
http_request_duration_seconds_bucket{le="0.25"} 45311
http_request_duration_seconds_bucket{le="0.5"} 59983
http_request_duration_seconds_bucket{le="1"} 60345
http_request_duration_seconds_bucket{le="2.5"} 114003
http_request_duration_seconds_bucket{le="5"} 201325
http_request_duration_seconds_bucket{le="+Inf"} 227420
http_request_duration_seconds_sum 88364.234
http_request_duration_seconds_count 227420
Exposition
WARNING
COST vs Resolution
More buckets → better resolution
Too many buckets → every bucket is its own time series (per label combination), so TSDB storage and query cost blow up
Read more at https://prometheus.io/docs/practices/histograms/
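The cost side is simple arithmetic. Each label combination of a classic histogram produces one series per configured bucket, plus the implicit +Inf bucket, plus _sum and _count. The helper below is a hypothetical sketch of that count, and the label cardinalities in `main` are made up.

```go
package main

import "fmt"

// seriesForHistogram estimates how many time series one classic histogram
// metric produces: one _bucket series per configured bound, plus the
// implicit le="+Inf" bucket, plus _sum and _count, multiplied by the
// number of distinct label combinations (e.g. path x method x instance).
func seriesForHistogram(configuredBuckets, labelCombinations int) int {
	perCombination := configuredBuckets + 1 + 2
	return perCombination * labelCombinations
}

func main() {
	// 8 buckets as in the histogram example; hypothetically 20 paths x 4 methods x 5 instances.
	fmt.Println(seriesForHistogram(8, 20*4*5)) // 11 * 400 = 4400
}
```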
Histogram Quantile
For calculating approximate percentiles from a histogram
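# IMPORTANT: rate() the _bucket counters over a range window (here 5m) first;
# on the raw counters the quantile would cover everything since the process started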
histogram_quantile(
0.9,
rate(http_request_duration_seconds_bucket[5m])
)
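# Aggregated histogram quantiles: sum each bucket's rate across all other labels
# (e.g. instances), keeping le (histogram_quantile needs the bucket bounds) plus
# the labels you want a separate quantile for → one 90th percentile per (path, method)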
histogram_quantile(
0.9,
sum by(path, method, le) (
rate(http_request_duration_seconds_bucket[5m])
)
)
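The `sum by (path, method, le)` step is what turns many per-instance histograms into one histogram per (path, method). The following Go sketch of that grouping is hypothetical (`series`, `sumBy`). It shows that series agreeing on the listed labels are added together and every other label is dropped. It also shows why `le` must be kept: without it, all buckets collapse into one meaningless number.

```go
package main

import (
	"fmt"
	"sort"
)

// series is one _bucket series after rate(): its labels and its value.
type series struct {
	labels map[string]string
	value  float64
}

// sumBy sketches PromQL's `sum by (keys...)`: series with equal values for
// the listed keys are added together; all other labels are dropped.
// The group key format assumes label values contain no "=" or ",".
func sumBy(in []series, keys ...string) map[string]float64 {
	out := make(map[string]float64)
	for _, s := range in {
		groupKey := ""
		for _, k := range keys {
			groupKey += k + "=" + s.labels[k] + ","
		}
		out[groupKey] += s.value
	}
	return out
}

func main() {
	// Hypothetical rate(...[5m]) results from two instances, a and b.
	in := []series{
		{map[string]string{"instance": "a", "path": "/api", "method": "GET", "le": "0.1"}, 3},
		{map[string]string{"instance": "b", "path": "/api", "method": "GET", "le": "0.1"}, 1},
		{map[string]string{"instance": "a", "path": "/api", "method": "GET", "le": "+Inf"}, 5},
		{map[string]string{"instance": "b", "path": "/api", "method": "GET", "le": "+Inf"}, 2},
	}
	agg := sumBy(in, "path", "method", "le")
	keys := make([]string, 0, len(agg))
	for k := range agg {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, agg[k])
	}
}
```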