🗓️ 31102024 1524
📎 #prometheus #observability

prometheus_data_types

How to use correctly

Gauges

  • Represents a current measurement
  • Can go up or down
  • e.g. memory usage
// Use Set() when you know the absolute
// value from some other source.
queueLength.Set(0)

// Use these methods when your code directly observes
// the increase or decrease of something, such as
// adding an item to a queue.

queueLength.Inc() // Increment by 1.
queueLength.Dec() // Decrement by 1.
queueLength.Add(23) // Increment by 23.
queueLength.Sub(42) // Decrement by 42.

// Use SetToCurrentTime() to record when something happened,
// stored as a Unix timestamp in seconds.
myTimestamp.SetToCurrentTime()

Gauge methods

# No labels
queue_length 42

Exposition

# Figure out how long ago an event happened
time() - process_start_time_seconds

PromQL

Sample use cases

  1. REST API latency
  2. Database query performance
  3. SLA

Counters

  • Cumulative count over time
  • Only allowed to go up
NOTE

Counter resets: a counter drops back to 0 when the process restarts. The PromQL functions listed under "Relevant functions" handle this gracefully.

totalRequests.Inc()

Instrumentation methods

Relevant functions

  • The absolute value of a counter is rarely interesting on its own
  • Ask instead: what is the rate of increase, averaged over the preceding time window?
NOTE

These functions handle counter resets gracefully: any decrease is treated as a reset and corrected for as far as possible.

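Function    Description
rate()      Per-second average rate of increase over the range window
irate()     Per-second rate computed from the last two samples in the range window
increase()  Total increase over the range window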
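
To make the reset handling concrete, here is a minimal sketch in plain Go. It is not the Prometheus implementation: the real functions also extrapolate to the edges of the window, which this sketch skips. The names `increase` and `rate` mirror the PromQL functions, but the signatures and sample values are assumptions made for illustration.

```go
package main

import "fmt"

// increase returns the total growth of a counter across samples,
// treating any drop in value as a reset to zero.
// Simplification (assumption): no extrapolation to the window edges,
// which the real PromQL functions perform.
func increase(samples []float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	total := 0.0
	prev := samples[0]
	for _, cur := range samples[1:] {
		if cur < prev {
			// Counter reset: the counter restarted from 0,
			// so everything up to cur counts as new growth.
			total += cur
		} else {
			total += cur - prev
		}
		prev = cur
	}
	return total
}

// rate is the per-second average of increase over the window length.
func rate(samples []float64, windowSeconds float64) float64 {
	return increase(samples) / windowSeconds
}

func main() {
	// Hypothetical scrapes 15s apart; the process restarts after the third.
	samples := []float64{100, 110, 120, 5, 15}
	fmt.Println(increase(samples)) // 35
	fmt.Println(rate(samples, 60))
}
```

The drop from 120 to 5 is counted as 5 units of growth rather than a decrease of 115, which is why the counter's absolute value matters less than how it changes.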

Summaries

For tracking distributions as a percentile / quantile

requestDurations := prometheus.NewSummary(prometheus.SummaryOpts{
	Name: "http_request_duration_seconds",
	Help: "A summary of HTTP request durations in seconds",
	Objectives: map[float64]float64{
		// 50th percentile with a max absolute error of 0.05
		0.5: 0.05,
		// 90th percentile with a max absolute error of 0.01
		0.9: 0.01,
		// 99th percentile with a max absolute error of 0.001
		0.99: 0.001,
	},
})


requestDurations.Observe(2.3)

A summary metric outputs its configured quantiles, based on prometheus_summary_streaming

http_request_duration_seconds{quantile="0.5"} 0.052
http_request_duration_seconds{quantile="0.9"} 0.0564
http_request_duration_seconds{quantile="0.99"} 2.372
http_request_duration_seconds_sum 88364.234
http_request_duration_seconds_count 227420

Exposition

Histograms

  • Tracking distribution of numeric values
  • Counts input values into a set of ranged buckets
  • Cumulative by default
TIP

Only the upper bound needs to be defined for cumulative histograms

requestDurations := prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "http_request_duration_seconds",
	Help:    "A histogram of the HTTP request duration in seconds",
	Buckets: []float64{0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10},
})

Constructing Histograms

requestDurations.Observe(2.3)

http_request_duration_seconds_bucket{le="0.05"} 4599
http_request_duration_seconds_bucket{le="0.1"} 24128
http_request_duration_seconds_bucket{le="0.25"} 45311
http_request_duration_seconds_bucket{le="0.5"} 59983
http_request_duration_seconds_bucket{le="1"} 60345
http_request_duration_seconds_bucket{le="2.5"} 114003
http_request_duration_seconds_bucket{le="5"} 201325
http_request_duration_seconds_bucket{le="+Inf"} 227420
http_request_duration_seconds_sum 88364.234
http_request_duration_seconds_count 227420

Exposition

WARNING

Cost vs resolution

More buckets → better resolution

Too many buckets → too many time series for the TSDB (each bucket is its own series)

Read more at https://prometheus.io/docs/practices/histograms/

Histogram Quantile

For calculating approximate percentiles from a histogram

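# IMPORTANT: wrap the bucket counters in rate() over a range (here 5m),
# so the quantile reflects recent observations, not the counters' whole lifetime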
histogram_quantile(
0.9,
rate(http_request_duration_seconds_bucket[5m])
)

# Aggregated histogram quantiles: sum the bucket rates across series first,
# keeping the le label (plus any labels to split by), then take the quantile
histogram_quantile(
0.9,
sum by(path, method, le) (
rate(http_request_duration_seconds_bucket[5m])
)
)
