🗓️ 31102024 1149
📎 #observability #prometheus

prometheus

SUMMARY

Metrics-based monitoring system that collects and stores metrics from your services

System Architecture

Pull-based monitoring system

Periodically pulls (scrapes) metrics from a list of targets (your services)
- The list can be configured
  - Statically
  - Dynamically through service discovery
- The collected data can be queried, or used to generate alerts that Prometheus sends to an Alertmanager
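A minimal Python sketch of one pull (scrape) cycle over a static target list. It is not Prometheus's own code: the target addresses and the names `static_targets`, `http_fetch` and `scrape_cycle` are hypothetical. It does mimic one real behaviour: Prometheus records a synthetic `up` value per target, 1 when the scrape succeeded and 0 when it failed.

```python
import urllib.request
from typing import Callable, Dict, List

# A fetch function takes a target address and returns its metrics text.
Fetch = Callable[[str], str]


def http_fetch(target: str) -> str:
    # Pull the exposition text from the target's /metrics endpoint.
    with urllib.request.urlopen(f"http://{target}/metrics", timeout=5) as resp:
        return resp.read().decode("utf-8")


def static_targets() -> List[str]:
    # Statically configured target list (hypothetical addresses).
    return ["app-1:8080", "app-2:8080"]


def scrape_cycle(targets: List[str], fetch: Fetch = http_fetch) -> Dict[str, dict]:
    """Pull once from every target, recording up=1 on success, up=0 on failure."""
    results: Dict[str, dict] = {}
    for target in targets:
        try:
            results[target] = {"up": 1, "body": fetch(target)}
        except OSError:
            # Network errors (refused, timeout, ...) mark the target as down.
            results[target] = {"up": 0, "body": None}
    return results
```

A real scraper repeats this every scrape interval. With service discovery, `static_targets()` would be replaced by a function that returns whatever targets are currently discovered.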

Core Features

Data model

Tracks and stores time series (numeric values that change over time)

Values are sampled at specific points in time (the Prometheus scrape interval)

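Term	Description
sample	a single numeric value recorded at a specific timestamp
series	a progression of samples over time
series identifier	metric name and a set of labels
metric name	what you are trying to measure
label	key-value pair that partitions a metric name into individual time series
target labels	labels identifying the scraped target (e.g. job and instance), attached by Prometheus at scrape time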
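A toy Python model of the terms in the table, assuming a plain in-memory dict as storage (the real TSDB is far more involved). A series is identified by its metric name plus its label set, and each series holds a progression of (timestamp, value) samples. `TinyStore` and `series_id` are hypothetical names.

```python
from typing import Dict, FrozenSet, List, Tuple

# Series identifier: metric name plus a set of label pairs.
SeriesId = Tuple[str, FrozenSet[Tuple[str, str]]]
# Sample: (timestamp in seconds, numeric value).
Sample = Tuple[float, float]


def series_id(metric: str, labels: Dict[str, str]) -> SeriesId:
    # Label order does not matter, so labels are stored as a frozenset.
    return (metric, frozenset(labels.items()))


class TinyStore:
    """Toy store: one list of samples per series identifier."""

    def __init__(self) -> None:
        self.series: Dict[SeriesId, List[Sample]] = {}

    def append(self, metric: str, labels: Dict[str, str], ts: float, value: float) -> None:
        # A new label combination creates a new series automatically.
        self.series.setdefault(series_id(metric, labels), []).append((ts, value))
```

Each distinct label value (here each `status`) becomes its own series, which is why high-cardinality labels multiply the number of stored series.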

Transfer format

# HELP http_requests_total The total number of processed HTTP requests.
# TYPE http_requests_total counter
http_requests_total{status="200"} 8556
http_requests_total{status="404"} 20
http_requests_total{status="500"} 68

# HELP process_open_fds Number of open file descriptors
# TYPE process_open_fds gauge
process_open_fds 32
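To show how little is needed to read this format, here is a simplified Python parser for the payload above. It is a sketch, not a full implementation of the format: it ignores HELP text, and it assumes no escaped quotes inside label values and no optional trailing timestamps. `parse_exposition` is a hypothetical name.

```python
import re
from typing import Dict, List, Tuple

# `name{k="v",...} value` or `name value` (no timestamps in this sketch).
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text: str):
    """Return (metric types, samples) from a simplified exposition payload."""
    types: Dict[str, str] = {}
    samples: List[Tuple[str, Dict[str, str], float]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# TYPE "):
            _, _, name, kind = line.split(maxsplit=3)
            types[name] = kind
            continue
        if line.startswith("#"):
            continue  # HELP lines and other comments are skipped here
        m = SAMPLE_RE.match(line)
        if m is None:
            raise ValueError(f"unparseable line: {line!r}")
        name, label_str, value = m.groups()
        labels = dict(LABEL_RE.findall(label_str or ""))
        samples.append((name, labels, float(value)))
    return types, samples
```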

INFO

The format is text-based, so no special libraries are needed to expose metrics

NOTE

Prometheus scrapes the above from an HTTP endpoint exposed by the service (by default the path /metrics)
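A minimal sketch of such an endpoint using only the Python standard library, which illustrates the point that no special client library is required. The counters are hardcoded sample values and the names `render_metrics`, `MetricsHandler`, `serve` and port 8000 are assumptions for illustration. The `Content-Type` shown is the one used for version 0.0.4 of the text format.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-process counters; a real service would update these.
REQUESTS_BY_STATUS = {"200": 8556, "404": 20, "500": 68}


def render_metrics() -> str:
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total The total number of processed HTTP requests.",
        "# TYPE http_requests_total counter",
    ]
    for status, count in sorted(REQUESTS_BY_STATUS.items()):
        lines.append(f'http_requests_total{{status="{status}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    # Blocks forever; Prometheus would scrape http://<host>:<port>/metrics.
    ThreadingHTTPServer(("", port), MetricsHandler).serve_forever()
```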

Query language

Used to query and process the data stored in the TSDB (Prometheus's built-in time series database)

PromQL

Has many functions for working with metrics (dimension-based aggregations, etc.)
- Can be very mathy
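A rough Python sketch of what two common PromQL building blocks compute: `rate()` over a range of counter samples, and `sum by(label)`. This is a simplification, not the PromQL engine: the real `rate()` also extrapolates to the edges of the range window, which this sketch skips. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter across the given samples.

    Handles counter resets (a drop means the counter restarted from 0),
    but does not extrapolate to the window edges like PromQL's rate().
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur  # reset: count from 0
    return increase / (samples[-1][0] - samples[0][0])


def sum_by(series: List[Tuple[Dict[str, str], float]], label: str) -> Dict[str, float]:
    """Like `sum by(label) (...)`: drop all other labels and add up values."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in series:
        # A missing label is treated like an empty label value.
        totals[labels.get(label, "")] += value
    return dict(totals)
```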

Alerting

Alerting rules are also written in PromQL

alert: Many500Errors
expr: |
  (
    sum by(path) (rate(http_requests_total{status="500"}[5m]))
  /
    sum by(path) (rate(http_requests_total[5m]))
  ) * 100 > 5
for: 5m
labels:
  severity: "critical"
annotations:
  summary: "Many 500 errors for path {{$labels.path}} ({{$value}}%)"
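A rough Python sketch of how a rule like this is evaluated, under simplifying assumptions: the per-path rates are already computed, division only yields results for paths present on both sides (as PromQL's one-to-one vector matching does), and zero denominators are skipped (PromQL would return NaN or +Inf instead). The `for: 5m` clause means the condition must hold continuously for 5 minutes: until then the alert is pending, afterwards it is firing. `error_percentage` and `AlertState` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def error_percentage(rate_500: Dict[str, float], rate_all: Dict[str, float]) -> Dict[str, float]:
    """Per-path share of 500s in percent, like the division in `expr`."""
    return {
        path: rate_500[path] / rate_all[path] * 100
        for path in rate_500.keys() & rate_all.keys()
        if rate_all[path] > 0
    }


@dataclass
class AlertState:
    """Tracks one alert instance through inactive -> pending -> firing."""
    for_seconds: float
    active_since: Optional[float] = None

    def evaluate(self, condition: bool, now: float) -> str:
        if not condition:
            self.active_since = None  # condition cleared: start over
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"
```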

Service Discovery

Prometheus integrates with many service discovery mechanisms (e.g. Kubernetes, Consul, DNS, file-based)
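One of the simplest mechanisms is file-based service discovery: Prometheus watches JSON or YAML files that list target groups, each with `targets` and optional `labels`, and picks up changes without a restart. A small Python sketch of how such a file expands into individual targets; it is not Prometheus's own code, and the addresses and `env` label are made up.

```python
import json
from typing import Dict, List, Tuple


def load_file_sd(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Expand a file_sd-style JSON document into (target, labels) pairs."""
    result: List[Tuple[str, Dict[str, str]]] = []
    for group in json.loads(text):
        labels = group.get("labels", {})
        for target in group["targets"]:
            # Every target in a group shares that group's labels.
            result.append((target, dict(labels)))
    return result
```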
