🗓️ 16032026 2200

MICROMETER PROMQL CHEATSHEET

Practical cheatsheet mapping Micrometer Counter/Timer usage (via OpStats or direct API) to the PromQL queries you write in Grafana. For theory, see micrometer_metric_types and micrometer_to_prometheus_mapping.

Naming Translation

Micrometer metric name    →  Prometheus metric name
───────────────────────────────────────────────────────────────────
my.business.event         →  my_business_event_total (counter)
my.operation.duration     →  my_operation_duration_seconds_* (timer)

Dots become underscores. Counters get a _total suffix. Timers get a _seconds suffix plus sub-metrics (_count, _sum, _max, and _bucket when histogram buckets are published).

Tags become labels: Tag.of("subjectId", "123") → {subjectId="123"}.
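The renaming rules above can be sketched in a few lines of plain Java. This is only an illustration of the rules as stated here; the class PromNames and its methods are hypothetical and are not Micrometer's own naming code.

```java
/** Hypothetical helper mirroring the naming rules described above (not Micrometer's implementation). */
final class PromNames {
    private PromNames() {}

    /** Dots become underscores. */
    static String base(String micrometerName) {
        return micrometerName.replace('.', '_');
    }

    /** Counters get a _total suffix, added only if it is not already present. */
    static String counter(String micrometerName) {
        String name = base(micrometerName);
        return name.endsWith("_total") ? name : name + "_total";
    }

    /** Timers get a _seconds suffix, then a sub-metric suffix such as count, sum, bucket or max. */
    static String timer(String micrometerName, String subMetric) {
        String name = base(micrometerName);
        if (!name.endsWith("_seconds")) {
            name += "_seconds";
        }
        return name + "_" + subMetric;
    }

    public static void main(String[] args) {
        System.out.println(counter("my.business.event"));            // my_business_event_total
        System.out.println(timer("my.operation.duration", "count")); // my_operation_duration_seconds_count
    }
}
```

When a query returns nothing in Grafana, checking the expected Prometheus name this way is a quick first step before looking at labels.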


Counter (OpStats.count)

Java

// Instance method: auto-includes the subjectId tag
opStats.count("order.created", Tag.of("type", "limit"));

// Static method: full control over tags
OpStats.count("order.created", 1.0, Tags.of("subjectId", "123", "type", "limit"));

Prometheus produces

order_created_total{subjectId="123", type="limit"} 42

PromQL Queries

| Goal | Query | What it tells you |
| --- | --- | --- |
| Rate (events/sec) | rate(order_created_total[5m]) | Per-second event rate, averaged over the last 5 minutes |
| Total count in window | increase(order_created_total[1h]) | How many events happened in the last hour (extrapolated, so it can be fractional) |
| Rate by tag | sum by(type) (rate(order_created_total[5m])) | Breakdown of event rate per tag value |
| Error ratio | rate(order_failed_total[5m]) / rate(order_created_total[5m]) | Fraction of events that are failures (both sides must carry matching label sets) |
| Top N by label | topk(5, sum by(subjectId) (rate(order_created_total[5m]))) | Which subjects generate the most events |
| Compare to threshold | rate(order_created_total[5m]) > 100 | Alert when rate exceeds 100/sec |
| Total across all instances | sum(rate(order_created_total[5m])) | Cluster-wide event rate |

Tip: Always wrap counters in rate() or increase(). Raw counter values only ever increase (and drop back to zero on restart), so they are not useful on their own.
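To make the tip concrete, here is a simplified model of what increase() and rate() do with raw counter samples. It is a sketch only: the class CounterMath is hypothetical, and it deliberately ignores the extrapolation Prometheus applies at the edges of the window.

```java
/** Simplified model of increase() and rate() over raw counter samples (hypothetical helper). */
final class CounterMath {
    private CounterMath() {}

    /** Sums the deltas between consecutive samples; a drop is treated as a counter reset to zero. */
    static double increase(double[] samples) {
        double total = 0.0;
        for (int i = 1; i < samples.length; i++) {
            double delta = samples[i] - samples[i - 1];
            // After a reset the counter restarts from zero, so the new value is the whole delta.
            total += (delta >= 0) ? delta : samples[i];
        }
        return total;
    }

    /** Per-second rate: the increase divided by the time span the samples cover. */
    static double rate(double[] samples, double windowSeconds) {
        return increase(samples) / windowSeconds;
    }

    public static void main(String[] args) {
        // Counter scraped every 15s; the process restarted before the fourth sample.
        double[] samples = {100, 130, 160, 10, 40};
        System.out.println(increase(samples)); // 30 + 30 + 10 + 30 = 100.0
        System.out.println(rate(samples, 60)); // about 1.67 events/sec
    }
}
```

The raw values (100, 130, 160, 10, 40) say little on their own, while the increase and rate stay meaningful across the restart.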


Timer (OpStats.time)

Java

// Time a Runnable
opStats.time("order.process.duration", () -> processOrder(order));

// Time a Supplier (returns result)
OrderResult result = opStats.time("order.process.duration", () -> processOrder(order));

// Record a pre-measured Duration
opStats.time("order.process.duration", Duration.ofMillis(elapsed));

// Static versions
OpStats.time("order.process.duration", () -> processOrder(order), tags);
OpStats.time("order.process.duration", Duration.ofMillis(elapsed), tags);

Prometheus produces

order_process_duration_seconds_count{subjectId="123"} 500                 # total calls
order_process_duration_seconds_sum{subjectId="123"} 123.45                # total time (seconds)
order_process_duration_seconds_max{subjectId="123"} 2.3                   # max observed
order_process_duration_seconds_bucket{subjectId="123", le="0.01"} 45      # calls ≤ 10ms
order_process_duration_seconds_bucket{subjectId="123", le="0.1"} 234      # calls ≤ 100ms
order_process_duration_seconds_bucket{subjectId="123", le="+Inf"} 500     # all calls
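As a quick sanity check, the sample values above can be combined by hand. The sketch below (class and method names are hypothetical) computes the lifetime mean and the share of calls at or below 100ms from the raw series. PromQL queries normally apply rate() first so the result covers a recent window rather than the whole process lifetime.

```java
/** Hand calculation on the sample exposition above (hypothetical helper, lifetime values only). */
final class TimerSampleCheck {
    private TimerSampleCheck() {}

    /** Mean duration per call: _sum divided by _count. */
    static double mean(double sumSeconds, double count) {
        return count == 0 ? Double.NaN : sumSeconds / count;
    }

    /** Percentage of calls at or below a bucket boundary: bucket count over total count. */
    static double percentAtOrBelow(double bucketCount, double totalCount) {
        return totalCount == 0 ? Double.NaN : 100.0 * bucketCount / totalCount;
    }

    public static void main(String[] args) {
        System.out.println(mean(123.45, 500));          // about 0.247 seconds per call
        System.out.println(percentAtOrBelow(234, 500)); // about 46.8 percent of calls took 100ms or less
    }
}
```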

PromQL Queries

| Goal | Query | What it tells you |
| --- | --- | --- |
| Average latency | rate(order_process_duration_seconds_sum[5m]) / rate(order_process_duration_seconds_count[5m]) | Mean duration per call |
| P50 latency | histogram_quantile(0.50, rate(order_process_duration_seconds_bucket[5m])) | Median: half of requests are faster than this |
| P95 latency | histogram_quantile(0.95, rate(order_process_duration_seconds_bucket[5m])) | 95% of requests are faster than this |
| P99 latency | histogram_quantile(0.99, rate(order_process_duration_seconds_bucket[5m])) | Tail latency: the slowest 1% of requests take longer than this |
| Request rate | rate(order_process_duration_seconds_count[5m]) | Throughput (calls/sec), same as a counter |
| Max observed | order_process_duration_seconds_max | Peak latency within Micrometer's decaying max window (not tied to the scrape interval) |
| Latency by tag | histogram_quantile(0.95, sum by(subjectId, le) (rate(order_process_duration_seconds_bucket[5m]))) | P95 broken down per subject |
| % requests under SLA | sum(rate(order_process_duration_seconds_bucket{le="0.2"}[5m])) / sum(rate(order_process_duration_seconds_count[5m])) * 100 | What % of calls complete within 200ms (requires a bucket boundary at le="0.2") |
| Slow call alert | histogram_quantile(0.95, rate(order_process_duration_seconds_bucket[5m])) > 1 | Fire alert when P95 exceeds 1 second |
| Total time spent | increase(order_process_duration_seconds_sum[1h]) | Total seconds spent in this operation over 1 hour |

Tip: histogram_quantile needs the le label preserved. When aggregating, always keep le in the by() clause: sum by(someTag, le) (rate(..._bucket[5m])).
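It helps to see roughly how a quantile is estimated from le buckets, since that explains why le must survive aggregation. The following is a minimal sketch of the linear-interpolation approach used for classic histograms. The class HistogramQuantile is hypothetical and omits several edge cases the real function handles.

```java
/** Sketch of quantile estimation from cumulative le buckets (hypothetical, simplified). */
final class HistogramQuantile {
    private HistogramQuantile() {}

    /**
     * @param q                quantile in [0, 1]
     * @param upperBounds      ascending bucket upper bounds; the last one must be +Infinity
     * @param cumulativeCounts cumulative count (or rate) per bucket, same length as upperBounds
     */
    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        if (n < 2 || cumulativeCounts.length != n || !Double.isInfinite(upperBounds[n - 1])) {
            return Double.NaN;
        }
        if (q < 0) return Double.NEGATIVE_INFINITY;
        if (q > 1) return Double.POSITIVE_INFINITY;
        double total = cumulativeCounts[n - 1];
        if (total == 0) return Double.NaN;

        double rank = q * total;
        int b = 0;
        while (b < n - 1 && cumulativeCounts[b] < rank) {
            b++;
        }
        // Rank falls in the +Inf bucket: the best available answer is the highest finite bound.
        if (b == n - 1) return upperBounds[n - 2];

        double lower = (b == 0) ? 0.0 : upperBounds[b - 1];
        double below = (b == 0) ? 0.0 : cumulativeCounts[b - 1];
        double inBucket = cumulativeCounts[b] - below;
        if (inBucket == 0) return lower;
        // Assume observations are spread evenly inside the bucket.
        return lower + (upperBounds[b] - lower) * (rank - below) / inBucket;
    }

    public static void main(String[] args) {
        // Bucket values from the sample exposition above: le=0.01, le=0.1, le=+Inf.
        double[] bounds = {0.01, 0.1, Double.POSITIVE_INFINITY};
        double[] counts = {45, 234, 500};
        System.out.println(quantile(0.50, bounds, counts)); // 0.1
    }
}
```

With the sample buckets, more than half of the calls land in the +Inf bucket, so the estimate is capped at the highest finite boundary (0.1). In practice this is a sign that the bucket boundaries should extend further.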


Common Patterns

Rate + filter by tag

rate(order_created_total{subjectId="123", type="limit"}[5m])

Regex match on labels

rate(order_created_total{type=~"limit|market"}[5m])

Exclude labels

rate(order_created_total{type!="test"}[5m])

Aggregate across instances

sum(rate(order_created_total[5m]))                         # total rate
sum by(type) (rate(order_created_total[5m]))               # rate per type
avg by(instance) (rate(order_process_duration_seconds_sum[5m]) / rate(order_process_duration_seconds_count[5m]))   # avg latency per instance
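For readers who think in code, sum by(type) behaves like a group-and-sum over series. The sketch below is a rough analogy, not how Prometheus is implemented: the SumBy and Series types are hypothetical, and each series is assumed to carry an already computed rate.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Rough analogy for "sum by(label) (rate(...))" (hypothetical types, not Prometheus code). */
final class SumBy {
    private SumBy() {}

    /** One time series: its label set plus an already computed per-second rate. */
    static final class Series {
        final Map<String, String> labels;
        final double rate;

        Series(Map<String, String> labels, double rate) {
            this.labels = labels;
            this.rate = rate;
        }
    }

    /** Groups series by one label and sums their rates; every other label is dropped. */
    static Map<String, Double> sumBy(String label, List<Series> series) {
        Map<String, Double> out = new TreeMap<>();
        for (Series s : series) {
            // A series without the label falls into the group with an empty label value.
            String key = s.labels.getOrDefault(label, "");
            out.merge(key, s.rate, Double::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Series> series = List.of(
                new Series(Map.of("instance", "a", "type", "limit"), 2.0),
                new Series(Map.of("instance", "b", "type", "limit"), 3.0),
                new Series(Map.of("instance", "a", "type", "market"), 1.5));
        System.out.println(sumBy("type", series)); // {limit=5.0, market=1.5}
    }
}
```

The instance label disappears from the result, which is why a plain sum(...) collapses everything into a single cluster-wide value.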

Time window selection

| Window | Use case |
| --- | --- |
| [1m] | Real-time, noisy |
| [5m] | Standard dashboards (good balance) |
| [15m] | Smoother trends |
| [1h] | Long-term overview |

Warning: Shorter windows give noisier graphs but faster reaction; longer windows are smoother but hide spikes. Match the window to your scrape interval: use at least 4x the scrape interval (e.g., 15s scrape → use [1m] minimum).
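The 4x rule of thumb is simple arithmetic, sketched here with java.time.Duration. The RateWindow class is a hypothetical helper; the factor of 4 comes from the guidance above rather than from any Prometheus setting.

```java
import java.time.Duration;

/** Applies the "at least 4x the scrape interval" rule of thumb (hypothetical helper). */
final class RateWindow {
    private RateWindow() {}

    /** Smallest recommended range-vector window for a given scrape interval. */
    static Duration minimumWindow(Duration scrapeInterval) {
        return scrapeInterval.multipliedBy(4);
    }

    /** Formats a duration as a PromQL range selector in whole seconds, e.g. [60s]. */
    static String asRangeSelector(Duration window) {
        return "[" + window.getSeconds() + "s]";
    }

    public static void main(String[] args) {
        Duration window = minimumWindow(Duration.ofSeconds(15));
        System.out.println(window);                  // PT1M
        System.out.println(asRangeSelector(window)); // [60s]
    }
}
```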


Quick Reference

┌──────────────────┬───────────────────────────────────────────────────────┐
│ I want to know…  │ PromQL                                                │
├──────────────────┼───────────────────────────────────────────────────────┤
│ Event rate       │ rate(metric_total[5m])                                │
│ Event count      │ increase(metric_total[1h])                            │
│ Avg latency      │ rate(metric_sum[5m]) / rate(metric_count[5m])         │
│ P95 latency      │ histogram_quantile(0.95, rate(metric_bucket[5m]))     │
│ P99 latency      │ histogram_quantile(0.99, rate(metric_bucket[5m]))     │
│ Throughput       │ rate(metric_count[5m])                                │
│ % under SLA      │ rate(metric_bucket{le="X"}[5m]) / rate(metric_count…) │
│ Error ratio      │ rate(errors_total[5m]) / rate(requests_total[5m])     │
│ Top N            │ topk(N, sum by(label) (rate(metric_total[5m])))       │
│ Total time spent │ increase(metric_sum[1h])                              │
└──────────────────┴───────────────────────────────────────────────────────┘

References