16032026 2200
Practical cheatsheet mapping Micrometer Counter/Timer usage (via OpStats or direct API) to the PromQL queries you write in Grafana. For theory, see micrometer_metric_types and micrometer_to_prometheus_mapping.
## Naming Translation

| Micrometer metric name | Prometheus metric name |
|---|---|
| my.business.event | my_business_event_total (counter) |
| my.operation.duration | my_operation_duration_seconds_* (timer) |
Dots become underscores. Counters get _total. Timers get _seconds + sub-metrics (_count, _sum, _bucket, _max).
Tags become labels: Tag.of("subjectId", "123") → {subjectId="123"}.
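The dot-to-underscore rule can be sketched as a tiny helper. This is a hypothetical illustration of the convention, not Micrometer's actual API (the real translation happens inside Micrometer's Prometheus naming convention):

```java
public class PromNaming {
    // counter: dots -> underscores, then the _total suffix
    static String counterName(String micrometerName) {
        return micrometerName.replace('.', '_') + "_total";
    }

    // timer: dots -> underscores, then _seconds_ plus the sub-metric suffix
    static String timerName(String micrometerName, String subMetric) {
        return micrometerName.replace('.', '_') + "_seconds_" + subMetric;
    }

    public static void main(String[] args) {
        System.out.println(counterName("my.business.event"));            // my_business_event_total
        System.out.println(timerName("my.operation.duration", "count")); // my_operation_duration_seconds_count
    }
}
```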
## Counter (OpStats.count)
### Java

```java
// Instance method: auto-includes subjectId tag
opStats.count("order.created", Tag.of("type", "limit"));

// Static method: full control over tags
OpStats.count("order.created", 1.0, Tags.of("subjectId", "123", "type", "limit"));
```
### Prometheus produces

```
order_created_total{subjectId="123", type="limit"} 42
```
### PromQL Queries
| Goal | Query | What it tells you |
|---|---|---|
| Rate (events/sec) | rate(order_created_total[5m]) | How fast events are happening right now |
| Total count in window | increase(order_created_total[1h]) | How many events happened in the last hour |
| Rate by tag | sum by(type) (rate(order_created_total[5m])) | Breakdown of event rate per tag value |
| Error ratio | rate(order_failed_total[5m]) / rate(order_created_total[5m]) | Fraction of events that are failures |
| Top N by label | topk(5, sum by(subjectId) (rate(order_created_total[5m]))) | Which subjects generate the most events |
| Compare to threshold | rate(order_created_total[5m]) > 100 | Alert when rate exceeds 100/sec |
| Total across all instances | sum(rate(order_created_total[5m])) | Cluster-wide event rate |
Always wrap counters in rate() or increase(). Raw counter values are monotonically increasing and not useful on their own.
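To see what rate() and increase() add, here is the arithmetic both functions approximate, with made-up sample values (a sketch; real Prometheus also extrapolates to the window edges, so exact results can differ slightly):

```java
public class CounterMath {
    // What rate() approximates: per-second increase between two counter samples
    static double rate(double earlier, double later, double windowSeconds) {
        return (later - earlier) / windowSeconds;
    }

    public static void main(String[] args) {
        // counter went from 1000 to 1600 over a 5m (300s) window
        double r = CounterMath.rate(1000, 1600, 300);
        System.out.println(r);       // 2.0 events/sec  -> what rate(...[5m]) shows
        System.out.println(r * 300); // 600.0           -> what increase(...[5m]) shows
    }
}
```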
## Timer (OpStats.time)
### Java

```java
// Time a Runnable
opStats.time("order.process.duration", () -> processOrder(order));

// Time a Supplier (returns the result)
OrderResult result = opStats.time("order.process.duration", () -> processOrder(order));

// Record a pre-measured Duration
opStats.time("order.process.duration", Duration.ofMillis(elapsed));

// Static versions
OpStats.time("order.process.duration", () -> processOrder(order), tags);
OpStats.time("order.process.duration", Duration.ofMillis(elapsed), tags);
```
### Prometheus produces

```
order_process_duration_seconds_count{subjectId="123"} 500              # total calls
order_process_duration_seconds_sum{subjectId="123"} 123.45             # total time (seconds)
order_process_duration_seconds_max{subjectId="123"} 2.3                # max observed
order_process_duration_seconds_bucket{subjectId="123", le="0.01"} 45   # calls ≤ 10ms
order_process_duration_seconds_bucket{subjectId="123", le="0.1"} 234   # calls ≤ 100ms
order_process_duration_seconds_bucket{subjectId="123", le="+Inf"} 500  # all calls
```
### PromQL Queries
| Goal | Query | What it tells you |
|---|---|---|
| Average latency | rate(order_process_duration_seconds_sum[5m]) / rate(order_process_duration_seconds_count[5m]) | Mean duration per call |
| P50 latency | histogram_quantile(0.50, rate(order_process_duration_seconds_bucket[5m])) | Median: half of requests are faster than this |
| P95 latency | histogram_quantile(0.95, rate(order_process_duration_seconds_bucket[5m])) | 95% of requests are faster than this |
| P99 latency | histogram_quantile(0.99, rate(order_process_duration_seconds_bucket[5m])) | Tail latency: the worst 1% experience |
| Request rate | rate(order_process_duration_seconds_count[5m]) | Throughput (calls/sec); behaves like a counter |
| Max observed | order_process_duration_seconds_max | Peak latency in current scrape window |
| Latency by tag | histogram_quantile(0.95, sum by(subjectId, le) (rate(order_process_duration_seconds_bucket[5m]))) | P95 broken down per subject |
| % requests under SLA | sum(rate(order_process_duration_seconds_bucket{le="0.2"}[5m])) / sum(rate(order_process_duration_seconds_count[5m])) * 100 | What % of calls complete within 200ms |
| Slow call alert | histogram_quantile(0.95, rate(order_process_duration_seconds_bucket[5m])) > 1 | Fire alert when P95 exceeds 1 second |
| Total time spent | increase(order_process_duration_seconds_sum[1h]) | Total seconds spent in this operation over 1 hour |
histogram_quantile needs the le label preserved. When aggregating, always keep le in the by() clause: sum by(someTag, le) (rate(..._bucket[5m])).
## Common Patterns

### Rate + filter by tag

```
rate(order_created_total{subjectId="123", type="limit"}[5m])
```
### Regex match on labels

```
rate(order_created_total{type=~"limit|market"}[5m])
```
### Exclude labels

```
rate(order_created_total{type!="test"}[5m])
```
### Aggregate across instances

```
sum(rate(order_created_total[5m]))            # total rate
sum by(type) (rate(order_created_total[5m]))  # rate per type
avg by(instance) (rate(order_process_duration_seconds_sum[5m]) / rate(order_process_duration_seconds_count[5m]))  # avg latency per instance
```
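sum by(type) collapses every series that shares a type value and discards the other labels. A sketch of that grouping over hypothetical per-series rates (label values invented for illustration):

```java
import java.util.*;

public class SumBy {
    // sum by(type): keep only the "type" label value (index 1 here),
    // summing the rates of series that collide on it
    static Map<String, Double> sumByType(Map<List<String>, Double> series) {
        Map<String, Double> out = new TreeMap<>();
        series.forEach((labels, rate) -> out.merge(labels.get(1), rate, Double::sum));
        return out;
    }

    public static void main(String[] args) {
        // hypothetical per-series rates keyed by (instance, type) label values
        Map<List<String>, Double> series = Map.of(
            List.of("app-1", "limit"), 3.0,
            List.of("app-2", "limit"), 2.0,
            List.of("app-1", "market"), 1.5);
        System.out.println(sumByType(series)); // {limit=5.0, market=1.5}
    }
}
```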
### Time window selection

| Window | Use case |
|---|---|
| [1m] | Real-time, noisy |
| [5m] | Standard dashboards (good balance) |
| [15m] | Smoother trends |
| [1h] | Long-term overview |
Shorter windows = noisier graphs but faster reaction. Longer windows = smoother but hide spikes. Match the window to your scrape interval: use at least 4x the scrape interval (e.g., 15s scrape → use [1m] minimum).
## Quick Reference

| I want to know… | PromQL |
|---|---|
| Event rate | rate(metric_total[5m]) |
| Event count | increase(metric_total[1h]) |
| Avg latency | rate(metric_sum[5m]) / rate(metric_count[5m]) |
| P95 latency | histogram_quantile(0.95, rate(metric_bucket[5m])) |
| P99 latency | histogram_quantile(0.99, rate(metric_bucket[5m])) |
| Throughput | rate(metric_count[5m]) |
| % under SLA | rate(metric_bucket{le="X"}[5m]) / rate(metric_count…) |
| Error ratio | rate(errors_total[5m]) / rate(requests_total[5m]) |
| Top N | topk(N, sum by(label) (rate(metric_total[5m]))) |
| Total time spent | increase(metric_sum[1h]) |