grafana_agent | jun ren's digital garden

🗓️ 11112025 1630

GRAFANA AGENT

Core Concept:

Lightweight telemetry collector that scrapes metrics/logs/traces locally and forwards to remote backends (Grafana Cloud, Prometheus)
Resource-efficient alternative to running full Prometheus (~30MB vs ~200MB+).

Why It Matters

Hybrid model - Pulls from apps (standard Prometheus), pushes to cloud (remote write)
Cloud-native - Solves "can't scrape behind firewall" problem for managed platforms
Lightweight - No local storage, stateless operation, minimal footprint
Multi-signal - Single agent for metrics, logs, traces, profiles

When to Use

Sending telemetry to managed platforms (Grafana Cloud, remote Prometheus)
Resource-constrained environments (edge, IoT, containers)
Multi-tenant deployments with isolated collection per tenant
Need unified collector instead of separate agents per signal type

When Not to Use

Running self-hosted Prometheus with local storage (use Prometheus directly)
Need local PromQL queries (Agent has no query engine)
Simple single-host monitoring (node_exporter suffices)
Extremely high-cardinality metrics needing local pre-aggregation

Architecture

App exposes /metrics → Agent scrapes (pull) → Agent pushes (remote write) → Cloud Backend

Key insight: Combines pull model (Prometheus standard) with push model (cloud-friendly) to bridge local and remote environments.

Trade-offs

Benefits:

Minimal footprint (30-50MB RAM typical)
No storage management
Works behind firewalls (outbound-only)
Single binary for all telemetry

Drawbacks:

No local querying
Network dependency
Requires remote backend
Additional component to manage

Key Distinctions

vs Prometheus: Agent = Prometheus without storage/querying; use when sending to remote backend

vs OpenTelemetry Collector: Agent is Prometheus-native; OTel is vendor-neutral with broader protocol support

vs Telegraf: Agent for Grafana/Prometheus stack; Telegraf for InfluxDB ecosystem

Deprecation Notice

WARNING

Grafana Agent deprecated (EOL November 1, 2025). Migrate to Grafana Alloy - the successor with improved performance and component-based architecture.

Common Patterns

Sidecar Pattern (sidecar_pattern): Deploy Agent alongside each app instance

Agent scrapes localhost (no network config)
Scales 1:1 with app
Isolated failure domains

Centralized Pattern: Single Agent scrapes multiple apps

Lower resource overhead
Centralized config
Single point of failure

Common Pitfalls

DANGER

WAL disk full: If remote backend unreachable, Write-Ahead Log fills disk. Configure wal_truncate_frequency and monitor disk usage.

WARNING

Cardinality explosion: Agent forwards all metrics. High-cardinality labels (user IDs, request IDs) overwhelm backend. Use metric_relabel_configs to drop before remote write.

WARNING

Scrape interval mismatch: Match or exceed remote backend's expected resolution. Grafana Cloud free tier expects 15s-60s intervals.

Why It Matters​

When to Use​

When Not to Use​

Architecture​

Trade-offs​

Key Distinctions​

Deprecation Notice​

Common Patterns​

Common Pitfalls​

References​

Why It Matters

When to Use

When Not to Use

Architecture

Trade-offs

Key Distinctions

Deprecation Notice

Common Patterns

Common Pitfalls

References