docker_healthcheck | jun ren's digital garden

🗓️ 11112024 1505

DOCKER HEALTHCHECK

What it is:

Command that Docker runs periodically inside the container to verify the application is functioning correctly; exit code 0 means healthy, 1 means unhealthy
Reports health status: starting, healthy, or unhealthy

Problem it solves:

Containers can be running but broken (crashed worker process, deadlock, failed dependency)
Without health checks, orchestrators only know whether a container is running, so they keep routing traffic to broken containers
Users see errors instead of automatic recovery

Health States

A container with a health check is always in one of three states:

starting: Container initializing, health check hasn't passed yet
healthy: Most recent health check passed
unhealthy: Health check failed the configured number of consecutive times (retries)
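The transitions above can be sketched as a small state machine. This is a simplified model, assuming each check simply passes or fails and ignoring the start period: the status stays starting until the first pass, any pass makes it healthy, and only retries consecutive failures make it unhealthy. The function name simulate_health is hypothetical.

```python
from typing import Iterable, List


def simulate_health(results: Iterable[bool], retries: int = 3) -> List[str]:
    """Return the status after each check result (True = check passed).

    Simplified model with no start period: one passing check makes the
    container healthy, `retries` consecutive failures make it unhealthy,
    and fewer failures leave the previous status unchanged.
    """
    status = "starting"
    failing_streak = 0
    history = []
    for passed in results:
        if passed:
            failing_streak = 0
            status = "healthy"
        else:
            failing_streak += 1
            if failing_streak >= retries:
                status = "unhealthy"
        history.append(status)
    return history


# Two passing checks, then the app hangs and three checks fail in a row.
print(simulate_health([True, True, False, False, False], retries=3))
```

With retries=3 the container stays healthy through the first two failures and is only marked unhealthy on the third, which is the tolerance-versus-speed trade-off described under Retries.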

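When a container becomes unhealthy, Docker itself only records the status (shown by docker ps and docker inspect); it does not restart the container. Acting on the status is the orchestrator's job: Docker Swarm replaces unhealthy tasks, and docker_compose can hold back dependent services until a dependency is healthy (depends_on with condition: service_healthy). Kubernetes ignores the Dockerfile HEALTHCHECK and uses its own liveness and readiness probes to restart containers or remove them from load balancer rotation.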

When to Use

Use for:

Production services where automatic recovery matters
Load-balanced APIs
Long-running services that may degrade over time
Microservices with complex dependencies
Database connections that can become stale

Skip for:

Short-lived containers (batch jobs, one-off tasks)
Development environments where manual restart is acceptable
When health check overhead outweighs benefit

Key Distinctions

Running vs Healthy

Container can be running (process active) but unhealthy (not functioning correctly)
Health checks detect the difference
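One way to catch a process that is running but stuck is a heartbeat check: the application's main loop records a timestamp, and the health check fails once that timestamp goes stale. This is a minimal sketch, assuming a hypothetical heartbeat file at /tmp/app-heartbeat and a hypothetical 60-second staleness limit; neither is defined by Docker.

```python
import os
import time
from typing import Optional

HEARTBEAT_FILE = "/tmp/app-heartbeat"  # hypothetical path, chosen for illustration
MAX_AGE_SECONDS = 60.0                 # hypothetical staleness limit


def write_heartbeat(path: str = HEARTBEAT_FILE) -> None:
    """Call from the application's main loop on every iteration."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(time.time()))
    os.replace(tmp, path)  # atomic swap so the checker never reads a half-written file


def is_healthy(path: str = HEARTBEAT_FILE,
               max_age: float = MAX_AGE_SECONDS,
               now: Optional[float] = None) -> bool:
    """True only if the heartbeat exists and is at most max_age seconds old.

    A deadlocked loop leaves the process running but stops refreshing the
    heartbeat, so this check fails even though the container is still running.
    """
    try:
        with open(path) as f:
            last_beat = float(f.read().strip())
    except (OSError, ValueError):
        return False  # missing or corrupt heartbeat counts as unhealthy
    current = time.time() if now is None else now
    return current - last_beat <= max_age
```

Used as the HEALTHCHECK command, a script like this would exit with status 0 when is_healthy() returns True and 1 otherwise.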

Health Check vs Monitoring

Health checks: Internal container-level checks that report a health status an orchestrator can act on (for example by restarting or replacing the container)
Monitoring: External system-level observability that provides insights

Liveness vs Readiness

Kubernetes context only
Liveness: Is app alive? (restart if not)
Readiness: Can app serve traffic? (remove from load balancer if not)
Docker health checks are simpler: a single command that only indicates overall health (Kubernetes ignores the Dockerfile HEALTHCHECK and runs its own probes)
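In Kubernetes the two questions are usually answered by two separate HTTP endpoints. A minimal sketch with Python's standard library, assuming hypothetical paths /livez and /readyz and a hypothetical dependencies_ready flag the application sets once its database connection and other dependencies are up; the real probe paths and port are whatever the pod spec configures.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical flag the application sets once its dependencies are ready.
dependencies_ready = threading.Event()


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/livez":
            # Liveness: the process can still answer requests at all.
            self._reply(200, b"alive")
        elif self.path == "/readyz":
            # Readiness: only accept traffic once dependencies are up.
            if dependencies_ready.is_set():
                self._reply(200, b"ready")
            else:
                self._reply(503, b"not ready")
        else:
            self._reply(404, b"not found")

    def _reply(self, code: int, body: bytes) -> None:
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep frequent probe requests out of the logs


def start_server(port: int = 0) -> ThreadingHTTPServer:
    # 127.0.0.1 keeps this demo local; inside a pod the server must listen on
    # an address the kubelet can reach (e.g. 0.0.0.0).
    server = ThreadingHTTPServer(("127.0.0.1", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A failing /livez would lead Kubernetes to restart the container, while a 503 from /readyz only takes the pod out of Service endpoints until dependencies_ready is set.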

Configuration Considerations

Intervals

Balance responsiveness with resource overhead
Typical: 30s for web services (also Docker's default interval), 10s for critical databases
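The trade-off is easy to put in numbers: a shorter interval means more check executions per day but a shorter wait before the first failing check. A rough sketch using the two typical intervals above; the helper name checks_per_day is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def checks_per_day(interval_s: int) -> int:
    """Approximate check executions per container per day.

    Upper bound: Docker waits interval_s after each check completes, so the
    time spent running the check itself lowers the real count slightly.
    """
    return SECONDS_PER_DAY // interval_s


for label, interval in [("web service", 30), ("critical database", 10)]:
    # The first failing check runs roughly one interval after a failure begins.
    print(f"{label}: every {interval}s -> ~{checks_per_day(interval)} checks/day, "
          f"first failed check within ~{interval}s")
```

Going from 30s to 10s triples the number of checks (2880 vs 8640 per day) in exchange for noticing a failure about 20 seconds sooner.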

Start Period

Grace period where failed checks don't count toward the retry limit; a check that succeeds during this period marks the container healthy right away
Always set this for slow-starting applications
Without it, containers get marked unhealthy during normal initialization
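The effect of the start period can be sketched by simulating an app that takes a fixed time to boot. This is a simplified model, assuming checks run exactly every interval seconds (check duration ignored) and pass as soon as the app is ready; the 45s boot time and the function name simulate_startup are hypothetical.

```python
from typing import List, Tuple


def simulate_startup(ready_at: int, interval: int, retries: int,
                     start_period: int, checks: int = 6) -> List[Tuple[int, str]]:
    """Health status after each check for an app that needs ready_at seconds to boot.

    Simplified model: checks run every `interval` seconds and pass once the
    app is ready; failures during the start period, while the app is still
    booting, are not counted toward `retries`.
    """
    status, streak = "starting", 0
    timeline = []
    for n in range(1, checks + 1):
        t = n * interval
        if t >= ready_at:
            status, streak = "healthy", 0   # app finished booting, check passes
        elif t >= start_period:
            streak += 1                     # failure outside the grace period counts
            if streak >= retries:
                status = "unhealthy"
        # otherwise: failure inside the grace period is ignored
        timeline.append((t, status))
    return timeline


# App needs 45s to boot; checks every 10s; 3 retries.
print(simulate_startup(45, 10, 3, start_period=0))
print(simulate_startup(45, 10, 3, start_period=60))
```

Without a start period the container is marked unhealthy at the 30s check, before the app has finished booting; with start_period=60 it stays starting until the 50s check passes and then becomes healthy.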

Retries

How many consecutive failures before marking unhealthy (Docker's default is 3)
More retries = more tolerant of transient issues but slower failure detection
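How much slower detection gets can be estimated with simple arithmetic. A rough sketch, assuming the app breaks just after a successful check and every failing check hangs until its timeout (the per-check time limit, Docker's --timeout option); the 5-second timeout and the helper name are hypothetical.

```python
def worst_case_detection_s(interval_s: float, timeout_s: float, retries: int) -> float:
    """Rough upper bound on the time from breakage to 'unhealthy'.

    Docker starts the next check interval_s after the previous one
    completes, so each failing check costs at most interval + timeout.
    """
    return retries * (interval_s + timeout_s)


for retries in (1, 3, 5):
    print(f"retries={retries}: up to "
          f"{worst_case_detection_s(interval_s=30, timeout_s=5, retries=retries)}s")
```

With a 30s interval, each extra retry adds up to 35 seconds before the container is marked unhealthy.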

Common Pitfalls

DANGER

No start period for slow apps: Container gets marked unhealthy during normal startup. Always set start_period longer than your app's initialization time.

WARNING

Heavy health check operations: Running full test suites or expensive queries wastes resources. Keep checks lightweight: a simple ping or status endpoint.
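A lightweight check usually just requests a status endpoint with a short timeout and maps the result to the exit codes HEALTHCHECK interprets. A minimal sketch using only the Python standard library, which helps in images that ship Python but not curl; the URL http://localhost:8000/health and the 2-second timeout are hypothetical.

```python
import http.client
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8000/health"  # hypothetical status endpoint


def check(url: str = HEALTH_URL, timeout: float = 2.0) -> int:
    """Return 0 if url answers with a 2xx status within timeout, else 1.

    These are the exit codes HEALTHCHECK expects: 0 = healthy, 1 = unhealthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, timeouts and 4xx/5xx responses (HTTPError) are unhealthy.
        return 1
```

To use it as the container's health check, the script would end with sys.exit(check()) and the Dockerfile would point at it, e.g. HEALTHCHECK CMD python /app/healthcheck.py (path hypothetical). The short timeout keeps a hung endpoint from stalling the check itself.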
