🗓️ 11112024 1505

DOCKER HEALTHCHECK

What it is:

  • Periodic command that Docker runs inside the container to verify it's functioning correctly (exit code 0 = healthy, 1 = unhealthy)
  • Reports health status: starting, healthy, or unhealthy

Problem it solves:

  • Containers can be running but broken (crashed worker process, deadlock, failed dependency)
  • Without health checks, orchestrators keep routing traffic to broken containers
  • Users see errors instead of automatic recovery

Health States

Containers transition through three states:

  • starting: Container initializing, health check hasn't passed yet
  • healthy: Most recent health check passed
  • unhealthy: Health check failed the configured number of consecutive times (retries)

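What happens when a container turns unhealthy depends on the platform. Plain Docker only reports the status and does not restart the container. Docker Swarm replaces unhealthy service tasks. Docker Compose can hold back dependent services until a dependency is healthy (depends_on with condition: service_healthy). Kubernetes ignores the Docker HEALTHCHECK and uses its own probes to restart containers or remove them from load balancer rotation.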
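
The state transitions above can be modeled in a few lines. This is a simplified sketch of the bookkeeping, not Docker's actual implementation; the class name HealthTracker and its fields are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Toy model of the starting / healthy / unhealthy transitions."""
    retries: int = 3            # consecutive failures needed to become unhealthy
    start_period: float = 0.0   # seconds after start during which failures are ignored
    status: str = "starting"
    failing_streak: int = 0

    def record(self, passed: bool, elapsed: float) -> str:
        """Feed one check result taken `elapsed` seconds after container start."""
        if passed:
            # Any passing check resets the streak and marks the container healthy.
            self.failing_streak = 0
            self.status = "healthy"
            return self.status
        # Failures while still "starting" and inside the grace period do not count.
        if self.status == "starting" and elapsed < self.start_period:
            return self.status
        self.failing_streak += 1
        if self.failing_streak >= self.retries:
            self.status = "unhealthy"
        return self.status


tracker = HealthTracker(retries=3, start_period=40)
checks = [(False, 10), (False, 20), (True, 30),
          (False, 60), (False, 90), (False, 120), (True, 150)]
for passed, elapsed in checks:
    print(elapsed, tracker.record(passed, elapsed))
```

The two early failures leave the container in starting, the first pass flips it to healthy, three consecutive failures make it unhealthy, and a later pass brings it back to healthy.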

When to Use

Use for:

  • Production services where automatic recovery matters
  • Load-balanced APIs
  • Long-running services that may degrade over time
  • Microservices with complex dependencies
  • Database connections that can become stale

Skip for:

  • Short-lived containers (batch jobs, one-off tasks)
  • Development environments where manual restart is acceptable
  • When health check overhead outweighs benefit

Key Distinctions

Running vs Healthy

  • Container can be running (process active) but unhealthy (not functioning correctly)
  • Health checks detect the difference
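
To see the running/healthy split on a real container, the JSON printed by docker inspect carries both values: State.Status for the process and State.Health.Status for the health check. A minimal sketch of reading them; the helper name health_summary is hypothetical and SAMPLE is hand-written for illustration, not captured from a real container.

```python
import json


def health_summary(inspect_json: str) -> dict:
    """Pull process state and health state out of `docker inspect <container>` output."""
    containers = json.loads(inspect_json)   # docker inspect prints a JSON array
    state = containers[0]["State"]
    health = state.get("Health")            # assumed absent when no health check is configured
    if health is None:
        return {"state": state["Status"], "health": "none",
                "failing_streak": 0, "last_exit_code": None}
    log = health.get("Log") or []
    return {
        "state": state["Status"],
        "health": health["Status"],
        "failing_streak": health.get("FailingStreak", 0),
        "last_exit_code": log[-1]["ExitCode"] if log else None,
    }


# Hand-written sample for illustration only.
SAMPLE = """[{"State": {"Status": "running", "Health": {
  "Status": "unhealthy", "FailingStreak": 3,
  "Log": [{"ExitCode": 1, "Output": "connection refused"}]}}}]"""

print(health_summary(SAMPLE))
```

For the sample, the summary shows state running next to health unhealthy, which is exactly the case a plain process check would miss.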

Health Check vs Monitoring

  • Health checks: Internal container-level checks whose status an orchestrator can act on (restart, replace, stop routing)
  • Monitoring: External system-level observability that provides insights

Liveness vs Readiness

  • Kubernetes context only
  • Liveness: Is app alive? (restart if not)
  • Readiness: Can app serve traffic? (remove from load balancer if not)
  • Docker health checks are simpler - only indicate overall health

Configuration Considerations

Intervals

  • Balance responsiveness with resource overhead
  • Typical: 30s for web services (also Docker's default), 10s for critical databases

Start Period

  • Grace period where failed checks don't count toward retry limit
  • Always set this for slow-starting applications
  • Without it, containers get marked unhealthy during normal initialization

Retries

  • How many consecutive failures before marking unhealthy (Docker default: 3)
  • More retries = more tolerant of transient issues but slower failure detection
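
The interval/retries trade-off can be put into rough numbers. The sketch below assumes each check starts interval seconds after the previous one finishes and that a failing check may run for the full timeout; the function name is made up, and the result is an approximate upper bound on how long a failure can go unnoticed, not an exact figure.

```python
def worst_case_detection_seconds(interval: float, timeout: float, retries: int) -> float:
    """Approximate upper bound on the time from failure to 'unhealthy'.

    Assumes the app breaks right after a passing check and every later
    check waits `interval` seconds, then fails only after `timeout`.
    """
    return retries * (interval + timeout)


for interval, timeout, retries in [(30, 3, 3), (10, 3, 3), (30, 3, 5)]:
    seconds = worst_case_detection_seconds(interval, timeout, retries)
    print(f"interval={interval}s timeout={timeout}s retries={retries} -> {seconds}s")
```

With these inputs the bounds are 99s, 39s, and 165s: shortening the interval speeds up detection, while each extra retry adds one more interval plus timeout.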

Common Pitfalls

DANGER

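No start period for slow apps: The default start period is 0s, so failed checks during normal startup count toward the retry limit and the container gets marked unhealthy. Always set the start period (--start-period in a Dockerfile, start_period in Compose) longer than your app's initialization time.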

WARNING

Heavy health check operations: Running full test suites or expensive queries wastes resources. Keep checks lightweight - simple ping or status endpoint.
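
A lightweight check can be as small as one HTTP request with a short timeout. The sketch below is one possible shape, assuming the app exposes a status endpoint; the URL, port, and script path are hypothetical placeholders.

```python
import urllib.request

HEALTH_URL = "http://localhost:8080/health"   # hypothetical status endpoint
TIMEOUT_SECONDS = 2.0                          # keep well below the check's own timeout


def exit_code(url: str = HEALTH_URL, timeout: float = TIMEOUT_SECONDS) -> int:
    """Return 0 (healthy) on a 2xx response, 1 (unhealthy) on anything else."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except Exception:
        # Connection refused, timeout, 4xx/5xx, bad URL: all count as unhealthy.
        return 1


# As a standalone script, finish with:
#     raise SystemExit(exit_code())
#
# Possible wiring in a Dockerfile (script path is an assumption):
#     HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
#         CMD ["python", "/app/healthcheck.py"]
```

The check does one request and nothing else, so its cost stays constant no matter how often it runs. It also requires a Python interpreter inside the image; in slimmer images a tool like curl or wget fills the same role.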


References