
🗓️ 07032025 1549
📎

lambda_architecture

A hybrid of batch and stream processing, sometimes classed as a subset of streaming architectures.

Lambda Architecture handles both batch and real-time data by running two parallel processing paths, the batch and speed layers, and combining their results in a serving layer:

  • Batch Layer → For large-scale, historical data processing.
  • Speed (Real-time) Layer → For low-latency processing of recent data.
  • Serving Layer → Combines results from both layers to provide a unified output.

How It Works

Batch Layer (Historical Processing)

  • Stores all incoming data.
  • Runs batch jobs periodically (e.g., using Hadoop, Apache Spark, MaxCompute).
    • Generates precomputed views for fast querying.
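
A minimal sketch of the batch layer idea, in plain Python rather than Hadoop or Spark. The `Event` shape, the `build_batch_view` function, and the page-view counting use case are all hypothetical, chosen only to show a view being recomputed from the full stored dataset up to a cutoff.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    # Hypothetical event shape: a page view with an epoch-seconds timestamp.
    timestamp: int
    page: str


def build_batch_view(master_dataset: Iterable[Event], cutoff: int) -> dict[str, int]:
    """Recompute page-view counts from scratch for all events before `cutoff`."""
    counts = Counter(e.page for e in master_dataset if e.timestamp < cutoff)
    return dict(counts)


# The master dataset is append-only: events are stored as they arrive.
master_dataset = [
    Event(100, "/home"),
    Event(110, "/docs"),
    Event(120, "/home"),
    Event(210, "/home"),  # arrived after the cutoff, so this run ignores it
]

# A periodic batch run produces a precomputed view that queries can read cheaply.
batch_view = build_batch_view(master_dataset, cutoff=200)
print(batch_view)
```

Because the view is always rebuilt from the stored events, a query only has to read the small precomputed dictionary instead of scanning the raw data.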

Speed Layer (Real-Time Processing)

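  • Processes streaming data as it arrives (e.g., Apache Flink, Spark Streaming, or Kafka Streams, often with Apache Kafka as the event log).
  • Stores only recent results, covering data the latest batch run has not yet processed, for fast access.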
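
The speed layer can be sketched in a similar way. This is an illustration in plain Python, not the API of Flink, Spark, or Kafka; the `SpeedLayer` class, its method names, and the 60-second bucket width are assumptions made for the example. It updates counts incrementally per event and drops buckets once a batch run has covered them.

```python
BUCKET_SECONDS = 60  # assumed bucket width; cutoffs should align to it


class SpeedLayer:
    """Hypothetical in-memory real-time view of page-view counts."""

    def __init__(self) -> None:
        # (bucket_start, page) -> count
        self._counts: dict[tuple[int, str], int] = {}

    def on_event(self, timestamp: int, page: str) -> None:
        """Update the view incrementally as each event arrives."""
        bucket = timestamp - timestamp % BUCKET_SECONDS
        key = (bucket, page)
        self._counts[key] = self._counts.get(key, 0) + 1

    def expire_before(self, cutoff: int) -> None:
        """Drop buckets that the batch layer has already processed."""
        self._counts = {k: v for k, v in self._counts.items() if k[0] >= cutoff}

    def view(self) -> dict[str, int]:
        """Return per-page totals over the buckets still held."""
        totals: dict[str, int] = {}
        for (_, page), count in self._counts.items():
            totals[page] = totals.get(page, 0) + count
        return totals


speed = SpeedLayer()
speed.on_event(130, "/home")
speed.on_event(185, "/home")
speed.on_event(190, "/docs")
before_expiry = speed.view()

# A batch run has now covered everything before t=180.
speed.expire_before(180)
after_expiry = speed.view()
```

Keeping only the uncovered window is what lets the speed layer stay small while the batch layer holds the full history.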

Serving Layer (Query Engine)

  • Merges outputs from both the batch and speed layers.
  • Provides a unified view for applications and dashboards.
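
The merge step can be sketched as follows, assuming both layers expose per-page counts as dictionaries. The `merge_views` function and the sample numbers are hypothetical. Simple addition only works for additive metrics such as counts and sums; metrics like distinct users would need a different merge rule.

```python
from typing import Mapping


def merge_views(batch_view: Mapping[str, int], realtime_view: Mapping[str, int]) -> dict[str, int]:
    """Combine per-page counts from both layers into one view."""
    merged = dict(batch_view)
    for page, count in realtime_view.items():
        merged[page] = merged.get(page, 0) + count
    return merged


# Hypothetical precomputed views: the batch view covers everything up to the
# last batch run, the real-time view covers only events since then.
batch_view = {"/home": 1200, "/docs": 340}
realtime_view = {"/home": 7, "/pricing": 2}

unified = merge_views(batch_view, realtime_view)
print(unified["/home"])
```

This assumes the two views cover disjoint time ranges; if they overlap, events in the overlap would be counted twice.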

| Pros | Cons |
| --- | --- |
| Combines batch and streaming for accurate, real-time insights | Complexity: maintaining two separate processing paths is harder |
| Scalable and fault-tolerant | Data duplication: both layers process the same data, and merging batch and real-time results needs extra logic |
| Works well for applications needing both historical and real-time analytics | Batch latency: real-time results may differ slightly from batch results until the next batch run |

When to Use Lambda Architecture?

  • When you need both real-time and historical processing.
  • When batch processing is critical (e.g., regulatory reporting, financial transactions).
  • When you want fault tolerance and the ability to recompute historical data.
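
To illustrate the last point, here is a small sketch of recomputation, assuming raw events are kept unmodified. The `Event` shape, the `count_views` function, and the bot-filtering scenario are hypothetical. When the processing logic changes, the job is rerun over the same stored events and the old view is replaced.

```python
from typing import Callable, NamedTuple


class Event(NamedTuple):
    # Hypothetical event shape for the example.
    user_agent: str
    page: str


def count_views(events: list[Event], keep: Callable[[Event], bool]) -> dict[str, int]:
    """Rebuild the whole view from raw events, applying the filter `keep`."""
    view: dict[str, int] = {}
    for event in events:
        if keep(event):
            view[event.page] = view.get(event.page, 0) + 1
    return view


# Raw events are never modified, so they can be reprocessed at any time.
raw_events = [
    Event("Mozilla/5.0", "/home"),
    Event("ExampleBot/1.0", "/home"),
    Event("Mozilla/5.0", "/docs"),
]

# Version 1 of the job counted everything, including bot traffic.
view_v1 = count_views(raw_events, keep=lambda e: True)

# Version 2 fixes the logic; rerunning it replaces the old view entirely.
view_v2 = count_views(raw_events, keep=lambda e: "Bot" not in e.user_agent)
```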
