Skip to main content

🗓️ 07032025 1549
📎

lambda_architecture

batch + streaming hybrid, considered a subset of streaming architecture

Lambda Architecture is designed to handle both batch and real-time data by using two parallel processing paths:

Batch Layer → For large-scale, historical data processing.
Speed (Real-time) Layer → For real-time, low-latency queries.
Serving Layer → Combines results from both layers to provide a unified output.

How It Works

Batch Layer (Historical Processing)

Stores all incoming data.
Runs batch jobs periodically (e.g., using Hadoop, Apache Spark, MaxCompute).
- Generates precomputed views for fast querying.

Speed Layer (Real-Time Processing)

Processes streaming data in real time (e.g., Apache Kafka, Apache Flink, Spark Streaming).
Stores recent results for fast access.

Serving Layer (Query Engine)

Merges outputs from both the batch and speed layers.
Provides a unified view for applications and dashboards.

Pros	Cons
✅ Combines batch + streaming for accurate & real-time insights	❌ Complex – Maintaining two separate processing paths is harder
✅ Scalable & fault-tolerant	❌ Data Duplication – Requires logic to merge batch & real-time results
✅ Works well for applications needing historical & real-time analytics	❌ Latency in batch layer – Real-time results may differ slightly from batch

When to Use Lambda Architecture?

When you need both real-time and historical processing.
When batch processing is critical (e.g., regulatory reporting, financial transactions).
When you want fault tolerance and the ability to recompute historical data.

References

How It Works
When to Use Lambda Architecture?