Skip to main content

🗓️ 16012025 1848
📎

flink_watermarks

Background

  • Out-of-Order Events:
    • Streams often contain events that arrive in an order different from when they occurred
    • Some buffering and delay are needed to ensure correct processing order
  • Progress Without Infinite Waiting:
    • To avoid waiting indefinitely for earlier events, watermarks provide a mechanism to decide when to move forward with processing.

Purpose

Define when to stop waiting for earlier events and proceed with processing

Requirements

What is needed to process data based on event time(flink_notions_of_time)

  • Timestamp Extractor
  • Watermark Generator - required to handle events that may arrive out of order.

Mechanism

  • Inserted into streams to indicate that all earlier events (up to time t) have likely been processed.
  • Common strategy: Bounded-Out-of-Orderness - assumes a fixed maximum delay for late events
  • Also possible: hybrid solutions that produce initial results quickly, and then supply updates to those results as additional (late) data is processed

Tradeoff

Controls the balance between

  • latency - faster results, less accurate
  • completeness (slower results, more accurate)
DataStream<Event> stream = ...;

WatermarkStrategy<Event> strategy = WatermarkStrategy
.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(20))
.withTimestampAssigner((event, timestamp) -> event.timestamp);

DataStream<Event> withTimestampsAndWatermarks =
stream.assignTimestampsAndWatermarks(strategy);

References