What happens to data that arrives after the watermark threshold?
Options:
A. Records that arrive later than the watermark threshold (10 minutes) will automatically be included in the aggregation if they fall within the 15-minute window.
B. Any data arriving more than 10 minutes after the watermark threshold will be ignored and not included in the aggregation.
C. Data arriving more than 10 minutes after the latest watermark will still be included in the aggregation but will be placed into the next window.
D. The watermark ensures that late data arriving within 10 minutes of the latest event_time will be processed and included in the windowed aggregation.
“Records that are older than the watermark (event time < current watermark) are considered too late and are dropped.”
So, if a record's event_time is earlier than (max event_time seen so far - 10 minutes), it is discarded.
[Reference: Structured Streaming - Handling Late Data]
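The question's query code is not shown here. Assuming it resembles df.withWatermark("event_time", "10 minutes").groupBy(window("event_time", "15 minutes")).count() in PySpark, the plain-Python sketch below simulates the rule quoted above so the arithmetic can be checked by hand. It is a simplified model, not Spark's implementation: the names run, window_start, DELAY and WINDOW are hypothetical, and the watermark is assumed to advance only between micro-batches.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical parameters matching the question: a 10-minute watermark
# delay and 15-minute tumbling windows on event_time.
DELAY = timedelta(minutes=10)
WINDOW = timedelta(minutes=15)
EPOCH = datetime(1970, 1, 1)


def window_start(event_time):
    """Align an event time to the start of its 15-minute tumbling window."""
    return event_time - (event_time - EPOCH) % WINDOW


def run(batches):
    """Count records per window across micro-batches, dropping late records.

    Simplified model of the quoted rule: the watermark for a batch is
    (max event_time seen in earlier batches - DELAY), and any record whose
    event_time is older than that watermark is dropped.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None
    for batch in batches:
        # The watermark is fixed for the whole batch; it does not move mid-batch.
        watermark = None if max_event_time is None else max_event_time - DELAY
        for event_time in batch:
            if watermark is not None and event_time < watermark:
                dropped.append(event_time)  # too late: never aggregated
            else:
                counts[window_start(event_time)] += 1
        # Advance the max event time seen so far, for use by the next batch.
        if batch:
            batch_max = max(batch)
            if max_event_time is None or batch_max > max_event_time:
                max_event_time = batch_max
    return dict(counts), dropped


# Usage: batch 1 has no watermark yet; batch 2 uses 12:20 - 10 min = 12:10.
counts, dropped = run([
    [datetime(2024, 1, 1, 12, 2), datetime(2024, 1, 1, 12, 20)],
    [datetime(2024, 1, 1, 12, 12), datetime(2024, 1, 1, 11, 58)],
])
# counts  -> window 12:00 has 2 records, window 12:15 has 1 record
# dropped -> [datetime(2024, 1, 1, 11, 58)]
```

In the second batch the 12:12 record is still ahead of the 12:10 watermark and is counted, while the 11:58 record is discarded rather than being added to its window or pushed into a later one. Spark's documentation adds that the guarantee is strict in one direction only: data within the threshold is always aggregated, while data later than the threshold is not guaranteed to be dropped.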