A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.
df has the following schema: device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT
Code block:
df.withWatermark("event_time", "10 minutes")
  .groupBy(
    ________,
    "device_id"
  )
  .agg(
    avg("temp").alias("avg_temp"),
    avg("humidity").alias("avg_humidity")
  )
  .writeStream
  .format("delta")
  .saveAsTable("sensor_avg")
Which line of code correctly fills in the blank within the code block to complete this task?
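As a rough illustration of what "non-overlapping five-minute interval" means for this data, here is a minimal plain-Python sketch (not Spark code, and not the expression that fills the blank). It floors each event_time to the start of its five-minute bucket and averages temp and humidity per bucket and device. The helper names window_start and tumbling_averages, and the sample readings, are hypothetical and exist only for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)   # bucket length from the task description
EPOCH = datetime(1970, 1, 1)    # assumption: buckets are aligned to the Unix epoch


def window_start(event_time: datetime) -> datetime:
    """Floor a timestamp to the start of its five-minute bucket."""
    return event_time - (event_time - EPOCH) % WINDOW


def tumbling_averages(events):
    """Average temp and humidity per (window start, device_id).

    `events` is an iterable of (device_id, event_time, temp, humidity)
    tuples, mirroring the schema of df.
    """
    buckets = defaultdict(list)
    for device_id, event_time, temp, humidity in events:
        buckets[(window_start(event_time), device_id)].append((temp, humidity))
    return {
        key: (
            sum(t for t, _ in rows) / len(rows),
            sum(h for _, h in rows) / len(rows),
        )
        for key, rows in buckets.items()
    }


# Hypothetical sample: one device reporting once per minute, 12:00 through 12:09.
events = [(1, datetime(2024, 1, 1, 12, m), 20.0 + m, 50.0) for m in range(10)]
result = tumbling_averages(events)
```

Because every event falls into exactly one bucket, the ten sample readings produce two rows for the device, each averaging five readings. A sliding or overlapping window would instead assign an event to more than one bucket.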