Databricks Certified Data Engineer Professional Exam Databricks-Certified-Professional-Data-Engineer Question # 9 Topic 1 Discussion

Databricks-Certified-Professional-Data-Engineer Exam Topic 1 Question 9 Discussion:
Question #: 9
Topic #: 1

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.

df has the following schema: device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT

Code block:

df.withWatermark("event_time", "10 minutes")
  .groupBy(
    ________,
    "device_id"
  )
  .agg(
    avg("temp").alias("avg_temp"),
    avg("humidity").alias("avg_humidity")
  )
  .writeStream
  .format("delta")
  .saveAsTable("sensor_avg")

Which line of code correctly fills in the blank within the code block to complete this task?


A. window("event_time", "5 minutes").alias("time")

B. to_interval("event_time", "5 minutes").alias("time")

C. "event_time"

D. lag("event_time", "5 minutes").alias("time")
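
To make "non-overlapping five-minute interval" concrete, here is a minimal pure-Python sketch of tumbling-window bucketing. It is not Spark code and not how Spark implements it internally. It only mirrors the idea behind grouping by a fixed-width time window plus device_id: each event_time is floored to the start of its five-minute bucket (aligned to the Unix epoch, as Spark's window function does by default when only a duration is given), and the averages are computed per (window start, device_id). The helper names window_start and tumbling_averages and the sample readings are hypothetical, made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(event_time, width=WINDOW):
    """Floor event_time to the start of its non-overlapping window."""
    offset = (event_time - EPOCH) % width  # time elapsed inside the current window
    return event_time - offset


def tumbling_averages(events, width=WINDOW):
    """Return {(window_start, device_id): (avg_temp, avg_humidity)}."""
    buckets = defaultdict(list)
    for device_id, event_time, temp, humidity in events:
        key = (window_start(event_time, width), device_id)
        buckets[key].append((temp, humidity))
    return {
        key: (
            sum(t for t, _ in rows) / len(rows),
            sum(h for _, h in rows) / len(rows),
        )
        for key, rows in buckets.items()
    }


# Hypothetical sample: one device, one reading per minute for ten minutes.
base = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
events = [(1, base + timedelta(minutes=i), 20.0 + i, 40.0) for i in range(10)]
result = tumbling_averages(events)

for (start, device_id), (avg_temp, avg_humidity) in sorted(result.items()):
    print(start.isoformat(), device_id, avg_temp, avg_humidity)
```

With one reading per minute, each bucket holds five readings per device, and every reading belongs to exactly one bucket. Grouping by the raw event_time instead would put each reading in its own group, so nothing would actually be averaged.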

