Databricks Certified Data Engineer Professional Exam Databricks-Certified-Professional-Data-Engineer Question # 34 Topic 4 Discussion

Question #: 34
Topic #: 4

A Structured Streaming job deployed to production has been incurring higher-than-expected cloud storage costs. At present, during normal execution, each micro-batch of data is processed in less than 3 seconds, and at least 12 times per minute a micro-batch containing 0 records is processed. The streaming write was configured using the default trigger settings. The production job is currently scheduled alongside many other Databricks jobs in a workspace with instance pools provisioned to reduce start-up time for jobs with batch execution. Holding all other variables constant and assuming records need to be processed in less than 10 minutes, which adjustment will meet the requirement?


A. Set the trigger interval to 500 milliseconds; setting a small but non-zero trigger interval ensures that the source is not queried too frequently.

B. Set the trigger interval to 3 seconds; the default trigger interval is consuming too many records per batch, resulting in spill to disk that can increase volume costs.

C. Set the trigger interval to 10 minutes; each batch calls APIs in the source storage account, so decreasing trigger frequency to the maximum allowable threshold should minimize this cost.

D. Use the trigger once option and configure a Databricks job to execute the query every 10 minutes; this approach minimizes costs for both compute and storage.
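
For context on what the options describe, below is a minimal PySpark sketch of the trigger settings involved. The source path, schema, checkpoint locations, and table name are hypothetical; in practice only one of these write configurations would be used for a given stream.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.getOrCreate()

# Hypothetical schema and source path for illustration only.
schema = StructType([
    StructField("id", StringType()),
    StructField("event_time", TimestampType()),
])

stream_df = (
    spark.readStream
    .schema(schema)
    .json("/mnt/raw/events")   # hypothetical cloud storage path
)

# Default trigger (what the question describes): a new micro-batch starts as soon
# as the previous one finishes, so empty batches can fire many times per minute,
# and each one issues list/read calls against the source storage account.

# Options A-C style: a fixed processing-time trigger; only the interval differs.
interval_query = (
    stream_df.writeStream
    .trigger(processingTime="10 minutes")
    .option("checkpointLocation", "/mnt/chk/events_interval")   # hypothetical
    .toTable("events_bronze")                                   # hypothetical table
)

# Option D style: process all data available at start-up, then stop; a scheduled
# Databricks job re-runs the query on its own cadence (e.g., every 10 minutes).
once_query = (
    stream_df.writeStream
    .trigger(availableNow=True)   # newer equivalent of Trigger.Once (.trigger(once=True))
    .option("checkpointLocation", "/mnt/chk/events_once")       # hypothetical
    .toTable("events_bronze")
)
```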

