Databricks Certified Data Engineer Professional Exam Databricks-Certified-Professional-Data-Engineer Question # 16 Topic 2 Discussion

Databricks-Certified-Professional-Data-Engineer Exam Topic 2 Question 16 Discussion:
Question #: 16
Topic #: 2

An upstream source writes Parquet data as hourly batches to directories named with the current date. A nightly batch job runs the following code to ingest all data from the previous day as indicated by the date variable:

[Image: code for the nightly ingestion job, not available in this copy]

Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order.

If the upstream system is known to occasionally produce duplicate entries for a single order hours apart, which statement is correct?


A. Each write to the orders table will only contain unique records, and only those records without duplicates in the target table will be written.

B. Each write to the orders table will only contain unique records, but newly written records may have duplicates already present in the target table.

C. Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, these records will be overwritten.

D. Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, the operation will fail.

E. Each write to the orders table will run deduplication over the union of new and existing records, ensuring no duplicate records are present.
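
The code for this question was an image and is not reproduced here, so the following is only a sketch under a stated assumption: that the nightly job drops duplicates on (customer_id, order_id) within the day's batch and then appends the result to the orders table. That reading is suggested by the opening clause shared by options A to D, but it is not confirmed by the source. Plain Python lists stand in for the Spark DataFrame and the target table, and the names dedupe_batch, append_batch, and duplicate_keys are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]
KEY = ("customer_id", "order_id")  # composite key named in the question


def dedupe_batch(rows: List[Row]) -> List[Row]:
    """Keep the first row seen for each composite key within ONE batch."""
    seen = set()
    unique = []
    for row in rows:
        k = tuple(row[c] for c in KEY)
        if k not in seen:
            seen.add(k)
            unique.append(row)
    return unique


def append_batch(table: List[Row], rows: List[Row]) -> None:
    """Assumed append-mode write: rows already in the table are never inspected."""
    table.extend(dedupe_batch(rows))


def duplicate_keys(table: List[Row]) -> List[Tuple[int, ...]]:
    """Return the composite keys that occur more than once in the table."""
    counts: Dict[Tuple[int, ...], int] = {}
    for row in table:
        k = tuple(row[c] for c in KEY)
        counts[k] = counts.get(k, 0) + 1
    return sorted(k for k, n in counts.items() if n > 1)


# Made-up data: order (1, 10) is duplicated within day 1, while order (2, 20)
# arrives late on day 1 and again early on day 2 (hours apart, across midnight).
day1 = [
    {"customer_id": 1, "order_id": 10, "hour": 9},
    {"customer_id": 1, "order_id": 10, "hour": 14},
    {"customer_id": 2, "order_id": 20, "hour": 23},
]
day2 = [
    {"customer_id": 2, "order_id": 20, "hour": 1},
    {"customer_id": 3, "order_id": 30, "hour": 8},
]

orders: List[Row] = []
append_batch(orders, day1)
append_batch(orders, day2)
print(duplicate_keys(orders))
```

In this sketch each individual write contains only unique keys, yet the order that straddles midnight ends up in the table twice, because an append never compares incoming rows with existing ones. Under the stated assumption, that is the behaviour option B describes; removing such cross-batch duplicates would need a different write pattern, such as an insert-only merge on the composite key.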

