
Databricks Certification: Databricks-Certified-Data-Engineer-Associate Questions and Answers (CertsForce)

Viewing page 1 out of 6 pages
Viewing questions 1-10 out of questions
Questions # 1:

Which of the following is hosted completely in the control plane of the classic Databricks architecture?

Options:

A.

Worker node


B.

JDBC data source


C.

Databricks web application


D.

Databricks Filesystem


E.

Driver node


Questions # 2:

The Delta transaction log for the 'students' table is shown using the 'DESCRIBE HISTORY students' command. A Data Engineer needs to query the table as it existed before the UPDATE operation listed in the log.

[Image: output of DESCRIBE HISTORY students, not reproduced]

Which command should the Data Engineer use to achieve this? (Choose two.)

Options:

A.

SELECT * FROM students@v4


B.

SELECT * FROM students TIMESTAMP AS OF '2024-04-22T 14:32:47.000+00:00'


C.

SELECT * FROM students FROM HISTORY VERSION AS OF 3


D.

SELECT * FROM students VERSION AS OF 5


E.

SELECT * FROM students TIMESTAMP AS OF '2024-04-22T 14:32:58.000+00:00'
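
For orientation, Delta Lake time travel accepts `VERSION AS OF <n>`, `TIMESTAMP AS OF '<timestamp>'`, and the `table@v<n>` shorthand. The history output for this question was an image and is not reproduced, so the sketch below uses an invented history list purely to illustrate how one might pick the version immediately before an UPDATE and build the query. The rows, `version_before`, and `time_travel_query` are all hypothetical and do not reflect the actual log in the question.

```python
# Hypothetical history rows (version, timestamp, operation).
# These values are invented for illustration; they are NOT the log from the question.
history = [
    (0, "2024-01-01T00:00:00.000+00:00", "CREATE TABLE"),
    (1, "2024-01-01T00:05:00.000+00:00", "WRITE"),
    (2, "2024-01-01T00:10:00.000+00:00", "UPDATE"),
]


def version_before(history, operation):
    """Return the highest version committed before the earliest `operation` entry."""
    target = min(version for version, _, op in history if op == operation)
    earlier = [version for version, _, _ in history if version < target]
    if not earlier:
        raise ValueError(f"no version exists before the first {operation}")
    return max(earlier)


def time_travel_query(table, version):
    """Build a Delta time travel query string for a given table version."""
    return f"SELECT * FROM {table} VERSION AS OF {version}"


query = time_travel_query("students", version_before(history, "UPDATE"))
print(query)
```

The same table state could also be addressed by timestamp, using any timestamp at or after the commit of the chosen version and before the commit of the UPDATE.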


Questions # 3:

A data engineer has been provided a PySpark DataFrame named df with columns product and revenue. The data engineer needs to compute complex aggregations to determine each product's total revenue, average revenue, and transaction count.

Which code snippet should the data engineer use?

A)

[Code snippet image not reproduced]

B)

[Code snippet image not reproduced]

C)

[Code snippet image not reproduced]

D)

[Code snippet image not reproduced]

Options:

A.

Option A


B.

Option B


C.

Option C


D.

Option D
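
The code in the answer choices is not available in this copy. In PySpark, this kind of per-group summary is usually written with `groupBy("product")` followed by `agg(...)` using `sum`, `avg`, and `count` from `pyspark.sql.functions`. The plain-Python sketch below is only an analogy showing what such an aggregation computes; the sample rows and the function name `aggregate_by_product` are assumptions for illustration and are not one of the original options.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the DataFrame `df`.
rows = [
    {"product": "A", "revenue": 10.0},
    {"product": "B", "revenue": 5.0},
    {"product": "A", "revenue": 30.0},
]


def aggregate_by_product(rows):
    """Compute total revenue, average revenue, and row count per product."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["product"]] += row["revenue"]
        counts[row["product"]] += 1
    return {
        product: {
            "total_revenue": totals[product],
            "avg_revenue": totals[product] / counts[product],
            "transaction_count": counts[product],
        }
        for product in totals
    }


summary = aggregate_by_product(rows)
print(summary)
```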


Questions # 4:

A data engineer who is new to Python needs to create a Python function that adds two integers together and returns the sum.

Which of the following code blocks can the data engineer use to complete this task?

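A)

[Code snippet image not reproduced]

B)

[Code snippet image not reproduced]

C)

[Code snippet image not reproduced]

D)

[Code snippet image not reproduced]

E)

[Code snippet image not reproduced]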

Options:

A.

Option A


B.

Option B


C.

Option C


D.

Option D


E.

Option E
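
The code in the answer choices is not available in this copy. As a minimal sketch of what the task describes, a Python function is defined with `def` and hands back its result with `return`. The name `add_integers` and the parameter names are illustrative and are not taken from the original options.

```python
def add_integers(x: int, y: int) -> int:
    """Return the sum of two integers."""
    return x + y


result = add_integers(2, 3)
print(result)
```

A function that only prints the sum, or that omits `return`, would give the caller `None` instead of the sum.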


Questions # 5:

Which method should a Data Engineer apply to ensure Workflows are being triggered on schedule?

Options:

A.

Scheduled Workflows require an always-running cluster, which is more expensive but reduces processing latency.


B.

Scheduled Workflows process data as it arrives at configured sources.


C.

Scheduled Workflows can reduce resource consumption and expense since the cluster runs only long enough to execute the pipeline.


D.

Scheduled Workflows run continuously until manually stopped.


Questions # 6:

A data engineer streams customer orders into a Kafka topic (orders_topic) and is currently writing the ingestion script of a DLT pipeline. The data engineer needs to ingest the data from the Kafka brokers into DLT using Databricks.

What is the correct code for ingesting the data?

A)

[Code snippet image not reproduced]

B)

[Code snippet image not reproduced]

C)

[Code snippet image not reproduced]

D)

[Code snippet image not reproduced]

Options:

A.

Option A


B.

Option B


C.

Option C


D.

Option D


Questions # 7:

In which of the following scenarios should a data engineer use the MERGE INTO command instead of the INSERT INTO command?

Options:

A.

When the location of the data needs to be changed


B.

When the target table is an external table


C.

When the source table can be deleted


D.

When the target table cannot contain duplicate records


E.

When the source is not a Delta table
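
To make the difference between the two commands concrete, the sketch below models them in plain Python: an append that never checks existing rows, and an upsert that matches on a key. This is an analogy only, not Delta Lake code; the function names, the `id` key, and the sample rows are assumptions made for illustration.

```python
def insert_into(target, source):
    """Append every source row without checking for existing keys."""
    return target + source


def merge_into(target, source, key):
    """Upsert: replace rows whose key matches, insert the rest."""
    merged = {row[key]: row for row in target}
    for row in source:
        merged[row[key]] = row
    return list(merged.values())


# Hypothetical sample data: id 2 appears in both target and source.
target = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bo"}]
source = [{"id": 2, "name": "Bo"}, {"id": 3, "name": "Cy"}]

appended = insert_into(target, source)
merged = merge_into(target, source, "id")
print(len(appended), len(merged))
```

In this toy model the append leaves two rows with `id` 2, while the key-matched merge keeps one row per key.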


Questions # 8:

A data engineering project involves processing large batches of data on a daily schedule using ETL. The jobs are resource-intensive and vary in size, requiring a scalable, cost-efficient compute solution that can automatically scale based on the workload.

Which compute approach will satisfy the needs described?

Options:

A.

Databricks SQL Serverless


B.

Dedicated Cluster


C.

All-Purpose Cluster


D.

Job Cluster


Questions # 9:

A data analysis team has noticed that their Databricks SQL queries are running too slowly when connected to their always-on SQL endpoint. They claim that this issue is present when many members of the team are running small queries simultaneously. They ask the data engineering team for help. The data engineering team notices that each of the team’s queries uses the same SQL endpoint.

Which of the following approaches can the data engineering team use to improve the latency of the team’s queries?

Options:

A.

They can increase the cluster size of the SQL endpoint.


B.

They can increase the maximum bound of the SQL endpoint’s scaling range.


C.

They can turn on the Auto Stop feature for the SQL endpoint.


D.

They can turn on the Serverless feature for the SQL endpoint.


E.

They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to “Reliability Optimized.”


Questions # 10:

Which of the following describes when to use the CREATE STREAMING LIVE TABLE (formerly CREATE INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta Live Tables (DLT) tables using SQL?

Options:

A.

CREATE STREAMING LIVE TABLE should be used when the subsequent step in the DLT pipeline is static.


B.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally.


C.

CREATE STREAMING LIVE TABLE is redundant for DLT and it does not need to be used.


D.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed through complicated aggregations.


E.

CREATE STREAMING LIVE TABLE should be used when the previous step in the DLT pipeline is static.

