
Pass the Databricks Certification Databricks-Certified-Professional-Data-Engineer Questions and Answers with CertsForce

Question # 1:

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Incremental state information should be maintained for 10 minutes for late-arriving data.

Streaming DataFrame df has the following schema:

"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"

Code block:

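The original code block appears only as an image. A plausible reconstruction from the question text, with the blank preserved (column aliases and exact layout are assumptions):

    # Assumes: from pyspark.sql.functions import avg, col, window
    (df.____                                   # <- the blank to fill
        .groupBy(
            col("device_id"),
            window("event_time", "5 minutes").alias("time"))
        .agg(
            avg("temp").alias("avg_temp"),
            avg("humidity").alias("avg_humidity")))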

Choose the response that correctly fills in the blank within the code block to complete this task.

Options:

A.

withWatermark("event_time", "10 minutes")


B.

awaitArrival("event_time", "10 minutes")


C.

await("event_time + ‘10 minutes'")


D.

slidingWindow("event_time", "10 minutes")


E.

delayWrite("event_time", "10 minutes")


Question # 2:

When scheduling Structured Streaming jobs for production, which configuration automatically recovers from query failures and keeps costs low?
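
For context, these settings correspond to fields in the Databricks Jobs API. A minimal sketch of the relevant fragment as a Python dict, assuming Jobs API 2.1 (names and values are illustrative placeholders):

    job_settings = {
        "name": "streaming-pipeline",      # hypothetical job name
        "max_concurrent_runs": 1,          # job-level cap on simultaneous runs
        "tasks": [{
            "task_key": "main",
            "new_cluster": {               # ephemeral job cluster, terminated after the run
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
            "max_retries": -1,             # -1 means retry indefinitely
        }],
    }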

Options:

A.

Cluster: New Job Cluster;

Retries: Unlimited;

Maximum Concurrent Runs: Unlimited


B.

Cluster: New Job Cluster;

Retries: None;

Maximum Concurrent Runs: 1


C.

Cluster: Existing All-Purpose Cluster;

Retries: Unlimited;

Maximum Concurrent Runs: 1


D.

Cluster: New Job Cluster;

Retries: Unlimited;

Maximum Concurrent Runs: 1


E.

Cluster: Existing All-Purpose Cluster;

Retries: None;

Maximum Concurrent Runs: 1


Question # 3:

The data engineering team has configured a Databricks SQL query and alert to monitor the values in a Delta Lake table. The recent_sensor_recordings table contains an identifying sensor_id alongside the timestamp and temperature for the most recent 5 minutes of recordings.

The following query is used to create the alert:

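The query itself appears only as an image. A plausible reconstruction, inferred from the table description and the answer options (the exact text is an assumption):

    SELECT mean(temperature) AS mean_temperature
    FROM recent_sensor_recordings
    GROUP BY sensor_id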

The query is set to refresh each minute and always completes in less than 10 seconds. The alert is set to trigger when mean(temperature) > 120. Notifications are configured to be sent at most once per minute.

If this alert raises notifications for 3 consecutive minutes and then stops, which statement must be true?

Options:

A.

The total average temperature across all sensors exceeded 120 on three consecutive executions of the query


B.

The recent_sensor_recordings table was unresponsive for three consecutive runs of the query


C.

The source query failed to update properly for three consecutive minutes and then restarted


D.

The maximum temperature recording for at least one sensor exceeded 120 on three consecutive executions of the query


E.

The average temperature recordings for at least one sensor exceeded 120 on three consecutive executions of the query


Question # 4:

In order to facilitate near real-time workloads, a data engineer is creating a helper function to leverage the schema detection and evolution functionality of Databricks Auto Loader. The desired function will automatically detect the schema of the source directory, incrementally process JSON files as they arrive, and automatically evolve the schema of the table when new fields are detected.

The function is displayed below with a blank (the function itself is shown as an image in the original):

Which response correctly fills in the blank to meet the specified requirements?
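
The function and the answer options appear only as images. For reference, a minimal sketch of a helper that meets the stated requirements with Auto Loader (function and parameter names are hypothetical, not the exam's code):

    def ingest_json_stream(source_path, checkpoint_path, table_name):
        # "cloudFiles" is Auto Loader; schemaLocation enables schema
        # detection and tracking, and newly detected fields are added
        # automatically (addNewColumns is the default evolution mode).
        (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("cloudFiles.schemaLocation", checkpoint_path)
            .load(source_path)
            .writeStream
            .option("checkpointLocation", checkpoint_path)
            .option("mergeSchema", "true")   # let the target table's schema evolve
            .toTable(table_name))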


Options:

A.

Option A


B.

Option B


C.

Option C


D.

Option D


E.

Option E


Question # 5:

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.

Streaming DataFrame df has the following schema:

"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"

Code block:

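The original code block appears only as an image. A plausible reconstruction from the question text, with the blank preserved (column aliases and exact layout are assumptions):

    # Assumes: from pyspark.sql.functions import avg, col, window
    (df.groupBy(
            col("device_id"),
            ____)                              # <- the blank to fill
        .agg(
            avg("temp").alias("avg_temp"),
            avg("humidity").alias("avg_humidity")))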

Choose the response that correctly fills in the blank within the code block to complete this task.

Options:

A.

to_interval("event_time", "5 minutes").alias("time")


B.

window("event_time", "5 minutes").alias("time")


C.

"event_time"


D.

window("event_time", "10 minutes").alias("time")


E.

lag("event_time", "10 minutes").alias("time")


Question # 6:

Incorporating unit tests into a PySpark application requires upfront attention to the design of your jobs, or a potentially significant refactoring of existing code.

Which statement describes a main benefit that offsets this additional effort?
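
For context, the design this implies is factoring the job into small pure functions so each step can be tested in isolation. A minimal sketch (all names are hypothetical; the test assumes a pytest-style spark session fixture):

    import pyspark.sql.functions as F
    from pyspark.sql import DataFrame

    def add_daily_total(df: DataFrame) -> DataFrame:
        # One isolated transformation step, easy to unit test.
        return df.groupBy("date").agg(F.sum("amount").alias("daily_total"))

    def test_add_daily_total(spark):
        # Exercises just this step against a small in-memory DataFrame.
        df = spark.createDataFrame(
            [("2024-01-01", 2), ("2024-01-01", 3)], ["date", "amount"])
        assert add_daily_total(df).first()["daily_total"] == 5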

Options:

A.

Improves the quality of your data


B.

Validates a complete use case of your application


C.

Troubleshooting is easier since all steps are isolated and tested individually


D.

Yields faster deployment and execution times


E.

Ensures that all steps interact correctly to achieve the desired end result


Question # 7:

A Delta Lake table representing metadata about content posts from users has the following schema:

    user_id LONG

    post_text STRING

    post_id STRING

    longitude FLOAT

    latitude FLOAT

    post_time TIMESTAMP

    date DATE

Based on the above schema, which column is a good candidate for partitioning the Delta Table?
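
For context, a partition column is declared when the table is written; good candidates are low-cardinality columns that appear frequently in query filters. A minimal sketch (table and column names are hypothetical, unrelated to the options below):

    # Partitioning is declared at write time with partitionBy.
    (df.write
        .format("delta")
        .partitionBy("region")          # a low-cardinality column
        .saveAsTable("events_by_region"))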

Options:

A.

date


B.

user_id


C.

post_id


D.

post_time


Question # 8:

A member of the data engineering team has submitted a short notebook that they wish to schedule as part of a larger data pipeline. Assume that the commands provided below produce the logically correct results when run as presented.

(The notebook's commands are shown as an image in the original.)

Which command should be removed from the notebook before scheduling it as a job?

Options:

A.

Cmd 2


B.

Cmd 3


C.

Cmd 4


D.

Cmd 5


E.

Cmd 6


Question # 9:

Assuming that the Databricks CLI has been installed and configured correctly, which Databricks CLI command can be used to upload a custom Python wheel to object storage mounted with DBFS, for use with a production job?

Options:

A.

configure


B.

fs


C.

jobs


D.

libraries


E.

workspace


Question # 10:

A table named user_ltv is being used to create a view that will be used by data analysts on various teams. Users in the workspace are configured into groups, which are used for setting up data access via ACLs.

The user_ltv table has the following schema:

(The user_ltv schema is shown as an image in the original.)

An analyst who is not a member of the auditing group executes the following query:

(The query is shown as an image in the original.)

Which result will be returned by this query?

Options:

A.

All columns will be displayed normally for those records that have an age greater than 18; records not meeting this condition will be omitted.


B.

All columns will be displayed normally for those records that have an age greater than 17; records not meeting this condition will be omitted.


C.

All age values less than 18 will be returned as null values; all other columns will be returned with the values in user_ltv.


D.

All records from all columns will be displayed with the values in user_ltv.

