Databricks Databricks-Certified-Data-Engineer-Associate Exam Questions Free Practice Test

Questions # 11:

A data engineer maintains ETL pipeline code in a GitHub repository linked to their Databricks account. The data engineer wants to deploy the ETL pipeline to production as a Databricks workflow.

Which approach should the data engineer use?

Options:

Databricks Asset Bundles (DAB) + GitHub Integration

Maintain workflow_config.json and deploy it using Databricks CLI

Manually create and manage the workflow in the UI

Maintain workflow_config.json and deploy it using Terraform

Questions # 12:

Which tool is used by Auto Loader to process data incrementally?

Options:

Spark Structured Streaming

Unity Catalog

Checkpointing

Databricks SQL

Questions # 13:

A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which of the following changes will need to be made to the pipeline when migrating to Delta Live Tables?

Options:

None of these changes will need to be made

The pipeline will need to stop using the medallion-based multi-hop architecture

The pipeline will need to be written entirely in SQL

The pipeline will need to use a batch source in place of a streaming source

The pipeline will need to be written entirely in Python

Questions # 14:

Which of the following commands will return the number of null values in the member_id column?

Options:

SELECT count(member_id) FROM my_table;

SELECT count(member_id) - count_null(member_id) FROM my_table;

SELECT count_if(member_id IS NULL) FROM my_table;

SELECT null(member_id) FROM my_table;

SELECT count_null(member_id) FROM my_table;
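
The behavior these options rely on can be checked with a small sketch. It assumes a hypothetical in-memory SQLite table standing in for my_table, not a Databricks table. SQLite has no count_if function, so a conditional SUM is used here as a stand-in for a conditional count; the sketch only illustrates that COUNT over a column skips NULL values while COUNT(*) counts every row.

```python
import sqlite3

# Hypothetical in-memory table standing in for my_table (assumption, not from the source).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (member_id INTEGER)")
conn.executemany(
    "INSERT INTO my_table (member_id) VALUES (?)",
    [(1,), (None,), (3,), (None,), (5,)],
)

# COUNT(column) ignores NULLs; COUNT(*) counts every row.
non_null_count, row_count = conn.execute(
    "SELECT COUNT(member_id), COUNT(*) FROM my_table"
).fetchone()

# Conditional SUM used as a stand-in for a conditional count in SQLite.
null_count = conn.execute(
    "SELECT SUM(CASE WHEN member_id IS NULL THEN 1 ELSE 0 END) FROM my_table"
).fetchone()[0]

conn.close()
print(non_null_count, row_count, null_count)
```

The difference between the two counts equals the number of NULL values, which is why a plain count over the column cannot report NULLs on its own.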

Questions # 15:

An organization plans to share a large dataset stored in a Databricks workspace on AWS with a partner organization whose Databricks workspace is hosted on Azure. The data engineer wants to minimize data transfer costs while ensuring secure and efficient data sharing.

Which strategy will reduce data egress costs associated with cross-cloud data sharing?

Options:

Sharing data via pre-signed URLs without monitoring egress costs

Migrating the dataset to Cloudflare R2 object storage before sharing

Configuring a VPN connection between AWS and Azure for faster data sharing

Using Delta Sharing without any additional configurations

Questions # 16:

An engineering manager uses a Databricks SQL query to monitor ingestion latency for each data source. The manager checks the results of the query every day, but they are manually rerunning the query each day and waiting for the results.

Which of the following approaches can the manager use to ensure the results of the query are updated each day?

Options:

They can schedule the query to refresh every 1 day from the SQL endpoint's page in Databricks SQL.

They can schedule the query to refresh every 12 hours from the SQL endpoint's page in Databricks SQL.

They can schedule the query to refresh every 1 day from the query's page in Databricks SQL.

They can schedule the query to run every 1 day from the Jobs UI.

They can schedule the query to run every 12 hours from the Jobs UI.

Questions # 17:

What is the maximum output supported by a job cluster to ensure a notebook does not fail?

Options:

10 MB

25 MB

30 MB

15 MB

Questions # 18:

A data engineer wants to create a relational object by pulling data from two tables. The relational object does not need to be used by other data engineers in other sessions. In order to save on storage costs, the data engineer wants to avoid copying and storing physical data.

Which of the following relational objects should the data engineer create?

Options:

Spark SQL Table

View

Database

Temporary view

Delta Table
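
The two requirements in the scenario (no copied data, no visibility in other sessions) can be illustrated with a rough sketch. It uses SQLite connections as a stand-in for sessions, which is an assumption and not Databricks behavior; the table names, view name, and file path are hypothetical. A temporary view stores only its query definition and is scoped to the connection that created it.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file and tables (assumptions for illustration only).
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
first = sqlite3.connect(db_path)
first.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
first.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
first.execute("INSERT INTO orders VALUES (1, 10)")
first.execute("INSERT INTO customers VALUES (10, 'Ada')")
first.commit()

# The temporary view joins two tables without copying any rows.
first.execute(
    """
    CREATE TEMP VIEW order_names AS
    SELECT o.id AS order_id, c.name
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.id
    """
)
rows_in_first = first.execute("SELECT * FROM order_names").fetchall()

# A second connection (a separate session in this analogy) cannot see the view.
second = sqlite3.connect(db_path)
try:
    second.execute("SELECT * FROM order_names")
    visible_in_second = True
except sqlite3.OperationalError:
    visible_in_second = False

first.close()
second.close()
print(rows_in_first, visible_in_second)
```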

Questions # 19:

A data engineer needs to create a table in Databricks using data from their organization’s existing SQLite database.

They run the following command:

Question # 19

Which of the following lines of code fills in the above blank to successfully complete the task?

Options:

org.apache.spark.sql.jdbc

autoloader

DELTA

sqlite

org.apache.spark.sql.sqlite

Questions # 20:

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to an ELT job. The ELT job has its own Databricks SQL query that returns the number of input records containing unexpected NULL values. The data engineer wants their entire team to be notified via a messaging webhook whenever this value reaches 100.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of NULL values reaches 100?

Options:

They can set up an Alert with a custom template.

They can set up an Alert with a new email alert destination.

They can set up an Alert with a new webhook alert destination.

They can set up an Alert with one-time notifications.

They can set up an Alert without notifications.
