
Pass the Databricks Certification Databricks-Certified-Data-Engineer-Associate Questions and Answers with CertsForce

Viewing page 2 out of 5 pages
Viewing questions 11-20 out of questions
Questions # 11:

A data engineer maintains ETL pipeline code in a GitHub repository linked to their Databricks account. The data engineer wants to deploy the ETL pipeline to production as a Databricks workflow.

Which approach should the data engineer use?

Options:

A.

Databricks Asset Bundles (DAB) + GitHub Integration


B.

Maintain workflow_config.json and deploy it using Databricks CLI


C.

Manually create and manage the workflow in the UI


D.

Maintain workflow_config.json and deploy it using Terraform


Questions # 12:

Which tool is used by Auto Loader to process data incrementally?

Options:

A.

Spark Structured Streaming


B.

Unity Catalog


C.

Checkpointing


D.

Databricks SQL
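Incremental processing means each run picks up only the input that earlier runs have not already handled. The sketch below illustrates that idea in plain Python, assuming a local directory as the source and a JSON file as the record of processed files. The function name `process_new_files`, the file names, and the JSON record are all hypothetical, and this is not how Auto Loader itself is implemented.

```python
import json
import os
import tempfile


def process_new_files(source_dir, checkpoint_path):
    """Return the files in source_dir that no earlier run has recorded."""
    seen = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            seen = set(json.load(f))

    # Only files missing from the record count as new input.
    new_files = sorted(name for name in os.listdir(source_dir) if name not in seen)

    # A real pipeline would parse and write each new file here.

    # Persist the updated record so the next run skips these files.
    with open(checkpoint_path, "w") as f:
        json.dump(sorted(seen | set(new_files)), f)
    return new_files


# Hypothetical source directory and record file, both in temporary locations.
source_dir = tempfile.mkdtemp()
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

open(os.path.join(source_dir, "a.json"), "w").close()
first_run = process_new_files(source_dir, checkpoint_path)

open(os.path.join(source_dir, "b.json"), "w").close()
second_run = process_new_files(source_dir, checkpoint_path)

# Nothing new arrived, so the third run has no work to do.
third_run = process_new_files(source_dir, checkpoint_path)

print(first_run, second_run, third_run)
```

Each run sees only what arrived since the previous one, which is the property the question refers to as processing data incrementally.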


Questions # 13:

A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which of the following changes will need to be made to the pipeline when migrating to Delta Live Tables?

Options:

A.

None of these changes will need to be made


B.

The pipeline will need to stop using the medallion-based multi-hop architecture


C.

The pipeline will need to be written entirely in SQL


D.

The pipeline will need to use a batch source in place of a streaming source


E.

The pipeline will need to be written entirely in Python


Questions # 14:

Which of the following commands will return the number of null values in the member_id column?

Options:

A.

SELECT count(member_id) FROM my_table;


B.

SELECT count(member_id) - count_null(member_id) FROM my_table;


C.

SELECT count_if(member_id IS NULL) FROM my_table;


D.

SELECT null(member_id) FROM my_table;


E.

SELECT count_null(member_id) FROM my_table;
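How SQL aggregates treat NULL can be checked on a small table. The sketch below uses Python's built-in sqlite3 module as a stand-in for a Databricks SQL session, which is an assumption made for portability: the sample rows are hypothetical, and because SQLite has no `count_if` function, the predicate-based count is written as `SUM(member_id IS NULL)` instead.

```python
import sqlite3

# In-memory table with hypothetical sample rows; two member_id values are NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (member_id INTEGER)")
conn.executemany(
    "INSERT INTO my_table (member_id) VALUES (?)",
    [(1,), (None,), (3,), (None,), (5,)],
)

# COUNT(column) skips NULLs, so it returns the number of non-null values.
non_null = conn.execute("SELECT COUNT(member_id) FROM my_table").fetchone()[0]

# COUNT(*) counts every row, so the difference is the number of NULLs.
nulls_by_difference = conn.execute(
    "SELECT COUNT(*) - COUNT(member_id) FROM my_table"
).fetchone()[0]

# In SQLite a predicate evaluates to 0 or 1, so summing it counts matching rows.
nulls_by_predicate = conn.execute(
    "SELECT SUM(member_id IS NULL) FROM my_table"
).fetchone()[0]

print(non_null, nulls_by_difference, nulls_by_predicate)
```

The point to take from the sketch is that counting a column and counting rows that satisfy an `IS NULL` predicate are different operations.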


Questions # 15:

An organization plans to share a large dataset stored in a Databricks workspace on AWS with a partner organization whose Databricks workspace is hosted on Azure. The data engineer wants to minimize data transfer costs while ensuring secure and efficient data sharing.

Which strategy will reduce data egress costs associated with cross-cloud data sharing?

Options:

A.

Sharing data via pre-signed URLs without monitoring egress costs


B.

Migrating the dataset to Cloudflare R2 object storage before sharing


C.

Configuring a VPN connection between AWS and Azure for faster data sharing


D.

Using Delta Sharing without any additional configurations


Questions # 16:

An engineering manager uses a Databricks SQL query to monitor ingestion latency for each data source. The manager checks the results of the query every day, but they are manually rerunning the query each day and waiting for the results.

Which of the following approaches can the manager use to ensure the results of the query are updated each day?

Options:

A.

They can schedule the query to refresh every 1 day from the SQL endpoint's page in Databricks SQL.


B.

They can schedule the query to refresh every 12 hours from the SQL endpoint's page in Databricks SQL.


C.

They can schedule the query to refresh every 1 day from the query's page in Databricks SQL.


D.

They can schedule the query to run every 1 day from the Jobs UI.


E.

They can schedule the query to run every 12 hours from the Jobs UI.


Questions # 17:

What is the maximum output supported by a job cluster to ensure a notebook does not fail?

Options:

A.

10 MB


B.

25 MB


C.

30 MB


D.

15 MB


Questions # 18:

A data engineer wants to create a relational object by pulling data from two tables. The relational object does not need to be used by other data engineers in other sessions. In order to save on storage costs, the data engineer wants to avoid copying and storing physical data.

Which of the following relational objects should the data engineer create?

Options:

A.

Spark SQL Table


B.

View


C.

Database


D.

Temporary view


E.

Delta Table
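The two requirements in the scenario, session scope and no stored copy of the data, can be observed with a small experiment. The sketch below uses Python's built-in sqlite3 module as an analogy, with each connection standing in for a session. That stand-in is an assumption, and the table names, column names, and rows are hypothetical.

```python
import os
import sqlite3
import tempfile

# A file-backed database so that two connections share the same tables.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

session_a = sqlite3.connect(db_path)
session_a.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
session_a.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
session_a.execute("INSERT INTO customers VALUES (1, 'Ada')")
session_a.execute("INSERT INTO orders VALUES (10, 1)")
session_a.commit()

# The temporary view stores only the query that joins the two tables.
session_a.execute(
    "CREATE TEMP VIEW order_names AS "
    "SELECT o.id AS order_id, c.name AS name "
    "FROM orders o JOIN customers c ON c.id = o.customer_id"
)
rows_before = session_a.execute(
    "SELECT order_id, name FROM order_names ORDER BY order_id"
).fetchall()

# New base rows appear in the view immediately because no data was copied.
session_a.execute("INSERT INTO orders VALUES (11, 1)")
session_a.commit()
rows_after = session_a.execute(
    "SELECT order_id, name FROM order_names ORDER BY order_id"
).fetchall()

# A second connection to the same file cannot see the temporary view.
session_b = sqlite3.connect(db_path)
try:
    session_b.execute("SELECT * FROM order_names").fetchall()
    visible_in_b = True
except sqlite3.OperationalError:
    visible_in_b = False

print(rows_before, rows_after, visible_in_b)
```

The view returns fresh results from the base tables on every query, and it exists only for the connection that created it.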


Questions # 19:

A data engineer needs to create a table in Databricks using data from their organization’s existing SQLite database.

They run the following command:

[Image: command for Question # 19, not reproduced]

Which of the following lines of code fills in the above blank to successfully complete the task?

Options:

A.

org.apache.spark.sql.jdbc


B.

autoloader


C.

DELTA


D.

sqlite


E.

org.apache.spark.sql.sqlite


Questions # 20:

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to an ELT job. The ELT job has an associated Databricks SQL query that returns the number of input records containing unexpected NULL values. The data engineer wants their entire team to be notified via a messaging webhook whenever this value reaches 100.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of NULL values reaches 100?

Options:

A.

They can set up an Alert with a custom template.


B.

They can set up an Alert with a new email alert destination.


C.

They can set up an Alert with a new webhook alert destination.


D.

They can set up an Alert with one-time notifications.


E.

They can set up an Alert without notifications.

