Pass the Databricks Certification Databricks-Certified-Data-Engineer-Associate Questions and Answers with CertsForce

Viewing page 1 of 4
Viewing questions 1-10
Question # 1:

Which of the following Structured Streaming queries is performing a hop from a Silver table to a Gold table?

Options:

[Options A-E are code-snippet images that were not captured in this text version.]


Question # 2:

Which of the following describes a scenario in which a data team will want to utilize cluster pools?

Options:

A.

An automated report needs to be refreshed as quickly as possible.


B.

An automated report needs to be made reproducible.


C.

An automated report needs to be tested to identify errors.


D.

An automated report needs to be version-controlled across multiple collaborators.


E.

An automated report needs to be runnable by all stakeholders.


Question # 3:

A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Production mode using Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

Options:

A.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.


B.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.


C.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.


D.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.


E.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.


Question # 4:

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

[The code block is an image that was not captured; the blank to fill in is the trigger clause of the streaming write.]

If the data engineer only wants the query to process all of the available data in as many batches as required, which of the following lines of code should the data engineer use to fill in the blank?

Options:

A.

processingTime(1)


B.

trigger(availableNow=True)


C.

trigger(parallelBatch=True)


D.

trigger(processingTime="once")


E.

trigger(continuous="once")


Question # 5:

Which tool is used by Auto Loader to process data incrementally?

Options:

A.

Spark Structured Streaming


B.

Unity Catalog


C.

Checkpointing


D.

Databricks SQL
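For context, Auto Loader is exposed as the `cloudFiles` source of Spark Structured Streaming, which is what gives it incremental file processing. A hedged sketch (Databricks-only; the path, schema location, and table name are hypothetical):

```python
# Databricks-only sketch: Auto Loader is the "cloudFiles" source of Spark
# Structured Streaming (the incremental-processing engine underneath it).
bronze = (
    spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/tmp/schemas/orders")  # hypothetical
        .load("/mnt/raw/orders")                                     # hypothetical
)

(bronze.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/orders_bronze")
    .toTable("orders_bronze"))
```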


Question # 6:

A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.

Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?

Options:

A.

They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to "Reliability Optimized."


B.

They can turn on the Auto Stop feature for the SQL endpoint.


C.

They can increase the cluster size of the SQL endpoint.


D.

They can turn on the Serverless feature for the SQL endpoint.


E.

They can increase the maximum bound of the SQL endpoint's scaling range.


Question # 7:

Which of the following code blocks will remove the rows where the value in column age is greater than 25 from the existing Delta table my_table and save the updated table?

Options:

A.

SELECT * FROM my_table WHERE age > 25;


B.

UPDATE my_table WHERE age > 25;


C.

DELETE FROM my_table WHERE age > 25;


D.

UPDATE my_table WHERE age <= 25;


E.

DELETE FROM my_table WHERE age <= 25;
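The distinction between the options can be shown directly. Delta Lake follows standard SQL `DELETE` semantics; SQLite is used here purely so the example runs outside Databricks:

```python
import sqlite3

# Delta Lake follows standard SQL DELETE semantics; SQLite stands in here
# only so the example is runnable outside Databricks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [("a", 20), ("b", 30), ("c", 40)])

# DELETE removes the matching rows in place; the predicate names the rows
# to *drop*, so it is age > 25 (age <= 25 would drop the rows to keep).
# SELECT only reads, and UPDATE rewrites column values without removing rows.
conn.execute("DELETE FROM my_table WHERE age > 25")

print(conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0])  # → 1
```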


Question # 8:

Which types of workloads are compatible with Auto Loader?

Options:

A.

Streaming workloads


B.

Machine learning workloads


C.

Serverless workloads


D.

Batch workloads


Question # 9:

A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which of the following changes will need to be made to the pipeline when migrating to Delta Live Tables?

Options:

A.

None of these changes will need to be made


B.

The pipeline will need to stop using the medallion-based multi-hop architecture


C.

The pipeline will need to be written entirely in SQL


D.

The pipeline will need to use a batch source in place of a streaming source


E.

The pipeline will need to be written entirely in Python
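For reference, a Delta Live Tables pipeline can combine Python and SQL notebooks, keep the medallion layers, and read streaming sources, so a migration of this kind need not force a single language. A Databricks-only sketch of a Python-defined silver dataset (names hypothetical); the analyst's gold layer could remain SQL (e.g. `CREATE LIVE TABLE ... AS SELECT ...`) in another notebook of the same pipeline:

```python
# Databricks-only sketch: DLT pipelines accept Python (dlt module) and SQL
# notebooks side by side, so the engineer's and analyst's code can coexist.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Cleaned orders for the silver layer (hypothetical).")
def orders_silver():
    # Streaming read from the hypothetical bronze dataset in this pipeline.
    return dlt.read_stream("orders_bronze").where(F.col("amount") > 0)
```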


Question # 10:

A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Development mode using Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

Options:

A.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.


B.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist until the pipeline is shut down.


C.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.


D.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.


E.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

