
Pass the Databricks Certification Databricks-Certified-Data-Engineer-Associate Questions and Answers with CertsForce

Page 3 of 6 (Questions 21-30)
Question # 21:

A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes, so the files must be left as-is and will accumulate in the directory. The data engineer needs to identify which files are new since the pipeline's previous run, and set up the pipeline to ingest only those new files on each run.

Which of the following tools can the data engineer use to solve this problem?

Options:

A.

Unity Catalog


B.

Delta Lake


C.

Databricks SQL


D.

Data Explorer


E.

Auto Loader


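Auto Loader (option E) is built for exactly this scenario: it tracks which files it has already seen and ingests only new arrivals, without moving or modifying anything in the shared directory. A minimal PySpark sketch, assuming a JSON source and hypothetical paths and table names:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The "cloudFiles" source is Auto Loader; it records already-ingested files
# so each run picks up only files that arrived since the previous run.
new_files = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")                   # assumed file format
    .option("cloudFiles.schemaLocation", "/tmp/_schemas")  # hypothetical path
    .load("/mnt/shared/source_dir"))                       # hypothetical shared directory

# The checkpoint persists ingestion progress between runs; availableNow
# processes whatever is new and then stops, suiting a scheduled pipeline.
(new_files.writeStream
    .option("checkpointLocation", "/tmp/_checkpoints/ingest")
    .trigger(availableNow=True)
    .toTable("bronze_new_files"))                          # hypothetical target table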
Question # 22:

Which Databricks feature can be used to check the data sources and tables used in a workspace?

Options:

A.

Do not use the lineage feature as it only tracks activity from the last 3 months and will not provide full details on dependencies.


B.

Use the lineage feature to visualize a graph that highlights where the table is used only in notebooks.


C.

Use the lineage feature to visualize a graph that highlights where the table is used only in reports.


D.

Use the lineage feature to visualize a graph that shows all dependencies, including where the table is used in notebooks, other tables, and reports.


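The lineage graph (option D) is rendered in Catalog Explorer and covers all dependencies: notebooks, downstream tables, and reports. The same lineage data can also be queried programmatically; a sketch, assuming Unity Catalog system tables are enabled and using a hypothetical table name (`spark` is the session Databricks provides in notebooks):

# Upstream and downstream dependencies for one table, read from the
# lineage system table (assumes system.access.table_lineage is enabled).
lineage = spark.sql("""
    SELECT source_table_full_name, target_table_full_name, entity_type
    FROM system.access.table_lineage
    WHERE source_table_full_name = 'main.sales.orders'  -- hypothetical table
       OR target_table_full_name = 'main.sales.orders'
""")
lineage.show(truncate=False)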
Question # 23:

A data engineer wants to create a data entity from a couple of tables. The data entity must be used by other data engineers in other sessions. It also must be saved to a physical location.

Which of the following data entities should the data engineer create?

Options:

A.

Database


B.

Function


C.

View


D.

Temporary view


E.

Table


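A table (option E) is the only entity listed that is both persisted to a physical storage location and visible to other engineers in other sessions; a view stores only a query definition, and a temporary view does not outlive the session that created it. A sketch with hypothetical table names:

# CTAS persists the joined result to storage and registers it in the
# metastore, so other sessions can query it directly.
spark.sql("""
    CREATE TABLE customer_orders AS
    SELECT c.customer_id, c.name, o.order_total
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
""")

# By contrast, a temporary view disappears when the session ends.
spark.table("customer_orders").createOrReplaceTempView("tmp_orders")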
Question # 24:

A data engineer has created a new database using the following command:

CREATE DATABASE IF NOT EXISTS customer360;

In which of the following locations will the customer360 database be located?

Options:

A.

dbfs:/user/hive/database/customer360


B.

dbfs:/user/hive/warehouse


C.

dbfs:/user/hive/customer360


D.

More information is needed to determine the correct response


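Because no LOCATION clause was given, a Hive-metastore database lands under the default warehouse directory, dbfs:/user/hive/warehouse (option B), as a customer360.db folder. This is easy to confirm from a notebook:

# The "Location" row of the output shows
# dbfs:/user/hive/warehouse/customer360.db
spark.sql("CREATE DATABASE IF NOT EXISTS customer360")
spark.sql("DESCRIBE DATABASE customer360").show(truncate=False)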
Question # 25:

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is shown in the original question as an image (not reproduced here); the blank to be filled is the trigger clause of the streaming write.

Which line of code should the data engineer use to fill in the blank if the data engineer only wants the query to execute a micro-batch to process data every 5 seconds?

Options:

A.

trigger( " 5 seconds " )


B.

trigger(continuous="5 seconds")


C.

trigger(once="5 seconds")


D.

trigger(processingTime="5 seconds")


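Only trigger(processingTime="5 seconds") (option D) runs a micro-batch on a fixed five-second interval: continuous= requests continuous processing rather than micro-batches, and once= takes the boolean True, not a duration. A sketch with hypothetical table and checkpoint names:

query = (spark.readStream
    .table("source_table")                        # hypothetical streaming source
    .writeStream
    .option("checkpointLocation", "/tmp/_cp/5s")  # hypothetical path
    .trigger(processingTime="5 seconds")          # one micro-batch every 5 seconds
    .toTable("target_table"))                     # hypothetical target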
Question # 26:

A data analyst has a series of queries in a SQL program. The data analyst wants this program to run every day. They only want the final query in the program to run on Sundays. They ask for help from the data engineering team to complete this task.

Which of the following approaches could be used by the data engineering team to complete this task?

Options:

A.

They could submit a feature request with Databricks to add this functionality.


B.

They could wrap the queries using PySpark and use Python’s control flow system to determine when to run the final query.


C.

They could only run the entire program on Sundays.


D.

They could automatically restrict access to the source table in the final query so that it is only accessible on Sundays.


E.

They could redesign the data model to separate the data used in the final query into a new table.


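Option B works because wrapping the SQL in PySpark puts ordinary Python control flow in charge of which statements execute. A sketch with hypothetical placeholder queries:

from datetime import date

# Queries that run every day (hypothetical statements).
spark.sql("INSERT INTO sales_daily SELECT * FROM staging_sales")

# The final query runs only on Sundays (isoweekday() == 7).
if date.today().isoweekday() == 7:
    spark.sql("INSERT INTO sales_weekly SELECT * FROM sales_daily")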
Question # 27:

A data engineer only wants to execute the final block of a Python program if the Python variable day_of_week is equal to 1 and the Python variable review_period is True.

Which of the following control flow statements should the data engineer use to begin this conditionally executed code block?

Options:

A.

if day_of_week = 1 and review_period:


B.

if day_of_week = 1 and review_period = "True":


C.

if day_of_week == 1 and review_period == "True":


D.

if day_of_week == 1 and review_period:


E.

if day_of_week = 1 & review_period: = " True " :


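Option D is the only form that is both valid and idiomatic Python: == compares (a single = is assignment and a syntax error inside an if), and a boolean like review_period is tested directly rather than compared to the string "True". For illustration:

day_of_week = 1
review_period = True

# '==' compares; the bare boolean is truthy on its own.
if day_of_week == 1 and review_period:
    print("executing final block")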
Question # 28:

A data engineer is decommissioning a sandbox schema in Unity Catalog. Some tables are ephemeral staging outputs that can be safely removed entirely, but a few tables point at shared cloud storage used by downstream jobs outside Databricks. The engineer must avoid deleting any shared files when cleaning up catalog objects.

How does Unity Catalog behave when dropping managed vs. external tables?

Options:

A.

Drop all tables; Databricks will only remove metadata for both managed and external tables


B.

Drop managed tables that are ephemeral and drop external tables; files for both remain for 7 days


C.

Drop managed staging tables to remove data and metadata, and drop external tables to remove only metadata


D.

Drop external tables first to delete their files, then drop managed tables to keep their files for recovery


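Option C matches Unity Catalog's behavior: dropping a managed table removes both the metadata and the data files Unity Catalog manages, while dropping an external table removes only the metadata and leaves the files at the external location untouched, which is what protects the shared storage here. A sketch with hypothetical schema and table names:

# Managed staging table: DROP removes the catalog entry and the managed files.
spark.sql("DROP TABLE sandbox.staging_tmp")

# External table: DROP removes only the catalog entry; the files at the
# table's external LOCATION (the shared cloud path) remain for downstream jobs.
spark.sql("DROP TABLE sandbox.shared_events")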
Question # 29:

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW

What is the expected behavior when a batch containing records that violate this constraint is processed?

Options:

A.

Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.


B.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.


C.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.


D.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.


E.

Records that violate the expectation cause the job to fail.


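With ON VIOLATION DROP ROW, violating records are dropped from the target dataset and the drop counts are recorded as data quality metrics in the pipeline event log (option C); nothing is quarantined, no flag column is added, and the update does not fail. The equivalent expectation in the Python DLT API, assuming this runs inside a Delta Live Tables pipeline against a hypothetical source table:

import dlt

@dlt.table
@dlt.expect_or_drop("valid_timestamp", "timestamp > '2020-01-01'")
def cleaned_events():
    # Rows failing the predicate are dropped; drop metrics appear in the event log.
    return spark.read.table("raw_events")  # hypothetical source table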
Question # 30:

Which tool is used by Auto Loader to process data incrementally?

Options:

A.

Spark Structured Streaming


B.

Unity Catalog


C.

Checkpointing


D.

Databricks SQL


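Auto Loader is implemented as a Structured Streaming source (option A): the cloudFiles reader returns a streaming DataFrame, and checkpointing is merely how that stream persists its progress. A quick check with hypothetical paths:

df = (spark.readStream                      # Structured Streaming entry point
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")                    # assumed format
    .option("cloudFiles.schemaLocation", "/tmp/_schemas")  # hypothetical path
    .load("/mnt/landing"))                                 # hypothetical path

print(df.isStreaming)  # True: Auto Loader runs on Spark Structured Streaming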