
Pass the Databricks Certification Databricks-Certified-Data-Engineer-Associate Questions and Answers with CertsForce

Question # 21:

A data engineer is attempting to drop a Spark SQL table my_table. The data engineer wants to delete all table metadata and data.

They run the following command:

DROP TABLE IF EXISTS my_table

While the object no longer appears when they run SHOW TABLES, the data files still exist.

Which of the following describes why the data files still exist and the metadata files were deleted?

Options:

A. The table’s data was larger than 10 GB
B. The table’s data was smaller than 10 GB
C. The table was external
D. The table did not have a location
E. The table was managed


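For context, dropping a table removes only the metastore entry when the table is external; the underlying data files are left untouched. A minimal sketch of the two behaviors, assuming PySpark in a Databricks notebook (the path /mnt/raw/my_table and the columns are hypothetical):

# External table: the metastore entry points at files it does not own.
spark.sql("""
  CREATE TABLE my_table (id INT, name STRING)
  USING DELTA
  LOCATION '/mnt/raw/my_table'
""")

spark.sql("DROP TABLE IF EXISTS my_table")
# The table disappears from SHOW TABLES, but the Delta files under
# /mnt/raw/my_table remain, exactly as described in the question.
# A managed table (created without LOCATION) would have had both its
# metadata and its data files deleted by the same DROP TABLE statement.
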
Question # 22:

A data engineer has joined an existing project and they see the following query in the project repository:

CREATE STREAMING LIVE TABLE loyal_customers AS
SELECT customer_id
FROM STREAM(LIVE.customers)
WHERE loyalty_level = 'high';

Which of the following describes why the STREAM function is included in the query?

Options:

A. The STREAM function is not needed and will cause an error.
B. The table being created is a live table.
C. The customers table is a streaming live table.
D. The customers table is a reference to a Structured Streaming query on a PySpark DataFrame.
E. The data in the customers table has been updated since its last run.


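The STREAM() wrapper signals that customers must be read incrementally because it is itself a streaming live table. A rough Python equivalent of the same pipeline, assuming the standard dlt module available inside a Delta Live Tables pipeline (it will not run in a plain notebook):

import dlt
from pyspark.sql import functions as F

@dlt.table(name="loyal_customers")
def loyal_customers():
    # read_stream mirrors STREAM(LIVE.customers): the source is consumed
    # as a stream because customers is a streaming live table.
    return (
        dlt.read_stream("customers")
           .filter(F.col("loyalty_level") == "high")
           .select("customer_id")
    )
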
Question # 23:

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

[Code block shown as an image; not reproduced here.]

If the data engineer only wants the query to process all of the available data in as many batches as required, which of the following lines of code should the data engineer use to fill in the blank?

Options:

A. processingTime(1)
B. trigger(availableNow=True)
C. trigger(parallelBatch=True)
D. trigger(processingTime="once")
E. trigger(continuous="once")


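For reference, the behavior described (consume everything currently available, across as many micro-batches as needed, then stop) is what trigger(availableNow=True) provides on recent Spark and Databricks runtimes. A minimal sketch; the table names and checkpoint path are placeholders:

(spark.readStream
    .table("source_table")
    .writeStream
    .option("checkpointLocation", "/tmp/checkpoints/target_table")
    .trigger(availableNow=True)   # process all available data in as many batches as needed, then stop
    .toTable("target_table"))
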
Question # 24:

A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database. They run the following command:

CREATE TABLE jdbc_customer360
USING _____
OPTIONS (
  url "jdbc:sqlite:/customers.db",
  dbtable "customer360"
)

Which line of code fills in the above blank to successfully complete the task?

Options:

A. autoloader
B. org.apache.spark.sql.jdbc
C. sqlite
D. org.apache.spark.sql.sqlite


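The blank names the data source for the USING clause; for a JDBC source that is the JDBC provider. A sketch of one way the statement can be completed, issued from Python and assuming the SQLite JDBC driver is available on the cluster:

spark.sql("""
  CREATE TABLE jdbc_customer360
  USING org.apache.spark.sql.jdbc
  OPTIONS (
    url "jdbc:sqlite:/customers.db",
    dbtable "customer360"
  )
""")

# The DataFrame reader form of the same connection:
customer360 = (spark.read.format("jdbc")
    .option("url", "jdbc:sqlite:/customers.db")
    .option("dbtable", "customer360")
    .load())
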
Question # 25:

A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.

Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?

Options:

A. They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to "Reliability Optimized."
B. They can turn on the Auto Stop feature for the SQL endpoint.
C. They can increase the cluster size of the SQL endpoint.
D. They can turn on the Serverless feature for the SQL endpoint.
E. They can increase the maximum bound of the SQL endpoint's scaling range.


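Endpoint start-up latency is a configuration concern rather than a query concern: a stopped classic endpoint must provision a cluster before it can return results, while serverless compute starts much faster. As a loose illustration only, a serverless SQL warehouse can be created through the SQL Warehouses REST API; the host, token, endpoint path, and field names below are placeholders from memory and should be checked against the current API reference:

import requests

# Hypothetical values throughout; verify field names against the
# Databricks SQL Warehouses API documentation before using.
resp = requests.post(
    "https://<workspace-host>/api/2.0/sql/warehouses",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "name": "analytics-warehouse",
        "cluster_size": "Small",
        "enable_serverless_compute": True,  # serverless compute avoids the long classic-cluster start-up
        "auto_stop_mins": 30,
    },
)
resp.raise_for_status()
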
Question # 26:

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

[Code block shown as an image; not reproduced here.]

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?

Options:

A. trigger("5 seconds")
B. trigger()
C. trigger(once="5 seconds")
D. trigger(processingTime="5 seconds")
E. trigger(continuous="5 seconds")


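Since the original code block is only available as a screenshot, here is a minimal sketch of the kind of streaming write the question describes, with a fixed 5-second micro-batch cadence; the table names and checkpoint path are placeholders:

(spark.readStream
    .table("source_table")
    .writeStream
    .option("checkpointLocation", "/tmp/checkpoints/new_table")
    .trigger(processingTime="5 seconds")   # start a micro-batch every 5 seconds
    .toTable("new_table"))
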
Question # 27:

A data engineer is maintaining a data pipeline. Upon data ingestion, the data engineer notices that the source data is starting to have a lower level of quality. The data engineer would like to automate the process of monitoring the quality level.

Which of the following tools can the data engineer use to solve this problem?

Options:

A. Unity Catalog
B. Data Explorer
C. Delta Lake
D. Delta Live Tables
E. Auto Loader


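Quality monitoring can be automated declaratively in Delta Live Tables through expectations, which track and optionally act on rule violations at ingestion time. A short sketch using the DLT Python decorators; the source table name and the rules are hypothetical:

import dlt

@dlt.table(name="orders_clean")
@dlt.expect("valid_order_id", "order_id IS NOT NULL")   # record violations in pipeline quality metrics
@dlt.expect_or_drop("positive_amount", "amount > 0")    # drop rows that fail the rule
def orders_clean():
    return dlt.read_stream("orders_raw")
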
Question # 28:

A data engineer has a Job that has a complex run schedule, and they want to transfer that schedule to other Jobs.

Rather than manually selecting each value in the scheduling form in Databricks, which of the following tools can the data engineer use to represent and submit the schedule programmatically?

Options:

A. pyspark.sql.types.DateType
B. datetime
C. pyspark.sql.types.TimestampType
D. Cron syntax
E. There is no way to represent and submit this information programmatically


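Databricks Job schedules are expressed as Quartz cron expressions, which can be stored and reused programmatically across Jobs. A sketch of a schedule payload as it might be attached to a job definition; the field names follow the Jobs API schedule object as I recall it and should be verified against the API reference:

# "At 06:00 on every weekday" in Quartz cron syntax.
shared_schedule = {
    "quartz_cron_expression": "0 0 6 ? * MON-FRI",
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED",
}
# The same dictionary can be submitted with each Job (for example through the
# Jobs API or Terraform) instead of re-entering the values in the scheduling form.
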
Question # 29:

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

[Code block shown as an image; not reproduced here.]

Which line of code should the data engineer use to fill in the blank if the data engineer only wants the query to execute a micro-batch to process data every 5 seconds?

Options:

A. trigger("5 seconds")
B. trigger(continuous="5 seconds")
C. trigger(once="5 seconds")
D. trigger(processingTime="5 seconds")


Question # 30:

Which of the following Structured Streaming queries is performing a hop from a Silver table to a Gold table?

Options:

A–E. [Each option is a screenshot of a Structured Streaming query; the images are not reproduced here.]


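Since the options above are screenshots, a representative Silver-to-Gold hop is sketched below: the Gold layer typically holds business-level aggregates computed from cleaned Silver data. Table names, columns, and the checkpoint path are hypothetical:

from pyspark.sql import functions as F

(spark.readStream
    .table("sales_silver")                          # cleaned, enriched Silver data
    .groupBy("store_id")
    .agg(F.sum("amount").alias("total_sales"))      # business-level aggregate for the Gold layer
    .writeStream
    .outputMode("complete")                         # streaming aggregations need complete/update mode
    .option("checkpointLocation", "/tmp/checkpoints/sales_gold")
    .trigger(availableNow=True)
    .toTable("sales_gold"))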