Pass the Databricks Certification Databricks-Certified-Data-Engineer-Associate questions and answers with CertsForce

Viewing page 2 of 4 (questions 11-20)
Question #11:

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

Options:

A. They can turn on the Auto Stop feature for the SQL endpoint.

B. They can ensure the dashboard's SQL endpoint is not one of the included queries' SQL endpoints.

C. They can reduce the cluster size of the SQL endpoint.

D. They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.

E. They can set up the dashboard's SQL endpoint to be serverless.


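For context on option A: a SQL endpoint's Auto Stop setting shuts the endpoint down after a period of inactivity, so it only runs around each scheduled refresh. A minimal sketch using the Databricks SQL Warehouses REST API, with hypothetical workspace URL, token, and warehouse name:

import requests

HOST = "https://<workspace>.cloud.databricks.com"  # hypothetical workspace URL
TOKEN = "<personal-access-token>"                  # hypothetical token

# Create a SQL warehouse (endpoint) that shuts itself down after
# 10 idle minutes, so it only runs around the hourly refresh.
resp = requests.post(
    f"{HOST}/api/2.0/sql/warehouses",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "name": "dashboard-refresh-wh",  # hypothetical name
        "cluster_size": "Small",
        "auto_stop_mins": 10,            # the Auto Stop setting
    },
)
resp.raise_for_status()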
Question #12:

A data engineer has a Python variable table_name that they would like to use in a SQL query. They want to construct a Python code block that will run the query using table_name.

They have the following incomplete code block:

____(f"SELECT customer_id, spend FROM {table_name}")

Which of the following can be used to fill in the blank to successfully complete the task?

Options:

A. spark.delta.sql

B. spark.delta.table

C. spark.table

D. dbutils.sql

E. spark.sql


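For reference, spark.sql is the PySpark method that runs a SQL string and returns a DataFrame, so it pairs naturally with an f-string. A minimal sketch, assuming a table registered in the metastore (the name here is hypothetical):

# spark is predefined in Databricks notebooks; the table name is hypothetical.
table_name = "customers"

# spark.sql executes the SQL string and returns a DataFrame.
df = spark.sql(f"SELECT customer_id, spend FROM {table_name}")
df.show()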
Question #13:

Which of the following describes when to use the CREATE STREAMING LIVE TABLE (formerly CREATE INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta Live Tables (DLT) tables using SQL?

Options:

A. CREATE STREAMING LIVE TABLE should be used when the subsequent step in the DLT pipeline is static.

B. CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally.

C. CREATE STREAMING LIVE TABLE is redundant for DLT and it does not need to be used.

D. CREATE STREAMING LIVE TABLE should be used when data needs to be processed through complicated aggregations.

E. CREATE STREAMING LIVE TABLE should be used when the previous step in the DLT pipeline is static.


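For context, the same distinction exists in the DLT Python API: a streaming table reads its source incrementally, while a live (materialized) table is fully recomputed from a batch read. A sketch that runs only inside a DLT pipeline, with hypothetical table names:

import dlt

# Streaming table: new source rows are processed incrementally on each update
# (the SQL counterpart is CREATE STREAMING LIVE TABLE).
@dlt.table(name="events_raw")
def events_raw():
    return spark.readStream.table("source_events")  # hypothetical source table

# Live table: fully recomputed from a batch read on each update
# (the SQL counterpart is CREATE LIVE TABLE).
@dlt.table(name="events_summary")
def events_summary():
    return dlt.read("events_raw").groupBy("event_type").count()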
Question #14:

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.

Which of the following describes why Auto Loader inferred all of the columns to be of the string type?

Options:

A. There was a type mismatch between the specified schema and the inferred schema

B. JSON data is a text-based format

C. Auto Loader only works with string data

D. All of the fields had at least one null value

E. Auto Loader cannot infer the schema of ingested data


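For context: because JSON is a text-based format, Auto Loader's default inference lands every column as a string unless type inference is switched on or a schema is supplied. A sketch with hypothetical paths, using the cloudFiles.inferColumnTypes option:

# Hypothetical source and schema-tracking locations.
source_path = "/mnt/raw/events/"
schema_path = "/mnt/schemas/events/"

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", schema_path)
    # Without this option, JSON fields are inferred as strings by default.
    .option("cloudFiles.inferColumnTypes", "true")
    .load(source_path)
)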
Question #15:

Which SQL keyword can be used to convert a table from a long format to a wide format?

Options:

A. TRANSFORM

B. PIVOT

C. SUM

D. CONVERT


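For reference, PIVOT turns the distinct values of one column into new columns, which is exactly the long-to-wide reshaping the question describes. A self-contained sketch with hypothetical data:

# Long format: one row per (month, region) pair.
long_df = spark.createDataFrame(
    [("2024-01", "north", 10), ("2024-01", "south", 20), ("2024-02", "north", 15)],
    ["month", "region", "sales"],
)
long_df.createOrReplaceTempView("sales_long")

# Wide format: one row per month, one column per region.
spark.sql("""
    SELECT * FROM sales_long
    PIVOT (SUM(sales) FOR region IN ('north', 'south'))
""").show()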
Question #16:

Which of the following must be specified when creating a new Delta Live Tables pipeline?

Options:

A. A key-value pair configuration

B. The preferred DBU/hour cost

C. A path to a cloud storage location for the written data

D. A location of a target database for the written data

E. At least one notebook library to be executed


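For context on what is mandatory: when creating a pipeline through the Pipelines REST API, the libraries list points at the notebook (or file) the pipeline executes, while most other settings have defaults. A sketch with the same hypothetical host and token as the warehouse example above:

import requests

HOST = "https://<workspace>.cloud.databricks.com"  # hypothetical
TOKEN = "<personal-access-token>"                  # hypothetical

resp = requests.post(
    f"{HOST}/api/2.0/pipelines",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "name": "retail-dlt",  # hypothetical pipeline name
        # At least one notebook library to execute; the path is hypothetical.
        "libraries": [{"notebook": {"path": "/Repos/etl/dlt_pipeline"}}],
    },
)
resp.raise_for_status()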
Question #17:

Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?

Options:

A. Parquet files can be partitioned

B. CREATE TABLE AS SELECT statements cannot be used on files

C. Parquet files have a well-defined schema

D. Parquet files have the ability to be optimized

E. Parquet files will become Delta tables


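For reference: Parquet files embed their schema (column names and types) in the file footer, so a CREATE TABLE AS SELECT over Parquet produces typed columns without extra options, while CSV carries no embedded schema. A sketch with hypothetical paths:

# Parquet embeds column names and types, so CTAS inherits a typed schema.
spark.sql("""
    CREATE TABLE sales_from_parquet AS
    SELECT * FROM parquet.`/mnt/landing/sales_parquet/`
""")

# CSV has no embedded schema; read this way, columns default to
# generated names (_c0, _c1, ...) with string types.
spark.sql("""
    CREATE TABLE sales_from_csv AS
    SELECT * FROM csv.`/mnt/landing/sales_csv/`
""")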
Question #18:

A data analyst has a series of queries in a SQL program. The data analyst wants this program to run every day. They only want the final query in the program to run on Sundays. They ask for help from the data engineering team to complete this task.

Which of the following approaches could be used by the data engineering team to complete this task?

Options:

A. They could submit a feature request with Databricks to add this functionality.

B. They could wrap the queries using PySpark and use Python's control flow system to determine when to run the final query.

C. They could only run the entire program on Sundays.

D. They could automatically restrict access to the source table in the final query so that it is only accessible on Sundays.

E. They could redesign the data model to separate the data used in the final query into a new table.


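For context on option B: once the queries are wrapped in PySpark, plain Python control flow can gate the final query on the day of the week. A minimal sketch with placeholder queries:

from datetime import date

# Queries that run every day (placeholder SQL).
spark.sql("SELECT 1 AS daily_placeholder")

# weekday() numbers Monday as 0, so Sunday is 6.
if date.today().weekday() == 6:
    # The final query, run only on Sundays (placeholder SQL).
    spark.sql("SELECT 1 AS sunday_placeholder")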
Question #19:

In which of the following scenarios should a data engineer select a Task in the Depends On field of a new Databricks Job Task?

Options:

A. When another task needs to be replaced by the new task

B. When another task needs to fail before the new task begins

C. When another task has the same dependency libraries as the new task

D. When another task needs to use as little compute resources as possible

E. When another task needs to successfully complete before the new task begins


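For context: in a Databricks Job definition, a task's depends_on list names the task keys that must complete successfully before the task starts. A sketch of the relevant fragment of a Jobs API payload, with hypothetical task names and notebook paths:

# transform waits for ingest to complete successfully before it begins.
job_payload = {
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/etl/ingest"},
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],  # the Depends On field
            "notebook_task": {"notebook_path": "/Repos/etl/transform"},
        },
    ]
}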
Question #20:

A data engineering team has two tables. The first table march_transactions is a collection of all retail transactions in the month of March. The second table april_transactions is a collection of all retail transactions in the month of April. There are no duplicate records between the tables.

Which of the following commands should be run to create a new table all_transactions that contains all records from march_transactions and april_transactions without duplicate records?

Options:

A. CREATE TABLE all_transactions AS
SELECT * FROM march_transactions
INNER JOIN SELECT * FROM april_transactions;

B. CREATE TABLE all_transactions AS
SELECT * FROM march_transactions
UNION SELECT * FROM april_transactions;

C. CREATE TABLE all_transactions AS
SELECT * FROM march_transactions
OUTER JOIN SELECT * FROM april_transactions;

D. CREATE TABLE all_transactions AS
SELECT * FROM march_transactions
INTERSECT SELECT * FROM april_transactions;

E. CREATE TABLE all_transactions AS
SELECT * FROM march_transactions
MERGE SELECT * FROM april_transactions;


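For reference, UNION combines two result sets and removes duplicate rows, while UNION ALL would keep them. A self-contained sketch with hypothetical stand-ins for the monthly tables:

# Hypothetical stand-ins for the two monthly tables.
spark.createDataFrame([(1, 10.0)], ["txn_id", "amount"]) \
    .createOrReplaceTempView("march_transactions")
spark.createDataFrame([(2, 20.0)], ["txn_id", "amount"]) \
    .createOrReplaceTempView("april_transactions")

# UNION deduplicates the combined rows (UNION ALL would not).
spark.sql("""
    CREATE TABLE all_transactions AS
    SELECT * FROM march_transactions
    UNION
    SELECT * FROM april_transactions
""")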