
Pass the Databricks Certification Databricks-Certified-Data-Engineer-Associate Questions and Answers with CertsForce

Page 3 of 6 (Questions 21-30)
Question # 21:

A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes, so the files must be left as-is and will accumulate in the directory. The data engineer needs to identify which files are new since the pipeline's previous run, and set up the pipeline to ingest only those new files on each run.

Which of the following tools can the data engineer use to solve this problem?

Options:

A.

Unity Catalog


B.

Delta Lake


C.

Databricks SQL


D.

Data Explorer


E.

Auto Loader


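Auto Loader (option E) is built for exactly this scenario: it tracks which files it has already seen and ingests only new arrivals, without moving or modifying anything in the shared directory. A minimal PySpark sketch, assuming a JSON source and hypothetical paths and table names:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The "cloudFiles" source is Auto Loader; it records already-ingested files
# so each run picks up only files that arrived since the previous run.
new_files = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")                   # assumed file format
    .option("cloudFiles.schemaLocation", "/tmp/_schemas")  # hypothetical path
    .load("/mnt/shared/source_dir"))                       # hypothetical shared directory

# The checkpoint persists ingestion progress between runs; availableNow
# processes whatever is new and then stops, suiting a scheduled pipeline.
(new_files.writeStream
    .option("checkpointLocation", "/tmp/_checkpoints/ingest")
    .trigger(availableNow=True)
    .toTable("bronze_new_files"))                          # hypothetical target table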
Question # 22:

Which Databricks feature can be used to check the data sources and tables used in a workspace?

Options:

A.

Do not use the lineage feature as it only tracks activity from the last 3 months and will not provide full details on dependencies.


B.

Use the lineage feature to visualize a graph that highlights where the table is used only in notebooks.


C.

Use the lineage feature to visualize a graph that highlights where the table is used only in reports.


D.

Use the lineage feature to visualize a graph that shows all dependencies, including where the table is used in notebooks, other tables, and reports.


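The lineage graph (option D) is rendered in Catalog Explorer and covers all dependencies: notebooks, downstream tables, and reports. The same lineage data can also be queried programmatically; a sketch, assuming Unity Catalog system tables are enabled and using a hypothetical table name (`spark` is the session Databricks provides in notebooks):

# Upstream and downstream dependencies for one table, read from the
# lineage system table (assumes system.access.table_lineage is enabled).
lineage = spark.sql("""
    SELECT source_table_full_name, target_table_full_name, entity_type
    FROM system.access.table_lineage
    WHERE source_table_full_name = 'main.sales.orders'  -- hypothetical table
       OR target_table_full_name = 'main.sales.orders'
""")
lineage.show(truncate=False)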
Question # 23:

A data engineer wants to create a data entity from a couple of tables. The data entity must be used by other data engineers in other sessions. It also must be saved to a physical location.

Which of the following data entities should the data engineer create?

Options:

A.

Database


B.

Function


C.

View


D.

Temporary view


E.

Table


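A table (option E) is the only entity listed that is both persisted to a physical storage location and visible to other engineers in other sessions; a view stores only a query definition, and a temporary view does not outlive the session that created it. A sketch with hypothetical table names:

# CTAS persists the joined result to storage and registers it in the
# metastore, so other sessions can query it directly.
spark.sql("""
    CREATE TABLE customer_orders AS
    SELECT c.customer_id, c.name, o.order_total
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
""")

# By contrast, a temporary view disappears when the session ends.
spark.table("customer_orders").createOrReplaceTempView("tmp_orders")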
Question # 24:

A data engineer has created a new database using the following command:

CREATE DATABASE IF NOT EXISTS customer360;

In which of the following locations will the customer360 database be located?

Options:

A.

dbfs:/user/hive/database/customer360


B.

dbfs:/user/hive/warehouse


C.

dbfs:/user/hive/customer360


D.

More information is needed to determine the correct response


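Because no LOCATION clause was given, a Hive-metastore database lands under the default warehouse directory, dbfs:/user/hive/warehouse (option B), as a customer360.db folder. This is easy to confirm from a notebook:

# The "Location" row of the output shows
# dbfs:/user/hive/warehouse/customer360.db
spark.sql("CREATE DATABASE IF NOT EXISTS customer360")
spark.sql("DESCRIBE DATABASE customer360").show(truncate=False)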
Question # 25:

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is shown in the original question as an image (not reproduced here); the blank to be filled is the trigger clause of the streaming write.

Which line of code should the data engineer use to fill in the blank if the data engineer only wants the query to execute a micro-batch to process data every 5 seconds?

Options:

A.

trigger( " 5 seconds " )


B.

trigger(continuous="5 seconds")


C.

trigger(once="5 seconds")


D.

trigger(processingTime="5 seconds")


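Only trigger(processingTime="5 seconds") (option D) runs a micro-batch on a fixed five-second interval: continuous= requests continuous processing rather than micro-batches, and once= takes the boolean True, not a duration. A sketch with hypothetical table and checkpoint names:

query = (spark.readStream
    .table("source_table")                        # hypothetical streaming source
    .writeStream
    .option("checkpointLocation", "/tmp/_cp/5s")  # hypothetical path
    .trigger(processingTime="5 seconds")          # one micro-batch every 5 seconds
    .toTable("target_table"))                     # hypothetical target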
Question # 26:

A data analyst has a series of queries in a SQL program. The data analyst wants this program to run every day. They only want the final query in the program to run on Sundays. They ask for help from the data engineering team to complete this task.

Which of the following approaches could be used by the data engineering team to complete this task?

Options:

A.

They could submit a feature request with Databricks to add this functionality.


B.

They could wrap the queries using PySpark and use Python’s control flow system to determine when to run the final query.


C.

They could only run the entire program on Sundays.


D.

They could automatically restrict access to the source table in the final query so that it is only accessible on Sundays.


E.

They could redesign the data model to separate the data used in the final query into a new table.


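Option B works because wrapping the SQL in PySpark puts ordinary Python control flow in charge of which statements execute. A sketch with hypothetical placeholder queries:

from datetime import date

# Queries that run every day (hypothetical statements).
spark.sql("INSERT INTO sales_daily SELECT * FROM staging_sales")

# The final query runs only on Sundays (isoweekday() == 7).
if date.today().isoweekday() == 7:
    spark.sql("INSERT INTO sales_weekly SELECT * FROM sales_daily")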
Question # 27:

A data engineer only wants to execute the final block of a Python program if the Python variable day_of_week is equal to 1 and the Python variable review_period is True.

Which of the following control flow statements should the data engineer use to begin this conditionally executed code block?

Options:

A.

if day_of_week = 1 and review_period:


B.

if day_of_week = 1 and review_period = "True":


C.

if day_of_week == 1 and review_period == "True":


D.

if day_of_week == 1 and review_period:


E.

if day_of_week = 1 & review_period: = " True " :


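Option D is the only form that is both valid and idiomatic Python: == compares (a single = is assignment and a syntax error inside an if), and a boolean like review_period is tested directly rather than compared to the string "True". For illustration:

day_of_week = 1
review_period = True

# '==' compares; the bare boolean is truthy on its own.
if day_of_week == 1 and review_period:
    print("executing final block")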
Question # 28:

A data engineer is decommissioning a sandbox schema in Unity Catalog. Some tables are ephemeral staging outputs that can be safely removed entirely, but a few tables point at shared cloud storage used by downstream jobs outside Databricks. The engineer must avoid deleting any shared files when cleaning up catalog objects.

How does Unity Catalog behave when dropping managed vs. external tables?

Options:

A.

Drop all tables; Databricks will only remove metadata for both managed and external tables


B.

Drop managed tables that are ephemeral and drop external tables; files for both remain for 7 days


C.

Drop managed staging tables to remove data and metadata, and drop external tables to remove only metadata


D.

Drop external tables first to delete their files, then drop managed tables to keep their files for recovery


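Option C matches Unity Catalog's behavior: dropping a managed table removes both the metadata and the data files Unity Catalog manages, while dropping an external table removes only the metadata and leaves the files at the external location untouched, which is what protects the shared storage here. A sketch with hypothetical schema and table names:

# Managed staging table: DROP removes the catalog entry and the managed files.
spark.sql("DROP TABLE sandbox.staging_tmp")

# External table: DROP removes only the catalog entry; the files at the
# table's external LOCATION (the shared cloud path) remain for downstream jobs.
spark.sql("DROP TABLE sandbox.shared_events")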
Question # 29:

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW

What is the expected behavior when a batch containing records that violate this constraint is processed?

Options:

A.

Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.


B.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.


C.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.


D.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.


E.

Records that violate the expectation cause the job to fail.


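With ON VIOLATION DROP ROW, violating records are dropped from the target dataset and the drop counts are recorded as data quality metrics in the pipeline event log (option C); nothing is quarantined, no flag column is added, and the update does not fail. The equivalent expectation in the Python DLT API, assuming this runs inside a Delta Live Tables pipeline against a hypothetical source table:

import dlt

@dlt.table
@dlt.expect_or_drop("valid_timestamp", "timestamp > '2020-01-01'")
def cleaned_events():
    # Rows failing the predicate are dropped; drop metrics appear in the event log.
    return spark.read.table("raw_events")  # hypothetical source table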
Question # 30:

Which tool is used by Auto Loader to process data incrementally?

Options:

A.

Spark Structured Streaming


B.

Unity Catalog


C.

Checkpointing


D.

Databricks SQL


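Auto Loader is implemented as a Structured Streaming source (option A): the cloudFiles reader returns a streaming DataFrame, and checkpointing is merely how that stream persists its progress. A quick check with hypothetical paths:

df = (spark.readStream                      # Structured Streaming entry point
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")                    # assumed format
    .option("cloudFiles.schemaLocation", "/tmp/_schemas")  # hypothetical path
    .load("/mnt/landing"))                                 # hypothetical path

print(df.isStreaming)  # True: Auto Loader runs on Spark Structured Streaming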