
Pass the Databricks Certification Databricks-Certified-Data-Engineer-Associate Questions and Answers with CertsForce

Viewing page 1 of 4
Viewing questions 1-10
Question # 1:

A data engineer has a Job with multiple tasks that runs nightly. Each of the tasks runs slowly because the clusters take a long time to start.

Which of the following actions can the data engineer perform to improve the start up time for the clusters used for the Job?

Options:

A.

They can use endpoints available in Databricks SQL


B.

They can use jobs clusters instead of all-purpose clusters


C.

They can configure the clusters to be single-node


D.

They can use clusters that are from a cluster pool


E.

They can configure the clusters to autoscale for larger data sizes
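For context, a cluster pool keeps a set of idle instances warm, so job clusters created from the pool skip instance provisioning and start much faster. A minimal sketch of a pool definition using field names from the Databricks Instance Pools API (all values are illustrative, not from this question):

```json
{
  "instance_pool_name": "nightly-etl-pool",
  "node_type_id": "i3.xlarge",
  "min_idle_instances": 2,
  "max_capacity": 10,
  "idle_instance_autotermination_minutes": 30
}
```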


Question # 2:

Which method should a data engineer apply to ensure Workflows are triggered on schedule?

Options:

A.

Scheduled Workflows require an always-running cluster, which is more expensive but reduces processing latency.


B.

Scheduled Workflows process data as it arrives at configured sources.


C.

Scheduled Workflows can reduce resource consumption and expense since the cluster runs only long enough to execute the pipeline.


D.

Scheduled Workflows run continuously until manually stopped.


Question # 3:

Which of the following data lakehouse features results in improved data quality over a traditional data lake?

Options:

A.

A data lakehouse provides storage solutions for structured and unstructured data.


B.

A data lakehouse supports ACID-compliant transactions.


C.

A data lakehouse allows the use of SQL queries to examine data.


D.

A data lakehouse stores data in open formats.


E.

A data lakehouse enables machine learning and artificial intelligence workloads.


Question # 4:

A single Job runs two notebooks as two separate tasks. A data engineer has noticed that one of the notebooks is running slowly in the Job’s current run. The data engineer asks a tech lead for help in identifying why this might be the case.

Which of the following approaches can the tech lead use to identify why the notebook is running slowly as part of the Job?

Options:

A.

They can navigate to the Runs tab in the Jobs UI to immediately review the processing notebook.


B.

They can navigate to the Tasks tab in the Jobs UI and click on the active run to review the processing notebook.


C.

They can navigate to the Runs tab in the Jobs UI and click on the active run to review the processing notebook.


D.

There is no way to determine why a Job task is running slowly.


E.

They can navigate to the Tasks tab in the Jobs UI to immediately review the processing notebook.


Question # 5:

A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to use Python without making any changes to those cells.

Which of the following describes how the data engineer can use SQL within a cell of their Python notebook?

Options:

A.

It is not possible to use SQL in a Python notebook


B.

They can attach the cell to a SQL endpoint rather than a Databricks cluster


C.

They can simply write SQL syntax in the cell


D.

They can add %sql to the first line of the cell


E.

They can change the default language of the notebook to SQL
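For reference, adding the %sql magic command as the first line switches only that cell to SQL, while every other cell keeps the notebook's default Python language. A sketch of such a cell, assuming a hypothetical table named sales exists:

```sql
%sql
-- This cell runs as SQL; other cells still execute as Python
SELECT order_id, amount
FROM sales
WHERE amount > 100
```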


Question # 7:

A data engineer wants to create a data entity from a couple of tables. The data entity must be used by other data engineers in other sessions. It also must be saved to a physical location.

Which of the following data entities should the data engineer create?

Options:

A.

Database


B.

Function


C.

View


D.

Temporary view


E.

Table
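For context, a table (unlike a view or a temporary view) is backed by files at a physical storage location and is visible across sessions. A sketch of building one from two hypothetical source tables, orders and customers:

```sql
CREATE TABLE enriched_orders AS
SELECT o.order_id, o.amount, c.region
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```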


Question # 8:

A data engineer is using the following code block as part of a batch ingestion pipeline to read from a composable table:

[Code block image not reproduced in this extract.]

Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?

Options:

A.

Replace predict with a stream-friendly prediction function


B.

Replace schema(schema) with option ("maxFilesPerTrigger", 1)


C.

Replace "transactions" with the path to the location of the Delta table


D.

Replace format("delta") with format("stream")


E.

Replace spark.read with spark.readStream
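For reference, switching from a batch read to a streaming read changes only the entry point of the reader chain. The sketch below is pseudocode-style (it assumes an active Spark session and a registered Delta table named transactions, and is not runnable outside a Spark environment; the question's actual code block is not reproduced here):

```python
# Batch read (original shape):
df = spark.read.format("delta").table("transactions")

# Streaming read: only the entry point changes
df = spark.readStream.format("delta").table("transactions")
```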


Question # 9:

Which of the following is a benefit of the Databricks Lakehouse Platform embracing open source technologies?

Options:

A.

Cloud-specific integrations


B.

Simplified governance


C.

Ability to scale storage


D.

Ability to scale workloads


E.

Avoiding vendor lock-in


Question # 10:

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which approach can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

Options:

A.

They can reduce the cluster size of the SQL endpoint.


B.

They can turn on the Auto Stop feature for the SQL endpoint.


C.

They can set up the dashboard's SQL endpoint to be serverless.


D.

They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.
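For context, the Auto Stop feature shuts an idle SQL endpoint down after a period of inactivity, so it only runs around each scheduled refresh. A sketch of the relevant setting using the field name from the Databricks SQL Warehouses API (the value is illustrative):

```json
{
  "name": "dashboard-warehouse",
  "auto_stop_mins": 10
}
```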

