
Pass the Databricks Certification Databricks-Certified-Data-Engineer-Associate questions and answers with CertsForce

Viewing page 4 out of 6 pages
Viewing questions 31-40
Question # 31:

A data engineer needs to conduct Exploratory Data Analysis (EDA) on data residing in a database within the company’s custom-defined cloud network. The data engineer is using SQL for this task.

Which type of SQL Warehouse will enable the data engineer to process large numbers of queries quickly and cost-effectively?

Options:

A.

All-purpose compute cluster


B.

Pro SQL Warehouse


C.

SQL Serverless Warehouse


D.

Classic SQL Warehouse


Question # 32:

Which SQL code snippet will correctly demonstrate a Data Definition Language (DDL) operation used to create a table?

Options:

A.

DROP TABLE employees;


B.

INSERT INTO employees (id, name) VALUES (1, 'Alice');


C.

CREATE TABLE employees (id INT, name STRING);


D.

ALTER TABLE employees ADD COLUMN salary DECIMAL(10,2);
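
For reference, DDL statements define or change schema objects, while DML statements such as INSERT manipulate the data inside them. A minimal sketch contrasting the two (table and column names are illustrative):

```sql
-- DDL: defines a new table and its schema
CREATE TABLE employees (
  id INT,
  name STRING
);

-- DML, for contrast: inserts a row into the existing table
INSERT INTO employees (id, name) VALUES (1, 'Alice');
```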


Question # 33:

A data engineer is reviewing the documentation on audit logs in Databricks for compliance purposes and needs to understand the format in which audit logs output events.

How are events formatted in Databricks audit logs?

Options:

A.

In Databricks, audit logs output events in a plain text format.


B.

In Databricks, audit logs output events in a JSON format.


C.

In Databricks, audit logs output events in an XML format.


D.

In Databricks, audit logs output events in a CSV format.
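
As an illustration of the JSON format, Databricks audit log events carry fields such as serviceName, actionName, and requestParams. The sketch below is a simplified, hypothetical event, not an exact schema:

```json
{
  "timestamp": 1713200000000,
  "serviceName": "accounts",
  "actionName": "login",
  "userIdentity": { "email": "user@example.com" },
  "requestParams": { "user": "user@example.com" },
  "response": { "statusCode": 200 }
}
```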


Question # 34:

A data engineer needs to apply custom logic to identify employees with more than 5 years of experience in array column employees in table stores. The custom logic should create a new column exp_employees that is an array of all of the employees with more than 5 years of experience for each row. In order to apply this custom logic at scale, the data engineer wants to use the FILTER higher-order function.

Which of the following code blocks successfully completes this task?

(The answer choices for Question # 34 are code blocks shown as an image in the original source, labeled Option A through Option E below.)

Options:

A.

Option A


B.

Option B


C.

Option C


D.

Option D


E.

Option E
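
Although the answer choices are only available as an image, applying the FILTER higher-order function to this task would look roughly like the sketch below; the years_of_experience field inside the array elements is an assumption about the schema:

```sql
-- FILTER keeps only the array elements for which the lambda returns true
SELECT
  *,
  FILTER(employees, e -> e.years_of_experience > 5) AS exp_employees
FROM stores;
```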


Question # 35:

Which Databricks Asset Bundle format is valid?

Options:

A.

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py


B.

{
  "resources": {
    "jobs": {
      "name": "hello-job",
      "tasks": {
        "task_key": "hello-task",
        "existing_cluster_id": "1234-567890-abcde123",
        "notebook_task": {
          "notebook_path": "./hello.py"
        }
      }
    }
  }
}


C.

configuration = {
  "resources": {
    "jobs": {
      "name": "hello-job",
      "tasks": {
        "task_key": "hello-task",
        "existing_cluster_id": "1234-567890-abcde123",
        "notebook_task": {
          "notebook_path": "./hello.py"
        }
      }
    }
  }
}


D.

resources {
  jobs {
    name = "hello-job"
    tasks {
      task_key = "hello-task"
      existing_cluster_id = "1234-567890-abcde123"
      notebook_task {
        notebook_path = "./hello.py"
      }
    }
  }
}
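
Databricks Asset Bundles are declared in YAML, typically in a databricks.yml file at the project root, which is why the YAML option is the valid format. A minimal sketch of such a file (the bundle name is illustrative):

```yaml
# databricks.yml -- minimal Asset Bundle sketch (names are illustrative)
bundle:
  name: hello-bundle

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py
```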


Question # 36:

In which of the following scenarios should a data engineer select a Task in the Depends On field of a new Databricks Job Task?

Options:

A.

When another task needs to be replaced by the new task


B.

When another task needs to fail before the new task begins


C.

When another task has the same dependency libraries as the new task


D.

When another task needs to use as little compute resources as possible


E.

When another task needs to successfully complete before the new task begins
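
In a job definition, the Depends On field maps to a depends_on list on the task, and the downstream task starts only after every listed task completes successfully. A sketch of two such tasks in YAML (task keys and paths are illustrative):

```yaml
tasks:
  - task_key: ingest
    notebook_task:
      notebook_path: ./ingest.py
  - task_key: transform
    depends_on:
      - task_key: ingest   # transform starts only after ingest succeeds
    notebook_task:
      notebook_path: ./transform.py
```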


Question # 37:

Which of the following data lakehouse features results in improved data quality over a traditional data lake?

Options:

A.

A data lakehouse provides storage solutions for structured and unstructured data.


B.

A data lakehouse supports ACID-compliant transactions.


C.

A data lakehouse allows the use of SQL queries to examine data.


D.

A data lakehouse stores data in open formats.


E.

A data lakehouse enables machine learning and artificial intelligence workloads.


Question # 38:

A data engineer has joined an existing project and they see the following query in the project repository:

CREATE STREAMING LIVE TABLE loyal_customers AS
SELECT customer_id
FROM STREAM(LIVE.customers)
WHERE loyalty_level = 'high';

Which of the following describes why the STREAM function is included in the query?

Options:

A.

The STREAM function is not needed and will cause an error.


B.

The table being created is a live table.


C.

The customers table is a streaming live table.


D.

The customers table is a reference to a Structured Streaming query on a PySpark DataFrame.


E.

The data in the customers table has been updated since its last run.


Question # 39:

An organization has data stored across multiple external systems, including MySQL, Amazon Redshift, and Google BigQuery. The data engineer wants to perform analytics without ingesting data directly into Databricks, while ensuring unified governance and minimizing data duplication.

Which feature of Databricks enables querying these external data sources while maintaining centralized governance?

Options:

A.

Lakehouse Federation


B.

Databricks Connect


C.

MLflow


D.

Delta Lake
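
With Lakehouse Federation, a connection to the external system and a foreign catalog are registered in Unity Catalog, so the external tables can be queried in place under centralized governance. A rough SQL sketch for the MySQL case (connection name, host, and secret scope/keys are placeholders):

```sql
-- Register a connection to the external MySQL instance
CREATE CONNECTION mysql_conn TYPE mysql
OPTIONS (
  host 'mysql.example.com',
  port '3306',
  user secret('my_scope', 'mysql_user'),
  password secret('my_scope', 'mysql_password')
);

-- Expose it as a foreign catalog governed by Unity Catalog
CREATE FOREIGN CATALOG mysql_catalog
USING CONNECTION mysql_conn;

-- Query in place, without ingesting the data into Databricks
SELECT * FROM mysql_catalog.sales.orders LIMIT 10;
```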


Question # 40:

An organization has implemented a data pipeline in Databricks and needs to ensure it can scale automatically based on varying workloads without manual cluster management. The goal is to meet the company’s Service Level Agreements (SLAs), which require high availability and minimal downtime, while Databricks automatically handles resource allocation and optimization.

Which approach fulfills these requirements?

Options:

A.

Use Serverless compute in Databricks to automatically scale and provision resources with minimal manual intervention


B.

Deploy job clusters with fixed configurations, dedicated to specific tasks, without automatic scaling


C.

Use spot instances to allocate resources dynamically while minimizing costs, with potential interruptions


D.

Use interactive clusters in Databricks, adjusting cluster sizes manually based on workload demands

