
Pass the Databricks Certification Databricks-Certified-Data-Engineer-Associate Questions and Answers with CertsForce

Question # 1:

Which of the following data lakehouse features results in improved data quality over a traditional data lake?

Options:

A.

A data lakehouse provides storage solutions for structured and unstructured data.


B.

A data lakehouse supports ACID-compliant transactions.


C.

A data lakehouse allows the use of SQL queries to examine data.


D.

A data lakehouse stores data in open formats.


E.

A data lakehouse enables machine learning and artificial intelligence workloads.


Expert Solution
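Note: ACID-compliant transactions are the lakehouse feature most directly tied to data quality, because a write either fully commits or fully rolls back, and table constraints can reject bad records. A minimal sketch, assuming a Databricks notebook where spark is defined and Delta Lake is available; the customers table and its columns are hypothetical:

# Minimal sketch; table and column names are hypothetical, for illustration only.
spark.sql("CREATE TABLE IF NOT EXISTS customers (id INT, email STRING) USING DELTA")

# A CHECK constraint makes the table itself reject invalid rows at write time.
spark.sql("ALTER TABLE customers ADD CONSTRAINT valid_id CHECK (id > 0)")

# The INSERT below is a single ACID transaction: because the second row violates
# the constraint, the whole write fails and the table is left unchanged, rather
# than leaving partial, low-quality data as raw files in a data lake could.
try:
    spark.sql("INSERT INTO customers VALUES (1, 'a@example.com'), (-1, 'b@example.com')")
except Exception as err:
    print("Write rejected, table unchanged:", err)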
Question # 2:

A data engineer is working with two tables. Each of these tables is displayed below in its entirety.

[Image not shown: the two tables]

The data engineer runs the following query to join these tables together:

[Image not shown: the join query]

Which of the following will be returned by the above query?

[Image not shown]

Options:

A.

Option A


B.

Option B


C.

Option C


D.

Option D


E.

Option E


Expert Solution
Question # 3:

A data engineer is reviewing the documentation on audit logs in Databricks for compliance purposes and needs to understand the format in which audit logs output events.

How are events formatted in Databricks audit logs?

Options:

A.

In Databricks, audit logs output events in a plain text format.


B.

In Databricks, audit logs output events in a JSON format.


C.

In Databricks, audit logs output events in an XML format.


D.

In Databricks, audit logs output events in a CSV format.


Expert Solution
Question # 4:

A global retail company sells products across multiple categories (e.g., Electronics, Clothing) and regions (e.g., North, South, East, West). The sales team has provided the data engineer with a PySpark DataFrame named sales_df, shown below, and wants the data engineer to analyze the sales data to help them make strategic decisions.

[Image not shown: sales_df and the requested analysis]

Options:

A.

Category_sales = sales_df.groupBy("category").agg(sum("sales_amount").alias("total_sales_amount"))


B.

Category_sales = sales_df.sum("sales_amount").groupBy("category").alias("total_sales_amount")


C.

Category_sales = sales_df.agg(sum("sales_amount").groupBy("category").alias("total_sales_amount"))


D.

Category_sales = sales_df.groupBy("region").agg(sum("sales_amount").alias("total_sales_amount"))


Expert Solution
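All four options are variations on the same DataFrame aggregation. As a minimal sketch, assuming a toy sales_df with the category, region, and sales_amount columns described above: total sales per category is computed with groupBy followed by agg; calling sum before groupBy, or groupBy after agg, is not valid DataFrame API usage.

from pyspark.sql import SparkSession
from pyspark.sql.functions import sum as sum_  # avoid shadowing Python's built-in sum

spark = SparkSession.builder.getOrCreate()

# Toy data standing in for the sales_df shown in the question image.
sales_df = spark.createDataFrame(
    [("Electronics", "North", 100.0),
     ("Electronics", "South", 250.0),
     ("Clothing", "West", 80.0)],
    ["category", "region", "sales_amount"],
)

# Aggregate per category: groupBy(...) first, then agg(...) with an alias.
category_sales = sales_df.groupBy("category").agg(
    sum_("sales_amount").alias("total_sales_amount")
)
category_sales.show()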
Question # 5:

A data engineer is writing a script that is meant to ingest new data from cloud storage. In the event of a schema change, the ingestion should fail, and it should keep failing until the change in the upstream source can be found and verified as an intended change.

Which command will meet the requirements?

Options:

A.

addNewColumns


B.

failOnNewColumns


C.

rescue


D.

none


Expert Solution
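The option names here correspond to values of Auto Loader's cloudFiles.schemaEvolutionMode setting. A minimal sketch of a stream configured so that ingestion fails when new columns appear, assuming a Databricks notebook where spark is defined; the paths and table name are placeholders:

# failOnNewColumns stops the stream when the incoming schema adds columns,
# so the change can be reviewed before ingestion resumes.
raw_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/tmp/schemas/events")        # placeholder path
    .option("cloudFiles.schemaEvolutionMode", "failOnNewColumns")
    .load("/tmp/landing/events")                                       # placeholder path
)

(raw_stream.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/events")           # placeholder path
    .toTable("bronze_events"))                                         # placeholder table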
Question # 6:

A data engineer needs to process SQL queries on a large dataset with fluctuating workloads. The workload requires automatic scaling based on the volume of queries, without the need to manage or provision infrastructure. The solution should be cost-efficient and charge only for the compute resources used during query execution.

Which compute option should the data engineer use?

Options:

A.

Databricks SQL Analytics


B.

Databricks Jobs


C.

Databricks Runtime for ML


D.

Serverless SQL Warehouse


Expert Solution
Question # 7:

A data engineer is building a simple data pipeline using Delta Live Tables (DLT) in Databricks to ingest customer data. The raw customer data is stored in a cloud storage location in JSON format. The task is to create a DLT pipeline that reads the raw JSON data and writes it into a Delta table for further processing.

Which code snippet will correctly ingest the raw JSON data and create a Delta table using DLT?

A) [Image not shown]

B) [Image not shown]

C) [Image not shown]

D) [Image not shown]

Options:

A.

Option A


B.

Option B


C.

Option C


D.

Option D


Expert Solution
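Since the option snippets are only available as images, here is a minimal sketch of the general DLT pattern for this task (not necessarily the exam's exact wording): a function decorated with @dlt.table that reads the raw JSON with Auto Loader, which DLT materializes as a Delta table; the storage path is a placeholder.

import dlt

@dlt.table(comment="Raw customer data ingested from JSON files in cloud storage")
def customers_raw():
    # Auto Loader incrementally reads new JSON files; DLT writes the result
    # to a Delta table named after the function.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/customers/")  # placeholder cloud storage path
    )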
Question # 8:

A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or version their project using Databricks Repos.

Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks versioning?

Options:

A.

Databricks Repos automatically saves development progress


B.

Databricks Repos supports the use of multiple branches


C.

Databricks Repos allows users to revert to previous versions of a notebook


D.

Databricks Repos provides the ability to comment on specific changes


E.

Databricks Repos is wholly housed within the Databricks Lakehouse Platform


Expert Solution
Question # 9:

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to a data analytics dashboard for a retail use case. The job has a Databricks SQL query that returns the number of store-level records where sales is equal to zero. The data engineer wants their entire team to be notified via a messaging webhook whenever this value is greater than 0.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of stores with $0 in sales is greater than zero?

Options:

A.

They can set up an Alert with a custom template.


B.

They can set up an Alert with a new email alert destination.


C.

They can set up an Alert with one-time notifications.


D.

They can set up an Alert with a new webhook alert destination.


E.

They can set up an Alert without notifications.


Expert Solution
Question # 10:

[Image not shown: sales_df]

Calculate the total sales amount for each region and store the results in a new dataframe called region_sales.

Given the expected result:

[Image not shown: expected result]

Which code will generate the expected result?

Options:

A.

region_sales = sales_df.groupBy("region").agg(sum("sales_amount").alias("total_sales_amount"))


B.

region_sales = sales_df.sum("sales_amount").groupBy("region").alias("total_sales_amount")


C.

region_sales = sales_df.groupBy("category").sum("sales_amount").alias("total_sales_amount")


D.

region_sales = sales_df.agg(sum("sales_amount").groupBy("region").alias("total_sales_amount"))


Expert Solution
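This is the same groupBy/agg pattern as in Question # 4, grouped by region instead of category. A minimal sketch, assuming a sales_df with region and sales_amount columns as in the earlier toy example:

from pyspark.sql.functions import sum as sum_

# Total sales per region: groupBy(...) must come before agg(...), and the
# grouping column must be "region", not "category", to match the expected result.
region_sales = sales_df.groupBy("region").agg(
    sum_("sales_amount").alias("total_sales_amount")
)
region_sales.show()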