
Databricks Certification: Databricks-Certified-Data-Engineer-Associate Questions and Answers

Viewing page 2 of 6 (questions 11-20)
Question # 11:

A data engineer is setting up access control in Unity Catalog and needs to ensure that a group of data analysts can query tables but not modify data.

Which permission should the data engineer grant to the data analysts?

Options:

A.

SELECT


B.

INSERT


C.

MODIFY


D.

ALL PRIVILEGES


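For the Unity Catalog scenario in Question # 11, the read-only privilege is SELECT; INSERT and MODIFY allow writes, and ALL PRIVILEGES grants far more than querying. Below is a minimal sketch of how the grant might look from a notebook; the catalog (main), schema (sales), table (orders), and group (data_analysts) names are all hypothetical, and the prerequisite USE privileges are included because SELECT alone does not make a table reachable.

    # `spark` is the SparkSession predefined in a Databricks notebook.
    # All object and group names below are hypothetical placeholders.
    spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_analysts`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA main.sales TO `data_analysts`")
    spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `data_analysts`")
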
Question # 12:

A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or version their project using Databricks Repos.

Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks versioning?

Options:

A.

Databricks Repos automatically saves development progress


B.

Databricks Repos supports the use of multiple branches


C.

Databricks Repos allows users to revert to previous versions of a notebook


D.

Databricks Repos provides the ability to comment on specific changes


E.

Databricks Repos is wholly housed within the Databricks Lakehouse Platform


Question # 13:

A data engineer is inspecting an ETL pipeline based on a PySpark job that consistently encounters performance bottlenecks. Based on developer feedback, the data engineer suspects the job is low on compute resources. To pinpoint the issue, the data engineer inspects the Spark UI and finds that the job shows high CPU time relative to task time.

Which course of action should the data engineer take?

Options:

A.

High CPU time vs. task time means an under-utilized cluster. The data engineer may need to repartition data to spread the work more evenly throughout the cluster.


B.

High CPU time vs. task time means efficient use of the cluster, and no change is needed.


C.

High CPU time vs. task time means over-utilized memory and a need to increase parallelism.


D.

High CPU time vs. task time means a CPU over-utilized job. The data engineer may need to consider executor and core tuning or resizing the cluster.


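For Question # 13, high CPU time relative to task time indicates a CPU-bound job, which points toward executor/core tuning or resizing the cluster. Resizing itself happens in the cluster configuration, but the sketch below shows two code-level parallelism knobs commonly tried alongside it; the partition count, table, and column names are hypothetical.

    # `spark` is the SparkSession predefined in a Databricks notebook.
    df = spark.table("silver.events")  # hypothetical table

    # Spread shuffle work across more tasks so available cores stay busy.
    spark.conf.set("spark.sql.shuffle.partitions", "400")

    # Or repartition a hot DataFrame before a CPU-heavy transformation.
    df = df.repartition(400, "customer_id")  # hypothetical partition column
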
Question # 14:

A data engineer is developing a small proof of concept in a notebook. When the entire notebook runs, cluster usage spikes. The data engineer wants to keep the interactive development experience and still get real-time results.

Which cluster meets these requirements?

Options:

A.

All-Purpose Cluster with a large fixed memory size


B.

All-Purpose Cluster with autoscaling


C.

Job Cluster with autoscaling enabled


D.

Job Cluster with Photon enabled and autoscaling


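For Question # 14, interactive notebook development calls for an All-Purpose cluster, and autoscaling absorbs the usage spikes without giving up real-time results; job clusters exist only for scheduled runs, not interactive work. A hedged sketch of what such a cluster definition could look like, e.g. as a Clusters API payload, follows; every value is a hypothetical placeholder.

    # Hypothetical all-purpose cluster spec with autoscaling.
    cluster_spec = {
        "cluster_name": "poc-notebook-dev",
        "spark_version": "15.4.x-scala2.12",  # placeholder runtime version
        "node_type_id": "i3.xlarge",          # placeholder instance type
        "autoscale": {"min_workers": 1, "max_workers": 4},
    }
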
Question # 15:

A data engineer has a Job with multiple tasks that runs nightly. Each of the tasks runs slowly because the clusters take a long time to start.

Which of the following actions can the data engineer perform to improve the start-up time of the clusters used for the Job?

Options:

A.

They can use endpoints available in Databricks SQL


B.

They can use jobs clusters instead of all-purpose clusters


C.

They can configure the clusters to be single-node


D.

They can use clusters that are from a cluster pool


E.

They can configure the clusters to autoscale for larger data sizes


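For Question # 15, cluster pools keep idle instances pre-provisioned, so clusters drawing from a pool skip most of the VM start-up time. As a hedged sketch, a job-cluster spec can reference a pool roughly as below; the pool ID and other values are hypothetical placeholders.

    # Hypothetical job-cluster spec drawing nodes from a pre-warmed pool.
    job_cluster_spec = {
        "spark_version": "15.4.x-scala2.12",        # placeholder runtime version
        "instance_pool_id": "0701-063400-pool-01",  # placeholder pool ID
        "num_workers": 4,
    }
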
Question # 16:

Which of the following data workloads will utilize a Gold table as its source?

Options:

A.

A job that enriches data by parsing its timestamps into a human-readable format


B.

A job that aggregates uncleaned data to create standard summary statistics


C.

A job that cleans data by removing malformatted records


D.

A job that queries aggregated data designed to feed into a dashboard


E.

A job that ingests raw data from a streaming source into the Lakehouse


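For Question # 16, gold tables hold refined, business-level aggregates, so the workload that uses a gold table as its source is the dashboard-feeding query; the other options describe jobs that produce bronze, silver, or gold data. A small PySpark sketch of both sides, with hypothetical table and column names:

    # Producing gold: aggregate cleaned silver data (hypothetical names).
    (spark.table("silver.sales")
        .groupBy("sale_date")
        .sum("amount")
        .write.mode("overwrite")
        .saveAsTable("gold.daily_sales_summary"))

    # Consuming gold: a dashboard query reads the aggregated table.
    dashboard_df = (spark.table("gold.daily_sales_summary")
        .where("sale_date >= date_sub(current_date(), 30)"))
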
Question # 17:

A data engineer and a data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which change will need to be made to the pipeline when migrating to Delta Live Tables?

Options:

A.

The pipeline can have different notebook sources in SQL & Python.


B.

The pipeline will need to be written entirely in SQL.


C.

The pipeline will need to be written entirely in Python.


D.

The pipeline will need to use a batch source in place of a streaming source.


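For Question # 17, a Delta Live Tables pipeline can include both SQL and Python notebooks as sources, so each team keeps its own language, and streaming inputs remain supported. A hedged sketch of the Python (bronze/silver) side follows, with hypothetical table and path names; the analyst's gold layer could live in a separate SQL notebook (e.g. CREATE OR REFRESH LIVE TABLE ...) attached to the same pipeline.

    import dlt
    from pyspark.sql.functions import col

    @dlt.table(comment="Bronze: raw streaming ingest (hypothetical source path)")
    def bronze_events():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/main/raw/events")
        )

    @dlt.table(comment="Silver: cleaned events")
    def silver_events():
        return dlt.read_stream("bronze_events").where(col("event_id").isNotNull())
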
Question # 18:

What is the functionality of Auto Loader in Databricks?

Options:

A.

Auto Loader automatically ingests and processes new files from cloud storage, handling batch data with support for schema evolution.


B.

Auto Loader automatically ingests and processes new files from cloud storage, handling only streaming data with no support for schema evolution.


C.

Auto Loader automatically ingests and processes new files from cloud storage, handling batch and streaming data with no support for schema evolution.


D.

Auto Loader automatically ingests and processes new files from cloud storage, handling both batch and streaming data with support for schema evolution.


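For Question # 18, Auto Loader (the cloudFiles source) incrementally ingests new files from cloud storage; it can run continuously or in batch-style triggered mode, and a schema location enables schema inference and evolution. A minimal sketch, assuming hypothetical source and checkpoint paths:

    # Incrementally pick up new files from cloud storage with Auto Loader.
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        # Schema location enables schema inference and evolution across runs.
        .option("cloudFiles.schemaLocation", "/Volumes/main/chk/schema")
        .load("s3://example-bucket/landing/")  # hypothetical source path
    )

    (stream.writeStream
        .option("checkpointLocation", "/Volumes/main/chk/ingest")
        .trigger(availableNow=True)  # batch-style run; omit for continuous streaming
        .toTable("bronze.events"))
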
Question # 19:

A data organization leader is upset about the data analysis team’s reports being different from the data engineering team’s reports. The leader believes the siloed nature of their organization’s data engineering and data analysis architectures is to blame.

Which of the following describes how a data lakehouse could alleviate this issue?

Options:

A.

Both teams would autoscale their work as data size evolves


B.

Both teams would use the same source of truth for their work


C.

Both teams would reorganize to report to the same department


D.

Both teams would be able to collaborate on projects in real-time


E.

Both teams would respond more quickly to ad-hoc requests


Question # 20:

A data engineer is standardizing repository layouts for multiple teams adopting Databricks Asset Bundles. The engineer wants to ensure every project has a single authoritative configuration file at the repository root that defines the bundle name, targets, workspace settings, permissions, and resource mappings (for jobs and pipelines).

Which strategy should the data engineer use to meet this goal?

Options:

A.

Place multiple databricks.yml files under each subfolder (for example, jobs/, pipelines/, workspace/) and merge them at deploy time using the include mapping.


B.

Place exactly one databricks.yml at the repository root; it is the main configuration file and may reference additional configuration files via the include mapping.


C.

Place a databricks.yml in a .databricks/ hidden folder at the repository root; only hidden locations are valid for bundle configs.


D.

Place a databricks.yml at the repository root and optional databricks.yml in subfolders; the CLI prefers .yaml over .yml when both exist.


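For Question # 20, Databricks Asset Bundles expect exactly one databricks.yml at the repository root as the main configuration file; it may reference additional configuration files through the include mapping. Because this question concerns the YAML file itself, the sketch below is YAML rather than Python; the bundle name, glob, target, and workspace host are hypothetical placeholders.

    # Hypothetical root-level databricks.yml.
    bundle:
      name: analytics_project

    include:
      - resources/*.yml  # job and pipeline definitions kept in subfolders

    targets:
      dev:
        default: true
        workspace:
          host: https://example-workspace.cloud.databricks.com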