A data engineer needs to conduct Exploratory Data Analysis (EDA) on data residing in a database within the company's custom-defined cloud network. The data engineer is using SQL for this task.
Which type of SQL Warehouse will enable the data engineer to process large numbers of queries quickly and cost-effectively?
Which SQL code snippet will correctly demonstrate a Data Definition Language (DDL) operation used to create a table?
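For context, a minimal DDL statement of the kind this question targets might look like the following sketch (the table and column names are illustrative, not taken from the question):

```sql
-- DDL defines schema objects (tables, views, schemas)
-- rather than manipulating rows (which would be DML)
CREATE TABLE IF NOT EXISTS employees (
  employee_id INT,
  name STRING,
  years_experience INT
);
```

By contrast, statements such as INSERT, UPDATE, and DELETE are DML, and SELECT is a query, so none of those would qualify as a DDL operation.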
A data engineer is reviewing the documentation on audit logs in Databricks for compliance purposes and needs to understand the format in which audit logs output events.
How are events formatted in Databricks audit logs?
A data engineer needs to apply custom logic to identify employees with more than 5 years of experience, stored in the array column employees of the table stores. The custom logic should create a new column exp_employees containing, for each row, an array of all employees with more than 5 years of experience. To apply this custom logic at scale, the data engineer wants to use the FILTER higher-order function.
Which of the following code blocks successfully completes this task?
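One plausible shape for such a code block, assuming each element of the employees array is a struct with a years_experience field (that field name is an assumption, not stated in the question), is:

```sql
-- FILTER keeps only the array elements for which the
-- lambda predicate evaluates to true, row by row
SELECT
  *,
  FILTER(employees, e -> e.years_experience > 5) AS exp_employees
FROM stores;
```

The key detail FILTER questions usually test is the lambda syntax `element -> condition` as the second argument.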

Which Databricks Asset Bundle format is valid?
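As a point of reference, a Databricks Asset Bundle is declared in a databricks.yml file whose general shape is sketched below (the bundle name, target name, and workspace host are placeholders):

```yaml
# Minimal databricks.yml for a Databricks Asset Bundle
bundle:
  name: example_bundle

targets:
  dev:
    mode: development
    workspace:
      host: https://example.cloud.databricks.com
```

Valid answers follow this YAML structure, with a top-level bundle mapping and optional targets, resources, and variables sections.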
In which of the following scenarios should a data engineer select a Task in the Depends On field of a new Databricks Job Task?
Which of the following data lakehouse features results in improved data quality over a traditional data lake?
A data engineer has joined an existing project and they see the following query in the project repository:
CREATE STREAMING LIVE TABLE loyal_customers AS
SELECT customer_id
FROM STREAM(LIVE.customers)
WHERE loyalty_level = 'high';
Which of the following describes why the STREAM function is included in the query?
An organization has data stored across multiple external systems, including MySQL, Amazon Redshift, and Google BigQuery. The data engineer wants to perform analytics without ingesting data directly into Databricks, while ensuring unified governance and minimizing data duplication.
Which feature of Databricks enables querying these external data sources while maintaining centralized governance?
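The capability being described, Lakehouse Federation, registers an external system as a connection and surfaces it as a foreign catalog governed by Unity Catalog. A sketch for the MySQL case, with placeholder host, credentials, and names, might look like:

```sql
-- Register the external MySQL instance as a connection
CREATE CONNECTION mysql_conn TYPE mysql
OPTIONS (
  host 'mysql.example.com',
  port '3306',
  user 'reader',
  password secret('my_scope', 'mysql_pw')
);

-- Expose a MySQL database as a Unity Catalog foreign catalog,
-- queryable without copying the data into Databricks
CREATE FOREIGN CATALOG mysql_catalog
USING CONNECTION mysql_conn
OPTIONS (database 'sales');
```

Queries then address the external tables through the foreign catalog (for example, mysql_catalog.sales.orders), so governance stays centralized and no data is duplicated.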
An organization has implemented a data pipeline in Databricks and needs it to scale automatically with varying workloads, without manual cluster management. To meet the company's Service Level Agreements (SLAs) for high availability and minimal downtime, Databricks should handle resource allocation and optimization automatically.
Which approach fulfills these requirements?