Databricks Certified Associate Developer for Apache Spark 3.5-Python Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Question # 10 Topic 2 Discussion

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Exam Topic 2 Question 10 Discussion:
Question #: 10
Topic #: 2

A data engineer is running a Spark job to process a 1 TB dataset stored in distributed storage. The cluster has 10 nodes, each with 16 CPUs. The Spark UI shows:

Low number of Active Tasks

Many tasks complete in milliseconds

Fewer tasks than available CPUs

Which approach should be used to adjust the partitioning for optimal resource allocation?


A. Set the number of partitions equal to the total number of CPUs in the cluster

B. Set the number of partitions to a fixed value, such as 200

C. Set the number of partitions equal to the number of nodes in the cluster

D. Set the number of partitions by dividing the dataset size (1 TB) by a reasonable partition size, such as 128 MB
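
The arithmetic behind option D can be sketched in a few lines of plain Python, set against the core count that option A would use. This is only an illustration: the helper name `partitions_for` is hypothetical, and binary units (1 TB = 1024^4 bytes, 128 MB = 128 * 1024^2 bytes) are an assumption, since the question does not say which convention it uses.

```python
# Hypothetical helper: size-based partition count (not a Spark API).
def partitions_for(dataset_bytes: int, target_partition_bytes: int) -> int:
    # Integer ceiling division, so a trailing partial chunk still gets a partition.
    return -(-dataset_bytes // target_partition_bytes)

# Assumption: binary units for "1 TB" and "128 MB".
TIB = 1024 ** 4
MIB = 1024 ** 2

total_cores = 10 * 16                                  # 10 nodes x 16 CPUs (option A's value)
num_partitions = partitions_for(1 * TIB, 128 * MIB)    # option D's value
waves = -(-num_partitions // total_cores)              # rounds of tasks needed if every core is busy

print(f"cores={total_cores}, partitions={num_partitions}, waves={waves}")
```

Under these assumptions the size-based count is far larger than the number of cores, so every core has work across many rounds of tasks. In a real job a count like this might be applied with `DataFrame.repartition`, or the input split size tuned through `spark.sql.files.maxPartitionBytes`; which is appropriate depends on the data source and the stage being tuned.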

