Databricks Certified Associate Developer for Apache Spark 3.5 – Python Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Question # 26 Topic 3 Discussion

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Exam Topic 3 Question 26 Discussion:
Question #: 26
Topic #: 3

18 of 55.

An engineer has two DataFrames — df1 (small) and df2 (large). To optimize the join, the engineer uses a broadcast join:

from pyspark.sql.functions import broadcast

df_result = df2.join(broadcast(df1), on="id", how="inner")

What is the purpose of using broadcast() in this scenario?


A. It increases the partition size for df1 and df2.

B. It ensures that the join happens only when the id values are identical.

C. It reduces the number of shuffle operations by replicating the smaller DataFrame to all nodes.

D. It filters the id values before performing the join.
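
For discussion, the mechanics of a broadcast join can be sketched in plain Python, without Spark. This is only an illustration of the idea, not Spark's implementation: the function name broadcast_hash_join, the sample rows, and the list-of-lists stand-in for partitions are all assumptions made up for this example. The small side is turned into a single lookup table that every partition of the large side can read, so the large side is joined where it sits and none of its rows need to be moved.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join each partition of the large side against a full copy of the small side.

    Hypothetical helper for illustration only; not a Spark API.
    """
    # Build one hash table from the small side. In a real cluster a copy of
    # this table would be sent to every executor; here all partitions simply
    # share the same dict.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined_partitions = []
    for partition in large_partitions:
        # Each partition is processed in place: rows of the large side are
        # never regrouped by key, which is the work a shuffle would do.
        out = []
        for row in partition:
            for match in lookup.get(row[key], []):
                out.append({**match, **row})
        joined_partitions.append(out)
    return joined_partitions


# Made-up sample data: df1 is the small side, df2 is the large side split
# into two partitions.
df1_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
df2_partitions = [
    [{"id": 1, "amount": 10}, {"id": 3, "amount": 30}],
    [{"id": 2, "amount": 20}, {"id": 1, "amount": 40}],
]

result = broadcast_hash_join(df2_partitions, df1_rows, "id")
for index, rows in enumerate(result):
    print(index, rows)
```

The output keeps the same two partitions as the large input, and the row with id 3 is dropped because the join is an inner join. In PySpark itself, calling df_result.explain() on the join from the question is one way to check whether the plan actually uses a broadcast join.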

