Databricks Certified Associate Developer for Apache Spark 3.5-Python Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Question # 11 Topic 2 Discussion

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Exam Topic 2 Question 11 Discussion:
Question #: 11
Topic #: 2

A data scientist is working on a large dataset in Apache Spark using PySpark. The data scientist has a DataFrame df with columns user_id, product_id, and purchase_amount and needs to perform some operations on this data efficiently.

Which sequence of operations results in transformations that require a shuffle followed by transformations that do not?


A.

df.filter(df.purchase_amount > 100).groupBy("user_id").sum("purchase_amount")


B.

df.withColumn("discount", df.purchase_amount * 0.1).select("discount")


C.

df.withColumn("purchase_date", current_date()).where("total_purchase > 50")


D.

df.groupBy("user_id").agg(sum("purchase_amount").alias("total_purchase")).repartition(10)
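One way to reason about the options above is to look at the physical plan Spark produces for each and see whether it contains an Exchange operator, which is how a shuffle shows up in the output of explain(). The sketch below is only an illustration: plan_text, has_shuffle and demo are hypothetical helper names, the three sample rows are made up, and demo() assumes a local PySpark installation. It reports only whether a shuffle is present; where the shuffle sits relative to the other steps still has to be read from the plan itself.

```python
import contextlib
import io
import re


def plan_text(df) -> str:
    """Return whatever df.explain() prints (the physical plan) as a string."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain()
    return buf.getvalue()


def has_shuffle(plan: str) -> bool:
    """True if the plan text contains a shuffle Exchange operator.

    The word-boundary match deliberately skips BroadcastExchange,
    which moves data but is not a shuffle.
    """
    return re.search(r"\bExchange\b", plan) is not None


def demo() -> dict:
    """Hypothetical driver: build a tiny df and check options A, B and D.

    Option C is left out because it filters on total_purchase,
    which is not a column of df.
    """
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("shuffle-check")
        .getOrCreate()
    )
    # Made-up sample rows with the three columns named in the question.
    df = spark.createDataFrame(
        [(1, 10, 120.0), (1, 11, 80.0), (2, 10, 300.0)],
        ["user_id", "product_id", "purchase_amount"],
    )
    candidates = {
        "A": df.filter(df.purchase_amount > 100)
        .groupBy("user_id")
        .sum("purchase_amount"),
        "B": df.withColumn("discount", df.purchase_amount * 0.1)
        .select("discount"),
        "D": df.groupBy("user_id")
        .agg(F.sum("purchase_amount").alias("total_purchase"))
        .repartition(10),
    }
    result = {label: has_shuffle(plan_text(c)) for label, c in candidates.items()}
    spark.stop()
    return result
```

Calling demo() in an environment with PySpark returns a dict mapping each option label to a boolean. Printing plan_text(...) for a single option shows the order of the operators, which is what the question actually asks about.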

