A Spark developer is building an app to monitor task performance. They need to track the maximum task processing time per worker node and consolidate it on the driver for analysis.
Which technique should be used?
A. Use an RDD action like reduce() to compute the maximum time
B. Use an accumulator to record the maximum time on the driver
C. Broadcast a variable to share the maximum time among workers
D. Configure the Spark UI to automatically collect maximum times
The correct way to aggregate information (e.g., a maximum value) from distributed workers back to the driver is to use RDD actions such as reduce() or aggregate().
From the documentation:
“To perform global aggregations on distributed data, actions like reduce() are commonly used to collect summaries such as min/max/avg.”
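As a minimal sketch of that approach (the sample data, worker IDs, and the TaskTimeMax object below are hypothetical, not part of the question), the per-worker maximum can be computed with reduceByKey and consolidated on the driver with collect(), while the reduce() action yields the single global maximum:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: consolidate per-worker max task times on the driver.
object TaskTimeMax {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TaskTimeMax")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Stand-in data: (workerId, taskTimeMs) pairs. In a real app these
    // would come from whatever source records task timings.
    val taskTimes = sc.parallelize(Seq(
      ("worker-1", 120L), ("worker-2", 340L),
      ("worker-1", 560L), ("worker-2", 90L)
    ))

    // Maximum per worker node, brought back to the driver by collect().
    val maxPerWorker = taskTimes
      .reduceByKey((a, b) => math.max(a, b))
      .collect()
    maxPerWorker.foreach { case (worker, ms) => println(s"$worker: $ms ms") }

    // Single global maximum via the reduce() action.
    val globalMax = taskTimes.values.reduce((a, b) => math.max(a, b))
    println(s"global max: $globalMax ms")

    spark.stop()
  }
}
```

Note that reduceByKey performs the max on the executors, so only one (worker, max) pair per node travels back to the driver.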
Accumulators (Option B) do not support max operations out of the box: the built-in accumulators (LongAccumulator, DoubleAccumulator) only add, so a maximum would require a custom AccumulatorV2, as the sketch below shows. Accumulators are intended for side-channel metrics such as counters, not for this kind of analytics.
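For contrast, here is a hypothetical sketch of the custom machinery a max-tracking accumulator would need (MaxAccumulator is an illustrative name, not a Spark class):

```scala
import org.apache.spark.util.AccumulatorV2

// Hypothetical max-tracking accumulator: the built-in LongAccumulator
// only sums, so tracking a maximum needs this much custom code.
class MaxAccumulator extends AccumulatorV2[Long, Long] {
  private var _max: Long = Long.MinValue

  override def isZero: Boolean = _max == Long.MinValue
  override def copy(): MaxAccumulator = {
    val acc = new MaxAccumulator
    acc._max = _max
    acc
  }
  override def reset(): Unit = _max = Long.MinValue
  // Called on executors as tasks report their timings.
  override def add(v: Long): Unit = _max = math.max(_max, v)
  // Called on the driver to merge per-task copies.
  override def merge(other: AccumulatorV2[Long, Long]): Unit =
    _max = math.max(_max, other.value)
  override def value: Long = _max
}
```

It would also have to be registered before use, e.g. sc.register(new MaxAccumulator, "maxTaskTime"), which underlines that accumulators are a side channel rather than an aggregation API.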
Broadcast variables (Option C) are used to send read-only data from the driver to the workers, not to collect data from them.
The Spark UI (Option D) is a monitoring tool, not a programmatic interface for collecting analytics.
Final Answer: A