Explanation
Actions can trigger Adaptive Query Execution, while transformations cannot.
Correct. Adaptive Query Execution optimizes queries at runtime. Since transformations are evaluated lazily, Spark does not have any runtime information to optimize the query until an action is
called. If Adaptive Query Execution is enabled, Spark will then try to optimize the query based on the feedback it gathers while it is evaluating the query.
Actions can be queued for delayed execution, while transformations can only be processed immediately.
No, Spark has no concept of queuing actions for "delayed execution". Actions are not evaluated lazily: calling one triggers execution immediately. It is transformations that are deferred until an action is called.
Actions are evaluated lazily, while transformations are not evaluated lazily.
Incorrect, it is the other way around: Transformations are evaluated lazily and actions trigger their evaluation.
Actions generate RDDs, while transformations do not.
No. Transformations change the data and, since RDDs are immutable, generate new RDDs along the way. Actions produce outputs as Python data types or external data (integers, lists, text files, ...) based on
the RDDs, but they do not generate RDDs.
Here is a great tip on how to differentiate actions from transformations: If an operation returns a DataFrame, Dataset, or an RDD, it is a transformation. Otherwise, it is an action.
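The lazy-evaluation behavior and the return-type tip above can be sketched with a toy model in plain Python. This is only an illustration under stated assumptions: the class LazyDataset and the helper double are hypothetical names invented here, not part of Spark's API, and the model ignores partitioning, the driver, and the executors entirely.

```python
from typing import Callable, Iterable, List, Tuple


class LazyDataset:
    """Toy stand-in for an RDD/DataFrame. Hypothetical; NOT Spark's API."""

    def __init__(self, source: Iterable, steps: Tuple = ()):
        self._source = source
        self._steps = steps  # recorded plan: (kind, function) pairs

    # Transformations: record a step and return a NEW dataset; no data is touched.
    def map(self, fn: Callable) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, fn: Callable) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("filter", fn),))

    # Actions: run the recorded plan and return a plain Python value.
    def collect(self) -> List:
        rows = list(self._source)
        for kind, fn in self._steps:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows

    def count(self) -> int:
        return len(self.collect())


calls = []  # records every time the mapped function actually runs


def double(x):
    calls.append(x)
    return x * 2


ds = LazyDataset([1, 2, 3, 4])
doubled = ds.map(double).filter(lambda x: x > 4)  # transformations only
calls_before_action = len(calls)                  # still 0: nothing has run
result = doubled.collect()                        # action triggers evaluation
calls_after_action = len(calls)                   # double ran once per element
```

In this sketch, map and filter return another LazyDataset (so by the tip they count as transformations), while collect and count return a list and an integer (so they count as actions). The function double is not called until collect runs.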
Actions do not send results to the driver, while transformations do.
No. Actions send results to the driver. Think about running DataFrame.count(): it returns a number to the driver. Transformations, however, do not send results back to
the driver. They produce RDDs that remain on the worker nodes.
More info: What is the difference between a transformation and an action in Apache Spark? | Bartosz Mikulski, How to Speed up SQL Queries with Adaptive Query Execution