Explanation
The executed physical plan depends on a cost optimization from a previous stage.
Correct! Spark generates multiple candidate physical plans, runs a cost analysis on them, and selects the plan with the lowest estimated cost. That final physical plan is then executed by Spark.
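The selection step described above can be pictured with a small toy model. This is a minimal sketch in plain Python, not Spark's API: the `PhysicalPlan` class, the `select_physical_plan` function, the plan names, and the cost numbers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalPlan:
    """Hypothetical stand-in for a candidate physical plan."""
    name: str
    estimated_cost: float  # made-up cost units, for illustration only


def select_physical_plan(candidates):
    """Return the candidate with the lowest estimated cost."""
    if not candidates:
        raise ValueError("at least one candidate physical plan is required")
    return min(candidates, key=lambda plan: plan.estimated_cost)


# Three hypothetical ways to execute the same optimized logical plan.
candidates = [
    PhysicalPlan("sort_merge_join", 120.0),
    PhysicalPlan("broadcast_hash_join", 35.0),
    PhysicalPlan("shuffled_hash_join", 80.0),
]

selected = select_physical_plan(candidates)
print(selected.name)
```

Only the selected plan is executed; the other candidates are discarded after the cost comparison.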
Spark uses the catalog to resolve the optimized logical plan.
No. Spark uses the catalog to resolve the unresolved logical plan, not the optimized logical plan. Once resolved, the logical plan is optimized by the Catalyst Optimizer, and the optimized logical plan is then the input for physical planning.
The catalog assigns specific resources to the physical plan.
No. The catalog does not assign resources. It stores metadata, such as the names of columns, data types, functions, and databases. Spark consults the catalog to resolve the references in a logical plan at the beginning of the conversion of the query into an execution plan. The result is a resolved logical plan, which is then optimized.
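To make the catalog's role concrete, here is a toy sketch of reference resolution in plain Python. It is not Spark's API: the `CATALOG` dictionary, the `resolve` function, and the `sales` table with its columns are hypothetical and exist only to show that the catalog supplies metadata, not resources.

```python
# Hypothetical catalog: table name -> {column name: data type}.
CATALOG = {
    "sales": {"item_id": "bigint", "amount": "double"},
}


def resolve(table, columns, catalog=CATALOG):
    """Turn unresolved names into (column, type) pairs using catalog metadata."""
    if table not in catalog:
        raise LookupError(f"table not found: {table}")
    schema = catalog[table]
    missing = [col for col in columns if col not in schema]
    if missing:
        raise LookupError(f"cannot resolve columns {missing} in table {table}")
    return {"table": table, "columns": [(col, schema[col]) for col in columns]}


# An unresolved plan only knows names; resolution attaches types from the catalog.
resolved = resolve("sales", ["item_id", "amount"])
print(resolved)
```

A reference to a column that the catalog does not know fails at this stage, before any optimization or physical planning takes place.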
Depending on whether DataFrame API or SQL API are used, the physical plan may differ.
Wrong. For an equivalent query, the physical plan is independent of which API was used, and this is one of the great strengths of Spark!
The catalog assigns specific resources to the optimized memory plan.
No. There is no "memory plan" among the stages of a Spark computation.
More info: "Spark’s Logical and Physical plans … When, Why, How and Beyond" by Laurent Leturgez (datalex, Medium)