Comprehensive and Detailed Explanation From Exact Extract:
The.coalesce(numPartitions)function is used to reduce the number of partitions in a DataFrame. It does not increase the number of partitions. If the specified number of partitions is greater than the current number, it will not have any effect.
From the official Spark documentation:
“coalesce() results in a narrow dependency, e.g. if you go from 1000 partitions to 100 partitions, there will not be a shuffle, instead each of the 100 new partitions will claim one or more of the current partitions.”
However, if you try to increase partitions using coalesce (e.g., from 10 to 20), the number of partitions remains unchanged.
Hence,df.coalesce(20)will still return a DataFrame with 10 partitions.
Chosen Answer:
This is a voting comment (?). You can switch to a simple comment. It is better to Upvote an existing comment if you don't have anything to add.
Submit