Explanation
Correct code block:
transactionsDf.repartition(14, "storeId", "transactionDate").count()
Since we do not know how many partitions the DataFrame transactionsDf has, we cannot safely use coalesce: coalesce can only reduce the number of partitions, so it would make no change if the current number of partitions is smaller than 14.
So, we need to use repartition.
In the Spark documentation, the call structure for repartition is shown like this: DataFrame.repartition(numPartitions, *cols). The * operator means that every argument after numPartitions will be
interpreted as a column. Therefore, the brackets around the column names need to be removed, and the columns passed as separate arguments.
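To make the role of the * operator concrete, here is a minimal plain-Python sketch. The function repartition_sketch is a hypothetical stand-in that only mirrors the documented signature; it is not PySpark's implementation and says nothing about how PySpark handles each argument internally.

```python
def repartition_sketch(numPartitions, *cols):
    """Hypothetical stand-in that mirrors the signature repartition(numPartitions, *cols).

    It only returns what Python hands to the function, so we can inspect
    how the positional arguments after numPartitions are collected.
    """
    return numPartitions, cols


# Column names passed as separate arguments: cols holds two entries.
separate = repartition_sketch(14, "storeId", "transactionDate")

# Column names wrapped in brackets: cols holds a single entry, the list itself.
bracketed = repartition_sketch(14, ["storeId", "transactionDate"])

print(separate)
print(bracketed)
```

With separate arguments, cols is a tuple of two column names. With brackets, cols is a tuple containing one list, which does not match the call structure shown in the documentation.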
Finally, the question specifies that after the execution the DataFrame should be divided. So, indirectly, the question is asking us to append an action to the code block. Since .select()
is a transformation, the only possible choice here is .count().
More info: pyspark.sql.DataFrame.repartition — PySpark 3.1.1 documentation
Static notebook | Dynamic notebook: See test 1, question 40 (Databricks import instructions)