Explanation
transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError")))
Correct. The DataFrame.withColumn() operator adds a new column to a DataFrame. It takes two arguments: the name of the new column (here: predErrorSqrt) and a Column expression
for the new column. In PySpark, a column can be referenced in several ways, for example via col("predError"), via attribute access (transactionsDf.predError), or, for functions that accept it,
simply by its name as a string, "predError".
The question asks for the square root. sqrt() is a function in pyspark.sql.functions that calculates the square root. It accepts either a Column or a column name (as a string) as input. Here it is applied to the predError column of
DataFrame transactionsDf, expressed through col("predError").
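As a minimal, self-contained sketch (the SparkSession setup and the toy data standing in for transactionsDf are assumptions, not part of the question), the correct option could be run like this:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sqrt

spark = SparkSession.builder.getOrCreate()
# Hypothetical stand-in for transactionsDf with a numeric predError column.
transactionsDf = spark.createDataFrame(
    [(1, 4.0), (2, 9.0), (3, 16.0)], ["transactionId", "predError"]
)
# withColumn() keeps all existing columns and appends predErrorSqrt.
transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError"))).show()
# Resulting columns: transactionId, predError, predErrorSqrt (square roots 2.0, 3.0, 4.0)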
transactionsDf.withColumn("predErrorSqrt", sqrt(predError))
Incorrect. sqrt(predError) is invalid syntax here: predError is not a defined Python variable, so Python raises a NameError before Spark ever evaluates the expression.
You could pass transactionsDf.predError, col("predError") (as in the correct solution), or just the string "predError" instead.
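To illustrate, all of the following reference the column in a valid way and produce the same result (a sketch, reusing the stand-in DataFrame and imports from above):
# Only the way the column is referenced differs; the result is the same.
transactionsDf.withColumn("predErrorSqrt", sqrt(transactionsDf.predError))
transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError")))
transactionsDf.withColumn("predErrorSqrt", sqrt("predError"))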
transactionsDf.select(sqrt(predError))
Wrong. The same problem as in the previous option applies: predError is an undefined Python variable, so this expression fails with a NameError.
transactionsDf.select(sqrt("predError"))
No. While this is valid syntax, select() returns a new DataFrame that contains only one column: the square root of column predError. The question, however, asks for a column to
be added to the original DataFrame transactionsDf alongside its existing columns.
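The contrast between select() and withColumn() can be sketched like this (again using the stand-in DataFrame from above; the auto-generated column name is indicative only):
# select() returns only the computed column ...
transactionsDf.select(sqrt("predError")).columns
# -> a single auto-named column, such as ['SQRT(predError)']
# ... while withColumn() keeps the original columns and adds the new one.
transactionsDf.withColumn("predErrorSqrt", sqrt("predError")).columns
# -> ['transactionId', 'predError', 'predErrorSqrt']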
transactionsDf.withColumn("predErrorSqrt", col("predError").sqrt())
No. The issue with this statement is that the Column returned by col("predError") has no sqrt() method. sqrt() is a member of pyspark.sql.functions, but not of pyspark.sql.Column.
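A quick check makes the difference visible (a sketch; the exact runtime error may vary by PySpark version):
from pyspark.sql.functions import col
import pyspark.sql.functions as F

F.sqrt(col("predError"))    # valid: sqrt() lives in pyspark.sql.functions and returns a Column
# col("predError").sqrt()   # fails at runtime: sqrt() is not a method of pyspark.sql.Column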
More info: pyspark.sql.DataFrame.withColumn — PySpark 3.1.2 documentation and pyspark.sql.functions.sqrt — PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 2, question 31 (Databricks import instructions)