Explanation
transactionsDf.where(transactionsDf.storeId!=25)
Correct. DataFrame.where() is an alias for the DataFrame.filter() method. Using this method, it is straightforward to filter out rows that do not have value 25 in column storeId.
transactionsDf.select(transactionsDf.storeId!=25)
Wrong. The select operator allows you to build DataFrames column-wise, but when using it as shown, it does not filter out rows.
transactionsDf.filter(transactionsDf.storeId==25)
Incorrect. Although the filter expression works for filtering rows, the == in the filtering condition is inappropriate. It should be != instead.
transactionsDf.drop(transactionsDf.storeId==25)
No. DataFrame.drop() is used to remove specific columns, but not rows, from the DataFrame.
transactionsDf.remove(transactionsDf.storeId==25)
False. There is no DataFrame.remove() operator in PySpark.
More info: pyspark.sql.DataFrame.where — PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 3, QUESTION NO: 48 (Databricks import instructions)
Submit