Which of the following code blocks returns a DataFrame that is an inner join of DataFrame itemsDf and DataFrame transactionsDf, on columns itemId and productId, respectively and in which every
Filtering out distinct rows based on columns is achieved with the dropDuplicates method, not the distinct method which does not take any arguments.
The second argument of the join() method only accepts strings if they are column names. The SQL-like statement "itemsDf.itemId==transactionsDf.productId" is therefore invalid.
In addition, it is not necessary to specify how="inner", since the default join type for the join command is already inner.
More info: pyspark.sql.DataFrame.join — PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 2, QUESTION NO: 53 (Databricks import instructions)
Contribute your Thoughts:
Chosen Answer:
This is a voting comment (?). You can switch to a simple comment. It is better to Upvote an existing comment if you don't have anything to add.
Submit