Comprehensive and Detailed Explanation From Exact Extract:
The correct way to load specific columns from an ORC file is to first load the file using .load() and then apply .select() on the resulting DataFrame. This works with .read.format("orc").load(path) or the shortcut .read.orc(path), both of which return a DataFrame.
A performs the selection after filtering, which does not match the intent to minimize memory at load.
B incorrectly tries to use .select() before .load(), which is invalid.
C uses a non-existent .selected() method.
D correctly loads and then selects.
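The load-then-select pattern in option D can be sketched as follows. This is a minimal illustration, not the exam's exact code: the helper name load_orc_columns, the file path, and the column names are all hypothetical. Because Spark evaluates lazily, calling .select() right after .load() generally lets the ORC reader prune unneeded columns at scan time, so selecting after loading does not have to mean reading every column into memory.

```python
from typing import TYPE_CHECKING, Sequence

if TYPE_CHECKING:
    # Imported for type hints only, so the helper itself has no hard
    # runtime dependency on pyspark being installed.
    from pyspark.sql import DataFrame, SparkSession


def load_orc_columns(
    spark: "SparkSession", path: str, columns: Sequence[str]
) -> "DataFrame":
    """Load an ORC file, then keep only the requested columns.

    `load_orc_columns` is a hypothetical helper name used for illustration.
    """
    # .load() must come first: it turns the DataFrameReader into a DataFrame,
    # and .select() is a DataFrame method, not a DataFrameReader method.
    df = spark.read.format("orc").load(path)
    # Equivalent shortcut: df = spark.read.orc(path)
    return df.select(*columns)


# Example usage (assumes pyspark is installed and the path exists;
# the path and column names below are placeholders):
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("orc-demo").getOrCreate()
#   df = load_orc_columns(spark, "/data/events.orc", ["id", "name"])
#   df.show()
```

Calling .select() directly on spark.read (as in option B) fails because DataFrameReader has no such method, which is why the order of the two calls matters.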
[Reference: Apache Spark SQL API - ORC Format]