Databricks Certified Associate Developer for Apache Spark 3.5-Python Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Question # 3 Topic 1 Discussion

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Exam Topic 1 Question 3 Discussion:
Question #: 3
Topic #: 1

A data engineer is working with a large JSON dataset containing order information. The dataset is stored in a distributed file system and needs to be loaded into a Spark DataFrame for analysis. The data engineer wants to ensure that the schema is correctly defined and that the data is read efficiently.

Which approach should the data engineer use to efficiently load the JSON data into a Spark DataFrame with a predefined schema?


A.

Use spark.read.json() to load the data, then use DataFrame.printSchema() to view the inferred schema, and finally use DataFrame.cast() to modify column types.


B.

Use spark.read.json() with the inferSchema option set to true.


C.

Use spark.read.format("json").load() and then use DataFrame.withColumn() to cast each column to the desired data type.


D.

Define a StructType schema and use spark.read.schema(predefinedSchema).json() to load the data.

