Pass the Databricks Certification Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Questions and Answers with CertsForce

Question # 21:

A Spark application is experiencing performance issues in client mode because the driver is resource-constrained.

How should this issue be resolved?

Options:

A.

Add more executor instances to the cluster


B.

Increase the driver memory on the client machine


C.

Switch the deployment mode to cluster mode


D.

Switch the deployment mode to local mode


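In client mode the driver runs on the machine that submitted the application, so a resource-constrained client starves the driver; cluster mode launches the driver on a cluster node instead. A minimal sketch of the submit command (the master, memory value, and app.py entry point are assumptions for illustration):

spark-submit --master yarn --deploy-mode cluster --driver-memory 4g app.py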
Question # 22:

An engineer wants to join two DataFrames df1 and df2 on the respective employee_id and emp_id columns:

df1: employee_id INT, name STRING

df2: emp_id INT, department STRING

The engineer uses:

result = df1.join(df2, df1.employee_id == df2.emp_id, how='inner')

What is the behaviour of the code snippet?

Options:

A.

The code fails to execute because the column names employee_id and emp_id do not match automatically


B.

The code fails to execute because it must use on='employee_id' to specify the join column explicitly


C.

The code fails to execute because PySpark does not support joining DataFrames with a different structure


D.

The code works as expected because the join condition explicitly matches employee_id from df1 with emp_id from df2


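A minimal, self-contained sketch of this join (the sample rows are assumptions for illustration). An explicit equality condition lets differently named key columns join; both key columns survive in the result, and the redundant one can be dropped:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-demo").getOrCreate()

df1 = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["employee_id", "name"])
df2 = spark.createDataFrame([(1, "Sales"), (3, "HR")], ["emp_id", "department"])

# Inner join on the explicit condition; only employee_id 1 matches emp_id 1.
result = df1.join(df2, df1.employee_id == df2.emp_id, how="inner")
result.drop("emp_id").show()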
Question # 23:

A developer runs:

[Code snippet rendered as an image in the original: a DataFrame write to Parquet partitioned by the columns color and fruit, inferred from the options below.]

What is the result?

Options:

A.

It stores all data in a single Parquet file.


B.

It throws an error if there are null values in either partition column.


C.

It appends new partitions to an existing Parquet file.


D.

It creates separate directories for each unique combination of color and fruit.


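A minimal sketch of a two-column partitioned write (the sample rows and /tmp/output path are assumptions for illustration). Each unique (color, fruit) pair becomes a nested directory under the output path:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-demo").getOrCreate()

df = spark.createDataFrame(
    [("red", "apple"), ("green", "kiwi"), ("red", "cherry")],
    ["color", "fruit"],
)

# Produces directories such as:
#   /tmp/output/color=red/fruit=apple/
#   /tmp/output/color=red/fruit=cherry/
#   /tmp/output/color=green/fruit=kiwi/
# (With the default save mode the write fails if /tmp/output already exists.)
df.write.partitionBy("color", "fruit").parquet("/tmp/output")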
Question # 24:

Which Spark configuration controls the number of tasks that can run in parallel on the executor?

Options:

A.

spark.executor.cores


B.

spark.task.maxFailures


C.

spark.driver.cores


D.

spark.executor.memory


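spark.executor.cores sets the number of task slots per executor and therefore bounds per-executor task parallelism; spark.executor.memory sizes the executor heap but adds no task slots. A minimal sketch (the values are assumptions, and on some cluster managers executor cores must be set at submit time rather than in code):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("parallelism-demo")
    .config("spark.executor.cores", "4")    # up to 4 concurrent tasks per executor
    .config("spark.executor.memory", "8g")  # executor heap size, not task slots
    .getOrCreate()
)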
Question # 25:

A data engineer needs to write a DataFrame df to a Parquet file, partitioned by the column country, and overwrite any existing data at the destination path.

Which code should the data engineer use to accomplish this task in Apache Spark?

Options:

A.

df.write.mode("overwrite").partitionBy("country").parquet("/data/output")


B.

df.write.mode("append").partitionBy("country").parquet("/data/output")


C.

df.write.mode("overwrite").parquet("/data/output")


D.

df.write.partitionBy("country").parquet("/data/output")


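A minimal, self-contained sketch of the overwrite-plus-partition write (the sample rows are assumptions for illustration; /data/output is the path from the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("write-demo").getOrCreate()

df = spark.createDataFrame([("US", 1), ("DE", 2)], ["country", "value"])

# mode("overwrite") replaces any existing data at the destination path;
# partitionBy("country") yields /data/output/country=US/ and /data/output/country=DE/.
df.write.mode("overwrite").partitionBy("country").parquet("/data/output")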