Pass the Databricks Certification Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Questions and Answers with CertsForce

Viewing page 5 of 6 (questions 41-50)
Question # 41:

Which of the following code blocks reads in the JSON file stored at filePath as a DataFrame?

Options:

A. spark.read.json(filePath)
B. spark.read.path(filePath, source="json")
C. spark.read().path(filePath)
D. spark.read().json(filePath)
E. spark.read.path(filePath)


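For reference, a minimal sketch of reading a JSON file into a DataFrame, assuming spark is an active SparkSession and filePath points to a JSON file; note that spark.read is a property (no parentheses) that returns a DataFrameReader:

# Read the JSON file at filePath into a DataFrame.
df = spark.read.json(filePath)

# Equivalent generic form using the format/load API.
df = spark.read.format("json").load(filePath)

df.printSchema()
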
Question # 42:

The code block displayed below contains at least one error. The code block should return a DataFrame with only one column, result. That column should include all values in column value from DataFrame transactionsDf raised to the power of 5, and a null value for rows in which there is no value in column value. Find the error(s).

Code block:

from pyspark.sql.functions import udf
from pyspark.sql import types as T

transactionsDf.createOrReplaceTempView('transactions')

def pow_5(x):
    return x**5

spark.udf.register(pow_5, 'power_5_udf', T.LongType())
spark.sql('SELECT power_5_udf(value) FROM transactions')

Options:

A. The pow_5 method is unable to handle empty values in column value and the name of the column in the returned DataFrame is not result.
B. The returned DataFrame includes multiple columns instead of just one column.
C. The pow_5 method is unable to handle empty values in column value, the name of the column in the returned DataFrame is not result, and the SparkSession cannot access the transactionsDf DataFrame.
D. The pow_5 method is unable to handle empty values in column value, the name of the column in the returned DataFrame is not result, and the Spark driver does not call the UDF function appropriately.
E. The pow_5 method is unable to handle empty values in column value, the UDF function is not registered properly with the Spark driver, and the name of the column in the returned DataFrame is not result.


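For comparison, a sketch of how the intended behavior could be written; the None check, the argument order of spark.udf.register (name first, then the function, then the return type), and the AS result alias are exactly the points the options debate, and transactionsDf is assumed to exist:

from pyspark.sql import types as T

transactionsDf.createOrReplaceTempView('transactions')

def pow_5(x):
    # Return None for missing values instead of failing on None**5.
    return x**5 if x is not None else None

# spark.udf.register(name, f, returnType): the UDF's SQL name comes first.
spark.udf.register('power_5_udf', pow_5, T.LongType())

# Alias the expression so the single returned column is named result.
spark.sql('SELECT power_5_udf(value) AS result FROM transactions')
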
Question # 43:

Which of the following is not a feature of Adaptive Query Execution?

Options:

A. Replace a sort merge join with a broadcast join, where appropriate.
B. Coalesce partitions to accelerate data processing.
C. Split skewed partitions into smaller partitions to avoid differences in partition processing time.
D. Reroute a query in case of an executor failure.
E. Collect runtime statistics during query execution.


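As background, Adaptive Query Execution collects runtime statistics at shuffle boundaries and is driven by configuration; a minimal sketch of the Spark 3.x settings behind the features listed above:

# Enable AQE (reoptimizes plans using runtime statistics).
spark.conf.set("spark.sql.adaptive.enabled", "true")
# Coalesce small shuffle partitions after a shuffle.
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
# Split skewed partitions during joins.
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
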
Question # 44:

The code block displayed below contains an error. The code block should merge the rows of DataFrames transactionsDfMonday and transactionsDfTuesday into a new DataFrame, matching column names and inserting null values where column names do not appear in both DataFrames. Find the error.

Sample of DataFrame transactionsDfMonday:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+

Sample of DataFrame transactionsDfTuesday:

+-------+-------------+---------+-----+
|storeId|transactionId|productId|value|
+-------+-------------+---------+-----+
|     25|            1|        1|    4|
|      2|            2|        2|    7|
|      3|            4|        2| null|
|   null|            5|        2| null|
+-------+-------------+---------+-----+

Code block:

sc.union([transactionsDfMonday, transactionsDfTuesday])

Options:

A. The DataFrames' RDDs need to be passed into the sc.union method instead of the DataFrame variable names.
B. Instead of union, the concat method should be used, making sure to not use its default arguments.
C. Instead of the Spark context, transactionsDfMonday should be called with the join method instead of the union method, making sure to use its default arguments.
D. Instead of the Spark context, transactionsDfMonday should be called with the union method.
E. Instead of the Spark context, transactionsDfMonday should be called with the unionByName method instead of the union method, making sure to not use its default arguments.


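For reference, a sketch of merging two DataFrames by column name; the call is made on a DataFrame rather than on the SparkContext, and allowMissingColumns (which fills columns absent from one DataFrame with nulls) is available from Spark 3.1 onward:

# Match columns by name and fill missing columns (e.g. predError, f) with nulls.
combined = transactionsDfMonday.unionByName(transactionsDfTuesday, allowMissingColumns=True)
combined.show()
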
Question # 45:

Which of the following is the idea behind dynamic partition pruning in Spark?

Options:

A. Dynamic partition pruning is intended to skip over the data you do not need in the results of a query.
B. Dynamic partition pruning concatenates columns of similar data types to optimize join performance.
C. Dynamic partition pruning performs wide transformations on disk instead of in memory.
D. Dynamic partition pruning reoptimizes physical plans based on data types and broadcast variables.
E. Dynamic partition pruning reoptimizes query plans based on runtime statistics collected during query execution.


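As background, dynamic partition pruning lets Spark skip reading partitions of a partitioned table that, based on the other side of a join, cannot appear in the result; it is enabled by default in Spark 3.x via a configuration flag:

# On by default; shown here only to illustrate the switch that controls the feature.
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
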
Question # 46:

Which of the following describes how Spark achieves fault tolerance?

Options:

A. Spark helps fast recovery of data in case of a worker fault by providing the MEMORY_AND_DISK storage level option.
B. If an executor on a worker node fails while calculating an RDD, that RDD can be recomputed by another executor using the lineage.
C. Spark builds a fault-tolerant layer on top of the legacy RDD data system, which by itself is not fault tolerant.
D. Due to the mutability of DataFrames after transformations, Spark reproduces them using observed lineage in case of worker node failure.
E. Spark is only fault-tolerant if this feature is specifically enabled via the spark.fault_recovery.enabled property.


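As background, Spark tracks the chain of transformations (the lineage) used to produce each RDD, and lost partitions are recomputed from that lineage rather than restored from replicas; a small sketch that prints a lineage (toDebugString returns bytes in PySpark):

rdd = (spark.sparkContext
       .parallelize(range(10))
       .map(lambda x: x * 2)
       .filter(lambda x: x > 5))

# This description is what Spark replays if an executor holding a partition fails.
print(rdd.toDebugString().decode())
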
Question # 47:

Which of the following code blocks displays various aggregated statistics of all columns in DataFrame transactionsDf, including the standard deviation and minimum of values in each column?

Options:

A. transactionsDf.summary()
B. transactionsDf.agg("count", "mean", "stddev", "25%", "50%", "75%", "min")
C. transactionsDf.summary("count", "mean", "stddev", "25%", "50%", "75%", "max").show()
D. transactionsDf.agg("count", "mean", "stddev", "25%", "50%", "75%", "min").show()
E. transactionsDf.summary().show()


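For reference, DataFrame.summary() computes count, mean, stddev, min, the 25%/50%/75% approximate percentiles, and max by default, and show() is needed to display the result; a minimal sketch assuming transactionsDf exists:

# All default statistics for every column.
transactionsDf.summary().show()

# Or request a specific subset of statistics.
transactionsDf.summary("count", "min", "stddev").show()
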
Question # 48:

Which of the following code blocks adds a column predErrorSqrt to DataFrame transactionsDf that is the square root of column predError?

Options:

A. transactionsDf.withColumn("predErrorSqrt", sqrt(predError))
B. transactionsDf.select(sqrt(predError))
C. transactionsDf.withColumn("predErrorSqrt", col("predError").sqrt())
D. transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError")))
E. transactionsDf.select(sqrt("predError"))


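For reference, pyspark.sql.functions.sqrt takes a Column (or a column name) and withColumn appends the computed column to the existing ones; a minimal sketch assuming transactionsDf exists:

from pyspark.sql.functions import col, sqrt

# Adds predErrorSqrt while keeping all existing columns of transactionsDf.
transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError"))).show()
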
Question # 49:

Which of the following code blocks applies the boolean-returning Python function evaluateTestSuccess to column storeId of DataFrame transactionsDf as a user-defined function?

Options:

A.
from pyspark.sql import types as T
evaluateTestSuccessUDF = udf(evaluateTestSuccess, T.BooleanType())
transactionsDf.withColumn("result", evaluateTestSuccessUDF(col("storeId")))

B.
evaluateTestSuccessUDF = udf(evaluateTestSuccess)
transactionsDf.withColumn("result", evaluateTestSuccessUDF(storeId))

C.
from pyspark.sql import types as T
evaluateTestSuccessUDF = udf(evaluateTestSuccess, T.IntegerType())
transactionsDf.withColumn("result", evaluateTestSuccess(col("storeId")))

D.
evaluateTestSuccessUDF = udf(evaluateTestSuccess)
transactionsDf.withColumn("result", evaluateTestSuccessUDF(col("storeId")))

E.
from pyspark.sql import types as T
evaluateTestSuccessUDF = udf(evaluateTestSuccess, T.BooleanType())
transactionsDf.withColumn("result", evaluateTestSuccess(col("storeId")))


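For reference, a Python function is turned into a UDF with udf(function, returnType), and it is the wrapped UDF, not the plain function, that is applied to a column; a sketch using a placeholder evaluateTestSuccess, since the real function is not shown in the question:

from pyspark.sql.functions import col, udf
from pyspark.sql import types as T

def evaluateTestSuccess(store_id):
    # Placeholder logic for illustration only.
    return store_id is not None and store_id > 0

evaluateTestSuccessUDF = udf(evaluateTestSuccess, T.BooleanType())
transactionsDf.withColumn("result", evaluateTestSuccessUDF(col("storeId")))
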
Question # 50:

Which of the following statements about executors is correct?

Options:

A. Executors are launched by the driver.
B. Executors stop upon application completion by default.
C. Each node hosts a single executor.
D. Executors store data in memory only.
E. An executor can serve multiple applications.


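As background, executors are per-application worker processes whose resources are requested when the application starts and released when it ends; a minimal sketch of the standard configuration keys (the values are illustrative only):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("executor-config-example")
         .config("spark.executor.instances", "2")  # number of executors requested
         .config("spark.executor.memory", "2g")    # memory per executor
         .config("spark.executor.cores", "2")      # cores per executor
         .getOrCreate())
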