
Pass the Databricks Certification Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 questions and answers with CertsForce

Question # 31:

A DataFrame df has columns name, age, and salary. The developer needs to sort the DataFrame by age in ascending order and salary in descending order.

Which code snippet meets the developer's requirement?

Options:

A.

df.orderBy(col("age").asc(), col("salary").asc()).show()


B.

df.sort("age", "salary", ascending=[True, True]).show()


C.

df.sort("age", "salary", ascending=[False, True]).show()


D.

df.orderBy("age", "salary", ascending=[True, False]).show()


Question # 32:

Which Spark configuration controls the number of tasks that can run in parallel on the executor?

Options:

A.

spark.executor.cores


B.

spark.task.maxFailures


C.

spark.driver.cores


D.

spark.executor.memory
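
As background, the number of tasks an executor can run concurrently is driven by the cores assigned to it (tasks per executor = spark.executor.cores / spark.task.cpus, the latter defaulting to 1). A minimal sketch of setting this when building a session; the values are illustrative only:

from pyspark.sql import SparkSession

# With 4 cores per executor and spark.task.cpus left at its default of 1,
# up to 4 tasks can run in parallel on each executor.
spark = (
    SparkSession.builder
    .appName("executor-parallelism-sketch")
    .config("spark.executor.cores", "4")
    .getOrCreate()
)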


Question # 33:

A data scientist is working on a project that requires processing large amounts of structured data, performing SQL queries, and applying machine learning algorithms. The data scientist is considering using Apache Spark for this task.

Which combination of Apache Spark modules should the data scientist use in this scenario?

Options:

A.

Spark DataFrames, Structured Streaming, and GraphX


B.

Spark SQL, Pandas API on Spark, and Structured Streaming


C.

Spark Streaming, GraphX, and Pandas API on Spark


D.

Spark DataFrames, Spark SQL, and MLlib
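
As a rough illustration of how these modules fit together, here is a minimal sketch (the toy data and column names are made up) that builds a DataFrame, queries it with Spark SQL, and trains an MLlib model:

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.getOrCreate()

# Structured data as a DataFrame
df = spark.createDataFrame([(1.0, 2.0, 0), (2.0, 1.0, 1)], ["f1", "f2", "label"])

# SQL queries via Spark SQL
df.createOrReplaceTempView("samples")
subset = spark.sql("SELECT f1, f2, label FROM samples WHERE f1 > 0")

# Machine learning via MLlib
assembled = VectorAssembler(inputCols=["f1", "f2"], outputCol="features").transform(subset)
model = LogisticRegression(featuresCol="features", labelCol="label").fit(assembled)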


Question # 34:

24 of 55.

Which code should be used to display the schema of the Parquet file stored in the location events.parquet?

Options:

A.

spark.sql("SELECT * FROM events.parquet").show()


B.

spark.read.format("parquet").load("events.parquet").show()


C.

spark.read.parquet("events.parquet").printSchema()


D.

spark.sql("SELECT schema FROM events.parquet").show()


Question # 35:

25 of 55.

A Data Analyst is working on employees_df and needs to add a new column where a 10% tax is calculated on the salary.

Additionally, the DataFrame contains the column age, which is not needed.

Which code fragment adds the tax column and removes the age column?

Options:

A.

employees_df = employees_df.withColumn("tax", col("salary") * 0.1).drop("age")


B.

employees_df = employees_df.withColumn("tax", lit(0.1)).drop("age")


C.

employees_df = employees_df.dropField("age").withColumn("tax", col("salary") * 0.1)


D.

employees_df = employees_df.withColumn("tax", col("salary") + 0.1).drop("age")


Question # 36:

45 of 55.

Which feature of Spark Connect should be considered when designing an application that plans to enable remote interaction with a Spark cluster?

Options:

A.

It is primarily used for data ingestion into Spark from external sources.


B.

It provides a way to run Spark applications remotely in any programming language.


C.

It can be used to interact with any remote cluster using the REST API.


D.

It allows for remote execution of Spark jobs.
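
For context, a minimal Spark Connect sketch in PySpark; the endpoint host and port below are placeholders:

from pyspark.sql import SparkSession

# connect to a remote Spark Connect server instead of starting a local driver
spark = SparkSession.builder.remote("sc://spark-host:15002").getOrCreate()

# queries are built client-side and executed remotely on the cluster
spark.range(5).show()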


Question # 37:

12 of 55.

A data scientist has been investigating user profile data to build features for their model. After some exploratory data analysis, the data scientist identified that some records in the user profiles contain NULL values in too many fields to be useful.

The schema of the user profile table looks like this:

user_id STRING,
username STRING,
date_of_birth DATE,
country STRING,
created_at TIMESTAMP

The data scientist decided that if any record contains a NULL value in any field, they want to remove that record from the output before further processing.

Which block of Spark code can be used to achieve these requirements?

Options:

A.

filtered_users = raw_users.na.drop("any")


B.

filtered_users = raw_users.na.drop("all")


C.

filtered_users = raw_users.dropna(how="any")


D.

filtered_users = raw_users.dropna(how="all")


Question # 38:

41 of 55.

A data engineer is working on the DataFrame df1 and wants the Name with the highest count to appear first (descending order by count), followed by the next highest, and so on.

The DataFrame has columns:

id | Name    | count | timestamp
---|---------|-------|----------
1  | USA     | 10
2  | India   | 20
3  | England | 50
4  | India   | 50
5  | France  | 20
6  | India   | 10
7  | USA     | 30
8  | USA     | 40

Which code fragment should the engineer use to sort the data by the Name and count columns?

Options:

A.

df1.orderBy(col("count").desc(), col("Name").asc())


B.

df1.sort("Name", "count")


C.

df1.orderBy("Name", "count")


D.

df1.orderBy(col("Name").desc(), col("count").asc())


Question # 39:

Which command overwrites an existing JSON file when writing a DataFrame?

Options:

A.

df.write.mode("overwrite").json("path/to/file")


B.

df.write.overwrite.json("path/to/file")


C.

df.write.json("path/to/file", overwrite=True)


D.

df.write.format("json").save("path/to/file", mode="overwrite")


Question # 40:

40 of 55.

A developer wants to refactor older Spark code to take advantage of built-in functions introduced in Spark 3.5.

The original code:

from pyspark.sql import functions as F

min_price = 110.50

result_df = prices_df.filter(F.col("price") > min_price).agg(F.count("*"))

Which code block should the developer use to refactor the code?

Options:

A.

result_df = prices_df.filter(F.col("price") > F.lit(min_price)).agg(F.count("*"))


B.

result_df = prices_df.where(F.lit("price") > min_price).groupBy().count()


C.

result_df = prices_df.withColumn("valid_price", when(col("price") > F.lit(min_price), True))


D.

result_df = prices_df.filter(F.lit(min_price) > F.col("price")).count()
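
For comparison, a minimal sketch of the refactor, assuming prices_df has a numeric price column; wrapping the Python literal in F.lit() makes the comparison explicit without changing the result:

from pyspark.sql import functions as F

min_price = 110.50
result_df = prices_df.filter(F.col("price") > F.lit(min_price)).agg(F.count("*"))
result_df.show()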

