Pass the Databricks Certification Databricks-Certified-Professional-Data-Engineer Questions and Answers with CertsForce

Viewing page 2 out of 6 pages
Viewing questions 11-20 out of questions
Questions # 11:

A junior data engineer seeks to leverage Delta Lake's Change Data Feed functionality to create a Type 1 table representing all of the values that have ever been valid for all rows in a bronze table created with the property delta.enableChangeDataFeed = true. They plan to execute the following code as a daily job:

[Figure: code listing for Question # 11]

Which statement describes the execution and results of running the above query multiple times?

Options:

A.

Each time the job is executed, newly updated records will be merged into the target table, overwriting previous values with the same primary keys.


B.

Each time the job is executed, the entire available history of inserted or updated records will be appended to the target table, resulting in many duplicate entries.


C.

Each time the job is executed, the target table will be overwritten using the entire history of inserted or updated records, giving the desired result.


D.

Each time the job is executed, the differences between the original and current versions are calculated; this may result in duplicate entries for some records.


E.

Each time the job is executed, only those records that have been inserted or updated since the last execution will be appended to the target table giving the desired result.


Questions # 12:

A junior data engineer is migrating a workload from a relational database system to the Databricks Lakehouse. The source system uses a star schema, leveraging foreign key constraints and multi-table inserts to validate records on write.

Which consideration will impact the decisions made by the engineer while migrating this workload?

Options:

A.

All Delta Lake transactions are ACID compliant against a single table, and Databricks does not enforce foreign key constraints.


B.

Databricks only allows foreign key constraints on hashed identifiers, which avoid collisions in highly-parallel writes.


C.

Foreign keys must reference a primary key field; multi-table inserts must leverage Delta Lake's upsert functionality.


D.

Committing to multiple tables simultaneously requires taking out multiple table locks and can lead to a state of deadlock.


Questions # 13:

A nightly job ingests data into a Delta Lake table using the following code:

[Figure: code listing for Question # 13]

The next step in the pipeline requires a function that returns an object that can be used to manipulate new records that have not yet been processed to the next table in the pipeline.

Which code snippet completes this function definition?

def new_records():

Options:

A.

return spark.readStream.table("bronze")


B.

return spark.readStream.load("bronze")


C.

[Figure: option C code for Question # 13]


D.

return spark.read.option("readChangeFeed", "true").table("bronze")


E.

[Figure: option E code for Question # 13]


Questions # 14:

Which of the following is true of Delta Lake and the Lakehouse?

Options:

A.

Because Parquet compresses data row by row, strings will only be compressed when a character is repeated multiple times.


B.

Delta Lake automatically collects statistics on the first 32 columns of each table, which are leveraged in data skipping based on query filters.


C.

Views in the Lakehouse maintain a valid cache of the most recent versions of source tables at all times.


D.

Primary and foreign key constraints can be leveraged to ensure duplicate values are never entered into a dimension table.


E.

Z-order can only be applied to numeric values stored in Delta Lake tables.
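
To make the data-skipping idea mentioned in option B more concrete, here is a minimal sketch in plain Python. It is not Delta Lake's implementation; it only assumes that each data file carries min/max statistics for a column and that a filter can rule out files whose range excludes the filter value. The file names, the `order_date` column, and the `files_to_scan` helper are all hypothetical.

```python
# Hypothetical per-file statistics, similar in spirit to the min/max values
# a table format can record for each data file.
file_stats = [
    {"path": "part-0", "min": {"order_date": "2024-01-01"}, "max": {"order_date": "2024-01-31"}},
    {"path": "part-1", "min": {"order_date": "2024-02-01"}, "max": {"order_date": "2024-02-29"}},
    {"path": "part-2", "min": {"order_date": "2024-03-01"}, "max": {"order_date": "2024-03-31"}},
]


def files_to_scan(stats, column, value):
    """Return paths of files whose [min, max] range could contain `value`."""
    return [
        entry["path"]
        for entry in stats
        if entry["min"][column] <= value <= entry["max"][column]
    ]


# An equality filter on order_date only needs the files whose range covers it.
selected = files_to_scan(file_stats, "order_date", "2024-02-15")
print(selected)
```

Files whose statistics exclude the filter value are never opened, which is why statistics on the filtered column matter for this kind of pruning.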


Questions # 15:

A data engineering team needs to implement a tagging system for their tables as part of an automated ETL process, and needs to apply tags programmatically to tables in Unity Catalog.

Which SQL command adds tags to a table programmatically?

Options:

A.

ALTER TABLE table_name SET TAGS ('key1' = 'value1', 'key2' = 'value2');


B.

APPLY TAGS ON table_name VALUES ('key1' = 'value1', 'key2' = 'value2');


C.

COMMENT ON TABLE table_name TAGS ('key1' = 'value1', 'key2' = 'value2');


D.

SET TAGS FOR table_name AS ('key1' = 'value1', 'key2' = 'value2');
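
For an automated ETL step, the `ALTER TABLE ... SET TAGS` form shown in option A can be generated from a dictionary of tags. The sketch below only builds the statement text; the `build_set_tags_sql` helper, the table name, and the tag values are hypothetical, and the quoting check is a deliberately conservative assumption rather than a full escaping routine.

```python
from typing import Mapping


def build_set_tags_sql(table_name: str, tags: Mapping[str, str]) -> str:
    """Build an ALTER TABLE ... SET TAGS statement from a mapping of tags."""
    if not tags:
        raise ValueError("at least one tag is required")
    for text in (*tags.keys(), *tags.values()):
        # Reject quotes and backslashes instead of trying to escape them.
        if "'" in text or "\\" in text:
            raise ValueError(f"unsupported character in tag text: {text!r}")
    pairs = ", ".join(f"'{key}' = '{value}'" for key, value in tags.items())
    return f"ALTER TABLE {table_name} SET TAGS ({pairs});"


# Hypothetical table and tags; in a Databricks job the resulting string
# would typically be passed to spark.sql(...).
statement = build_set_tags_sql("main.sales.orders", {"owner": "etl", "tier": "gold"})
print(statement)
```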


Questions # 16:

An upstream source writes Parquet data as hourly batches to directories named with the current date. A nightly batch job runs the following code to ingest all data from the previous day as indicated by the date variable:

[Figure: code listing for Question # 16]

Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order.

If the upstream system is known to occasionally produce duplicate entries for a single order hours apart, which statement is correct?

Options:

A.

Each write to the orders table will only contain unique records, and only those records without duplicates in the target table will be written.


B.

Each write to the orders table will only contain unique records, but newly written records may have duplicates already present in the target table.


C.

Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, these records will be overwritten.


D.

Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, the operation will fail.


E.

Each write to the orders table will run deduplication over the union of new and existing records, ensuring no duplicate records are present.
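
The code for this question is not reproduced in the text, so the following is only a sketch under a stated assumption: the job deduplicates each day's batch on the composite key (customer_id, order_id) and then appends it to the target. It uses plain Python lists in place of Spark DataFrames, and the helper names and sample records are hypothetical. It illustrates the difference between deduplicating within a batch and deduplicating against the target table.

```python
def dedupe_batch(batch, key_fields=("customer_id", "order_id")):
    """Keep the first record seen for each composite key within one batch."""
    seen = set()
    unique = []
    for record in batch:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def append_batch(target, batch):
    """Append the deduplicated batch; existing target rows are not checked."""
    target.extend(dedupe_batch(batch))


orders = []  # stands in for the target table
day_1 = [
    {"customer_id": 1, "order_id": 10, "amount": 5.0},
    {"customer_id": 1, "order_id": 10, "amount": 5.0},  # duplicate within the day
    {"customer_id": 2, "order_id": 11, "amount": 7.5},
]
day_2 = [
    {"customer_id": 2, "order_id": 11, "amount": 7.5},  # duplicate of a day_1 order
    {"customer_id": 3, "order_id": 12, "amount": 2.0},
]

append_batch(orders, day_1)
append_batch(orders, day_2)
keys = [(row["customer_id"], row["order_id"]) for row in orders]
print(keys)
```

Under this assumption, the duplicate inside day_1 is dropped, but the order repeated across the two days ends up in the target twice, because an append never compares new rows with rows already written.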


Questions # 17:

A data team is implementing an append-only Delta Lake pipeline that processes both batch and streaming data. They want to ensure that schema changes in the source data are automatically incorporated without breaking the pipeline.

Which configuration should the team use when writing data to the Delta table?

Options:

A.

ignoreChanges = false


B.

mergeSchema = true


C.

overwriteSchema = true


D.

validateSchema = false


Questions # 18:

The Databricks workspace administrator has configured interactive clusters for each of the data engineering groups. To control costs, clusters are set to terminate after 30 minutes of inactivity. Each user should be able to execute workloads against their assigned clusters at any time of the day.

Assuming users have been added to a workspace but not granted any permissions, which of the following describes the minimal permissions a user would need to start and attach to an already configured cluster?

Options:

A.

"Can Manage" privileges on the required cluster


B.

Workspace Admin privileges, cluster creation allowed. "Can Attach To" privileges on the required cluster


C.

Cluster creation allowed. "Can Attach To" privileges on the required cluster


D.

"Can Restart" privileges on the required cluster


E.

Cluster creation allowed. "Can Restart" privileges on the required cluster


Questions # 19:

A team of data engineers is adding tables to a DLT pipeline that contain repetitive expectations for many of the same data quality checks.

One member of the team suggests reusing these data quality rules across all tables defined for this pipeline.

What approach would allow them to do this?

Options:

A.

Maintain data quality rules in a Delta table outside of this pipeline’s target schema, providing the schema name as a pipeline parameter.


B.

Use global Python variables to make expectations visible across DLT notebooks included in the same pipeline.


C.

Add data quality constraints to tables in this pipeline using an external job with access to pipeline configuration files.


D.

Maintain data quality rules in a separate Databricks notebook that each DLT notebook or file imports.
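
One way to picture the reusable-rules idea is a small shared Python module that every pipeline source file can import. The sketch below is an illustration under assumptions, not the pipeline's actual code: the `RULES` list, the rule names, the tags, and the `get_rules` helper are hypothetical.

```python
# Hypothetical shared rule definitions, kept in one module so that every
# table definition in the pipeline can reuse them.
RULES = [
    {"name": "valid_id", "constraint": "id IS NOT NULL", "tag": "validity"},
    {"name": "positive_amount", "constraint": "amount > 0", "tag": "validity"},
    {"name": "recent_event", "constraint": "event_date >= '2020-01-01'", "tag": "freshness"},
]


def get_rules(tag):
    """Return a {name: constraint} dict for all rules carrying the given tag."""
    return {rule["name"]: rule["constraint"] for rule in RULES if rule["tag"] == tag}


validity_rules = get_rules("validity")
print(validity_rules)
```

In a DLT source file, a dictionary shaped like this could be passed to a decorator such as `dlt.expect_all` or `dlt.expect_all_or_drop`, so each table applies the same named constraints without redefining them.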


Questions # 20:

The marketing team is looking to share data in an aggregate table with the sales organization, but the field names used by the teams do not match, and a number of marketing-specific fields have not been approved for the sales org.

Which of the following solutions addresses the situation while emphasizing simplicity?

Options:

A.

Create a view on the marketing table selecting only those fields approved for the sales team; alias the names of any fields that should be standardized to the sales naming conventions.


B.

Use a CTAS statement to create a derivative table from the marketing table; configure a production job to propagate changes.


C.

Add a parallel table write to the current production pipeline, updating a new sales table that varies as required from the marketing table.


D.

Create a new table with the required schema and use Delta Lake's DEEP CLONE functionality to sync up changes committed to one table to the corresponding table.
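
The view-with-aliases approach described in option A can be sketched with Python's built-in sqlite3 module standing in for a Lakehouse SQL engine. The table name, column names, and sample rows are hypothetical; the point is only that a view can expose a subset of columns under different names without copying data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical marketing table; internal_score is assumed not approved for sales.
conn.execute(
    "CREATE TABLE marketing_agg (cust_id INTEGER, mkt_spend REAL, internal_score REAL)"
)
conn.executemany(
    "INSERT INTO marketing_agg VALUES (?, ?, ?)",
    [(1, 120.0, 0.9), (2, 75.5, 0.4)],
)

# The view selects only the approved fields and renames them for the sales team.
conn.execute(
    """
    CREATE VIEW sales_view AS
    SELECT cust_id AS customer_id, mkt_spend AS spend
    FROM marketing_agg
    """
)

cursor = conn.execute("SELECT * FROM sales_view ORDER BY customer_id")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

Because the view is evaluated against the source table at query time, new rows in the marketing table appear in the view without a separate job to propagate changes.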

