
Pass the Databricks Databricks Certification Databricks-Certified-Professional-Data-Engineer Questions and answers with CertsForce

Viewing page 2 out of 4 pages
Viewing questions 11-20 out of questions
Questions # 11:

The data architect has decided that once data has been ingested from external sources into the Databricks Lakehouse, table access controls will be leveraged to manage permissions for all production tables and views.

The following logic was executed to grant privileges for interactive queries on a production database to the core engineering group.

GRANT USAGE ON DATABASE prod TO eng;

GRANT SELECT ON DATABASE prod TO eng;

Assuming these are the only privileges that have been granted to the eng group and that these users are not workspace administrators, which statement describes their privileges?

Options:

A.

Group members have full permissions on the prod database and can also assign permissions to other users or groups.


B.

Group members are able to list all tables in the prod database but are not able to see the results of any queries on those tables.


C.

Group members are able to query and modify all tables and views in the prod database, but cannot create new tables or views.


D.

Group members are able to query all tables and views in the prod database, but cannot create or edit anything in the database.


E.

Group members are able to create, query, and modify all tables and views in the prod database, but cannot define custom functions.
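One way to reason about these grants: in legacy table access control, USAGE on a database is a prerequisite for acting on any object in it, SELECT grants read access, MODIFY governs adding, deleting, or changing data, CREATE governs creating objects, and privileges granted on a database apply to the tables and views inside it. The sketch below is a toy Python model of that rule, not Databricks' enforcement logic; the `REQUIRED` mapping and the `is_allowed` function are hypothetical.

```python
# Toy model (hypothetical): legacy table ACL checks for objects in one database.
# Assumption: every action needs USAGE on the database plus one action-specific
# privilege, and privileges granted on the database apply to all its objects.
REQUIRED = {
    "query": "SELECT",    # read rows from a table or view
    "modify": "MODIFY",   # add, delete, or change data
    "create": "CREATE",   # create new tables or views
}

def is_allowed(granted: set[str], action: str) -> bool:
    """Return True if a group holding `granted` on the database may perform `action`."""
    return "USAGE" in granted and REQUIRED[action] in granted

# Mirrors: GRANT USAGE ON DATABASE prod TO eng; GRANT SELECT ON DATABASE prod TO eng;
eng_grants = {"USAGE", "SELECT"}
for action in REQUIRED:
    print(f"{action}: {is_allowed(eng_grants, action)}")
```

Under this model the eng group can query but cannot modify or create, and dropping USAGE would block even queries.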


Questions # 12:

Which statement describes the default execution mode for Databricks Auto Loader?

Options:

A.

New files are identified by listing the input directory; new files are incrementally and idempotently loaded into the target Delta Lake table.


B.

Cloud vendor-specific queue storage and notification services are configured to track newly arriving files; new files are incrementally and idempotently loaded into the target Delta Lake table.


C.

Webhooks trigger a Databricks job to run any time new data arrives in a source directory; new data are automatically merged into target tables using rules inferred from the data.


D.

New files are identified by listing the input directory; the target table is materialized by directly querying all valid files in the source directory.
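Directory-listing discovery (as opposed to the opt-in file-notification mode, which Auto Loader enables through the `cloudFiles.useNotifications` option, false by default) can be illustrated outside Databricks: list the input directory on each run, skip files already recorded as processed (Auto Loader tracks this in its checkpoint), and load only the new ones, so reruns are idempotent. This standard-library sketch assumes local files; `ingest_new_files` and the in-memory `processed` set and `target` list are hypothetical stand-ins for the checkpoint and the Delta table.

```python
import os
import tempfile

def ingest_new_files(input_dir: str, processed: set[str], target: list[str]) -> int:
    """Load files not seen before (hypothetical helper); return how many were loaded."""
    loaded = 0
    for name in sorted(os.listdir(input_dir)):    # discovery by directory listing
        if name in processed:                     # already ingested: skip (idempotent)
            continue
        with open(os.path.join(input_dir, name)) as f:
            target.extend(f.read().splitlines())  # append new records to the "table"
        processed.add(name)                       # record progress, like a checkpoint
        loaded += 1
    return loaded

with tempfile.TemporaryDirectory() as d:
    processed, target = set(), []
    for i in range(2):
        with open(os.path.join(d, f"part-{i}.csv"), "w") as f:
            f.write(f"row{i}\n")
    first = ingest_new_files(d, processed, target)
    second = ingest_new_files(d, processed, target)  # nothing new has arrived
    with open(os.path.join(d, "part-2.csv"), "w") as f:
        f.write("row2\n")
    third = ingest_new_files(d, processed, target)
    print(first, second, third, target)  # 2 0 1 ['row0', 'row1', 'row2']
```

The second run loads nothing because both files are already recorded, which is the idempotence the options above refer to.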


Questions # 13:

A data engineer has created a new cluster using shared access mode with default configurations. The data engineer needs to allow the development team access to view the driver logs if needed.

What are the minimal cluster permissions that allow the development team to accomplish this?

Options:

A.

CAN ATTACH TO


B.

CAN MANAGE


C.

CAN VIEW


D.

CAN RESTART


Questions # 14:

The data engineering team maintains the following code:


Assuming that this code produces logically correct results and the data in the source tables has been de-duplicated and validated, which statement describes what will occur when this code is executed?

Options:

A.

A batch job will update the enriched_itemized_orders_by_account table, replacing only those rows that have different values than the current version of the table, using accountID as the primary key.


B.

The enriched_itemized_orders_by_account table will be overwritten using the current valid version of data in each of the three tables referenced in the join logic.


C.

An incremental job will leverage information in the state store to identify unjoined rows in the source tables and write these rows to the enriched_itemized_orders_by_account table.


D.

An incremental job will detect if new rows have been written to any of the source tables; if new rows are detected, all results will be recalculated and used to overwrite the enriched_itemized_orders_by_account table.


E.

No computation will occur until enriched_itemized_orders_by_account is queried; upon query materialization, results will be calculated using the current valid version of data in each of the three tables referenced in the join logic.


Questions # 15:

A platform engineer is creating catalogs and schemas for the development team to use.

The engineer has created an initial catalog, catalog_A, and initial schema, schema_A. The engineer has also granted USE CATALOG, USE SCHEMA, and CREATE TABLE to the development team so that the team can begin populating the schema with new tables.

Despite being the owner of the catalog and schema, the engineer noticed that they do not have access to the underlying tables in schema_A.

What explains the engineer's lack of access to the underlying tables?

Options:

A.

The platform engineer needs to execute a REFRESH statement as the table permissions did not automatically update for owners.


B.

Users granted USE CATALOG can modify the owner's permissions on downstream tables.


C.

The owner of the schema does not automatically have permission to tables within the schema, but can grant them to themselves at any point.


D.

Permissions explicitly given by the table creator are the only way the platform engineer could access the underlying tables in their schema.


Questions # 16:

Which Python variable contains a list of directories to be searched when trying to locate required modules?

Options:

A.

importlib.resource_path


B.

sys.path


C.

os.path


D.

pypi.path


E.

pylib.source
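`sys.path` is an ordinary list of strings, and the import system searches its entries in order, so appending a directory makes the modules in it importable. The sketch below writes a throwaway module into a temporary directory to show this; the module name `helper_mod` is hypothetical.

```python
import importlib
import os
import sys
import tempfile

print(type(sys.path).__name__)  # list

with tempfile.TemporaryDirectory() as d:
    # Write a tiny module into a directory that is not yet on sys.path.
    with open(os.path.join(d, "helper_mod.py"), "w") as f:
        f.write("GREETING = 'found via sys.path'\n")

    sys.path.append(d)             # add the directory to the module search path
    importlib.invalidate_caches()  # make sure the import system sees the new file
    helper_mod = importlib.import_module("helper_mod")
    print(helper_mod.GREETING)     # found via sys.path
    sys.path.remove(d)             # restore the original search path
```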


Questions # 17:

The data engineering team has been tasked with configuring connections to an external database that does not have a supported native connector with Databricks. The external database already has data security configured by group membership. These groups map directly to user groups already created in Databricks that represent various teams within the company.

A new login credential has been created for each group in the external database. The Databricks Utilities Secrets module will be used to make these credentials available to Databricks users.

Assuming that all the credentials are configured correctly on the external database and group membership is properly configured on Databricks, which statement describes how teams can be granted the minimum necessary access to use these credentials?

Options:

A.

“Read” permissions should be set on a secret key mapped to those credentials that will be used by a given team.


B.

No additional configuration is necessary as long as all users are configured as administrators in the workspace where secrets have been added.


C.

“Read” permissions should be set on a secret scope containing only those credentials that will be used by a given team.


D.

“Manage” permission should be set on a secret scope containing only those credentials that will be used by a given team.
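Databricks secret ACLs are applied at the secret-scope level, with READ, WRITE, and MANAGE permission levels (each including the abilities of the lower ones), and a notebook retrieves a value with `dbutils.secrets.get(scope=..., key=...)`. The sketch below is a plain-Python model of a least-privilege layout, not a Databricks API: one scope per team, each holding only that team's credential, with READ granted to the matching group. The scope, key, and group names and the `can_read` helper are hypothetical.

```python
# Hypothetical layout: one secret scope per team, containing only that team's credentials.
SCOPES = {
    "finance-db": {"jdbc-password": "<redacted>"},
    "marketing-db": {"jdbc-password": "<redacted>"},
}
# Scope-level ACLs: (scope, group) -> permission level.
ACLS = {
    ("finance-db", "finance"): "READ",
    ("marketing-db", "marketing"): "READ",
}
LEVELS = {"READ": 1, "WRITE": 2, "MANAGE": 3}  # higher levels include lower ones

def can_read(group: str, scope: str) -> bool:
    """True if the group holds at least READ on the scope (toy check)."""
    level = ACLS.get((scope, group))
    return level is not None and LEVELS[level] >= LEVELS["READ"]

print(can_read("finance", "finance-db"))    # True
print(can_read("finance", "marketing-db"))  # False
```

Because the ACL attaches to the whole scope, grouping each team's credentials into its own scope is what keeps READ from exposing another team's secrets.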


Questions # 18:

A DLT pipeline includes the following streaming tables:

raw_iot ingests raw device measurement data from a heart rate tracking device.

bpm_stats incrementally computes user statistics based on BPM measurements from raw_iot.

How can the data engineer configure this pipeline to be able to retain manually deleted or updated records in the raw_iot table while recomputing the downstream table when a pipeline update is run?

Options:

A.

Set the skipChangeCommits flag to true on bpm_stats


B.

Set the skipChangeCommits flag to true on raw_iot


C.

Set the pipelines.reset.allowed property to false on bpm_stats


D.

Set the pipelines.reset.allowed property to false on raw_iot
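For reference, `pipelines.reset.allowed` is a table property that controls whether a full refresh may reset a table (it defaults to `true`); in the DLT Python API it can be set with `@dlt.table(table_properties={"pipelines.reset.allowed": "false"})`. The toy model below is plain Python, not a DLT API, and shows which tables a pipeline-wide full refresh would clear under that rule; the function name `tables_reset_by_full_refresh` is hypothetical.

```python
# Toy model (hypothetical): which tables a pipeline-wide full refresh would reset.
# Assumption: a full refresh clears and recomputes every table unless its
# "pipelines.reset.allowed" property is "false"; the property defaults to "true".
def tables_reset_by_full_refresh(tables: dict[str, dict]) -> list[str]:
    """Return the names of tables that a full refresh would clear and recompute."""
    return [
        name
        for name, props in tables.items()
        if props.get("pipelines.reset.allowed", "true") != "false"
    ]

pipeline = {
    "raw_iot": {"pipelines.reset.allowed": "false"},  # keep manual deletes/updates
    "bpm_stats": {},                                   # default: may be recomputed
}
print(tables_reset_by_full_refresh(pipeline))  # ['bpm_stats']
```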


Questions # 19:

The data engineering team is migrating an enterprise system with thousands of tables and views into the Lakehouse. They plan to implement the target architecture using a series of bronze, silver, and gold tables. Bronze tables will almost exclusively be used by production data engineering workloads, while silver tables will be used to support both data engineering and machine learning workloads. Gold tables will largely serve business intelligence and reporting purposes. While personal identifying information (PII) exists in all tiers of data, pseudonymization and anonymization rules are in place for all data at the silver and gold levels.

The organization is interested in reducing security concerns while maximizing the ability to collaborate across diverse teams.

Which statement exemplifies best practices for implementing this system?

Options:

A.

Isolating tables in separate databases based on data quality tiers allows for easy permissions management through database ACLs and allows physical separation of default storage locations for managed tables.


B.

Because databases on Databricks are merely a logical construct, choices around database organization do not impact security or discoverability in the Lakehouse.


C.

Storing all production tables in a single database provides a unified view of all data assets available throughout the Lakehouse, simplifying discoverability by granting all users view privileges on this database.


D.

Working in the default Databricks database provides the greatest security when working with managed tables, as these will be created in the DBFS root.


E.

Because all tables must live in the same storage containers used for the database they're created in, organizations should be prepared to create between dozens and thousands of databases depending on their data isolation requirements.
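Separating tiers into their own databases lets permissions be expressed as a small number of database-level grants. The sketch below generates such statements from a group-to-tier mapping; the group names, the read-only `USAGE, SELECT` grant, and the `grant_statements` helper are hypothetical, and a real deployment would also need write privileges (for example MODIFY) for the engineering groups that maintain each tier.

```python
# Hypothetical mapping of groups to the tier databases they may read.
ACCESS = {
    "data_engineering": ["bronze", "silver", "gold"],
    "machine_learning": ["silver"],
    "bi_analysts": ["gold"],
}

def grant_statements(access: dict[str, list[str]]) -> list[str]:
    """Build one read-only database-level GRANT per (group, database) pair."""
    stmts = []
    for group, databases in access.items():
        for db in databases:
            stmts.append(f"GRANT USAGE, SELECT ON DATABASE {db} TO `{group}`;")
    return stmts

for stmt in grant_statements(ACCESS):
    print(stmt)
```

Note that no grant ever touches bronze for the machine learning or BI groups, which is how per-tier databases keep the least-protected PII out of their reach.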


Questions # 20:

The DevOps team has configured a production workload as a collection of notebooks scheduled to run daily using the Jobs UI. A new data engineering hire is onboarding to the team and has requested access to one of these notebooks to review the production logic.

What are the maximum notebook permissions that can be granted to the user without allowing accidental changes to production code or data?

Options:

A.

Can Manage


B.

Can Edit


C.

No permissions


D.

Can Read


E.

Can Run

