Pass the Google Cloud Certified Professional Data Engineer questions and answers with CertsForce

Viewing page 3 of 7
Viewing questions 21-30
Question #21:

You are developing an Apache Beam pipeline to extract data from a Cloud SQL instance by using JdbcIO. You have two projects running in Google Cloud. The pipeline will be deployed and executed on Dataflow in Project A. The Cloud SQL instance is running in Project B and does not have a public IP address. After deploying the pipeline, you noticed that the pipeline failed to extract data from the Cloud SQL instance due to a connection failure. You verified that VPC Service Controls and Shared VPC are not in use in these projects. You want to resolve this error while ensuring that the data does not go through the public internet. What should you do?

Options:

A.

Set up VPC Network Peering between Project A and Project B. Add a firewall rule to allow the peered subnet range to access all instances on the network.


B.

Turn off external IP addresses on the Dataflow workers. Enable Cloud NAT in Project A.


C.

Set up VPC Network Peering between Project A and Project B. Create a Compute Engine instance without an external IP address in Project B on the peered subnet to serve as a proxy server to the Cloud SQL database.


D.

Add the external IP addresses of the Dataflow workers as authorized networks in the Cloud SQL instance.

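For reference, below is a minimal sketch of a JdbcIO read in the Beam Java SDK, matching the kind of pipeline described in the question above. The driver class, IP address, database, credentials, and query are placeholder assumptions; in this scenario the JDBC URL would point at the Cloud SQL instance's private IP, reachable only over whatever private connectivity exists between the two projects.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class CloudSqlExtract {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Read rows from Cloud SQL over its private IP (placeholder address, database, and credentials).
    PCollection<String> names = pipeline.apply(
        JdbcIO.<String>read()
            .withDataSourceConfiguration(
                JdbcIO.DataSourceConfiguration.create(
                        "com.mysql.cj.jdbc.Driver",
                        "jdbc:mysql://10.20.0.5:3306/sales")
                    .withUsername("beam_reader")
                    .withPassword("change-me"))
            .withQuery("SELECT customer_name FROM orders")
            .withCoder(StringUtf8Coder.of())
            .withRowMapper(resultSet -> resultSet.getString(1)));
    // "names" would then be written to a sink such as BigQuery or Cloud Storage.

    pipeline.run().waitUntilFinish();
  }
}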

Question #22:

You have a data pipeline with a Dataflow job that aggregates and writes time series metrics to Bigtable. You notice that data is slow to update in Bigtable. This data feeds a dashboard used by thousands of users across the organization. You need to support additional concurrent users and reduce the amount of time required to write the data. What should you do?

Choose 2 answers

Options:

A.

Configure your Dataflow pipeline to use local execution.


B.

Modify your Dataflow pipeline to use the Flatten transform before writing to Bigtable.


C.

Modify your Dataflow pipeline to use the CoGroupByKey transform before writing to Bigtable.


D.

Increase the maximum number of Dataflow workers by setting maxNumWorkers in PipelineOptions.


E.

Increase the number of nodes in the Bigtable cluster.

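For the write-throughput scenario above, the Dataflow-side knob mentioned in the options can be set in code. The sketch below caps autoscaling with maxNumWorkers on DataflowPipelineOptions (Java SDK); the project, region, and worker count are placeholder assumptions. The Bigtable-side knob, the cluster node count, is changed on the Bigtable instance itself rather than in the pipeline.

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class MetricsToBigtable {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
    options.setProject("my-project");   // placeholder project
    options.setRegion("us-central1");   // placeholder region
    options.setMaxNumWorkers(50);       // allow autoscaling to grow the job to up to 50 workers

    Pipeline pipeline = Pipeline.create(options);
    // ... aggregation transforms and the Bigtable write would be attached here ...
    pipeline.run();
  }
}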

Question #23:

Your startup has a web application that currently serves customers out of a single region in Asia. You are targeting funding that will allow your startup to serve customers globally. Your current goal is to optimize for cost, and your post-funding goal is to optimize for global presence and performance. You must use a native JDBC driver. What should you do?

Options:

A.

Use Cloud Spanner to configure a single-region instance initially, and then configure multi-region Cloud Spanner instances after securing funding.


B.

Use a Cloud SQL for PostgreSQL highly available instance first, and Bigtable with US, Europe, and Asia replication after securing funding.


C.

Use a Cloud SQL for PostgreSQL zonal instance first, and Bigtable with US, Europe, and Asia replication after securing funding.


D.

Use a Cloud SQL for PostgreSQL zonal instance first, and Cloud SQL for PostgreSQL with a highly available configuration after securing funding.

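Because the question above turns on native JDBC driver support, the sketch below shows how a connection is opened with the open-source Cloud Spanner JDBC driver (the google-cloud-spanner-jdbc artifact); Cloud SQL for PostgreSQL is likewise reachable through the standard PostgreSQL JDBC driver. The project, instance, and database names are placeholders, and authentication falls back to application default credentials.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SpannerJdbcConnect {
  public static void main(String[] args) throws Exception {
    // Connection URL format used by the Cloud Spanner JDBC driver (placeholder names).
    String url = "jdbc:cloudspanner:/projects/my-project/instances/my-instance/databases/my-db";

    try (Connection connection = DriverManager.getConnection(url);
         Statement statement = connection.createStatement();
         ResultSet resultSet = statement.executeQuery("SELECT 1")) {
      while (resultSet.next()) {
        System.out.println(resultSet.getLong(1));
      }
    }
  }
}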

Question #24:

You are migrating your on-premises data warehouse to BigQuery. As part of the migration, you want to facilitate cross-team collaboration to get the most value out of the organization's data. You need to design an architecture that would allow teams within the organization to securely publish, discover, and subscribe to read-only data in a self-service manner. You need to minimize costs while also maximizing data freshness. What should you do?

Options:

A.

Create authorized datasets to publish shared data in the subscribing team's project.


B.

Create a new dataset for sharing in each individual team's project. Grant the subscribing team the bigquery.dataViewer role on the dataset.


C.

Use BigQuery Data Transfer Service to copy datasets to a centralized BigQuery project for sharing.


D.

Use Analytics Hub to facilitate data sharing.

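As background to the sharing mechanics referenced in the options above, dataset-level read access in BigQuery can be granted programmatically. The sketch below (Java client, placeholder dataset and group names) appends a READER entry to a dataset's access list, which corresponds to the read-only, bigquery.dataViewer-style grant option B describes; Analytics Hub layers self-service publishing, discovery, and subscription on top of read-only sharing like this.

import com.google.cloud.bigquery.Acl;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;
import java.util.ArrayList;
import java.util.List;

public class ShareDatasetReadOnly {
  public static void main(String[] args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Look up the dataset to be shared (placeholder name).
    Dataset dataset = bigquery.getDataset("analytics_shared");

    // Append a read-only grant for the subscribing team's group (placeholder address).
    List<Acl> acl = new ArrayList<>(dataset.getAcl());
    acl.add(Acl.of(new Acl.Group("subscribing-team@example.com"), Acl.Role.READER));

    dataset.toBuilder().setAcl(acl).build().update();
  }
}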

Question #25:

Government regulations in your industry mandate that you maintain an auditable record of access to certain types of data. Assuming that all expiring logs will be archived correctly, where should you store data that is subject to that mandate?

Options:

A.

Encrypted on Cloud Storage with user-supplied encryption keys. A separate decryption key will be given to each authorized user.


B.

In a BigQuery dataset that is viewable only by authorized personnel, with the Data Access log used to provide the auditability.


C.

In Cloud SQL, with separate database user names for each user. The Cloud SQL Admin activity logs will be used to provide the auditability.


D.

In a bucket on Cloud Storage that is accessible only by an App Engine service that collects user information and logs the access before providing a link to the bucket.


Question #26:

Suppose you have a table that includes a nested column called "city" inside a column called "person", but when you try to submit the following query in BigQuery, it gives you an error.

SELECT person FROM `project1.example.table1` WHERE city = "London"

How would you correct the error?

Options:

A.

Add ", UNNEST(person)" before the WHERE clause.


B.

Change "person" to "person.city".


C.

Change "person" to "city.person".


D.

Add ", UNNEST(city)" before the WHERE clause.

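For reference once you have settled on an answer: the candidate fixes above correspond to the two ways nested data is addressed in BigQuery standard SQL. A field inside a STRUCT column is referenced with dot notation (person.city), while a REPEATED record must be flattened with UNNEST before it can be filtered. The sketch below shows both query shapes and runs one of them through the BigQuery Java client; the table and field names come from the question.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class NestedColumnQuery {
  public static void main(String[] args) throws Exception {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // If "person" is a STRUCT column, reference the nested field with dot notation.
    String structQuery =
        "SELECT person FROM `project1.example.table1` WHERE person.city = 'London'";

    // If "person" is a REPEATED record, flatten it with UNNEST before filtering.
    String repeatedQuery =
        "SELECT p FROM `project1.example.table1`, UNNEST(person) AS p WHERE p.city = 'London'";

    TableResult result = bigquery.query(QueryJobConfiguration.newBuilder(structQuery).build());
    result.iterateAll().forEach(System.out::println);
  }
}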

Question #27:

Which methods can be used to reduce the number of rows processed by BigQuery?

Options:

A.

Splitting tables into multiple tables; putting data in partitions


B.

Splitting tables into multiple tables; putting data in partitions; using the LIMIT clause


C.

Putting data in partitions; using the LIMIT clause


D.

Splitting tables into multiple tables; using the LIMIT clause

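As a concrete illustration of the partitioning option above, the sketch below creates a day-partitioned table with the BigQuery Java client (placeholder dataset, table, and column names). Queries that filter on the partitioning column then scan only the matching partitions, which is what reduces the rows processed; a LIMIT clause limits the rows returned, not the rows scanned.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;
import com.google.cloud.bigquery.TimePartitioning;

public class CreatePartitionedTable {
  public static void main(String[] args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    Schema schema = Schema.of(
        Field.of("event_ts", StandardSQLTypeName.TIMESTAMP),
        Field.of("payload", StandardSQLTypeName.STRING));

    // Partition the table by day on the event_ts column.
    TimePartitioning partitioning = TimePartitioning.newBuilder(TimePartitioning.Type.DAY)
        .setField("event_ts")
        .build();

    StandardTableDefinition definition = StandardTableDefinition.newBuilder()
        .setSchema(schema)
        .setTimePartitioning(partitioning)
        .build();

    bigquery.create(TableInfo.of(TableId.of("analytics", "events"), definition));
  }
}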

Question #28:

To give a user read permission for only the first three columns of a table, which access control method would you use?

Options:

A.

Primitive role


B.

Predefined role


C.

Authorized view


D.

It's not possible to give access to only the first three columns of a table.

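Column-level read access of the kind described above is commonly set up by exposing only the needed columns through a view in a separate dataset and then authorizing that view against the source dataset. The sketch below uses the BigQuery Java client; the project, dataset, table, and column names are placeholders.

import com.google.cloud.bigquery.Acl;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;
import com.google.cloud.bigquery.ViewDefinition;
import java.util.ArrayList;
import java.util.List;

public class AuthorizedViewSetup {
  public static void main(String[] args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // 1. Create a view in a separate "shared_views" dataset that exposes only three columns.
    TableId viewId = TableId.of("my-project", "shared_views", "orders_limited");
    String viewQuery =
        "SELECT order_id, order_date, customer_id FROM `my-project.warehouse.orders`";
    bigquery.create(TableInfo.of(viewId, ViewDefinition.of(viewQuery)));

    // 2. Authorize the view to read the source dataset on behalf of its users.
    Dataset source = bigquery.getDataset("warehouse");
    List<Acl> acl = new ArrayList<>(source.getAcl());
    acl.add(Acl.of(new Acl.View(viewId)));
    source.toBuilder().setAcl(acl).build().update();

    // 3. Users are then granted read access on shared_views only, not on the source table.
  }
}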

Question #29:

The Dataflow SDKs have been recently transitioned into which Apache service?

Options:

A.

Apache Spark


B.

Apache Hadoop


C.

Apache Kafka


D.

Apache Beam


Question #30:

Which of the following is NOT true about Dataflow pipelines?

Options:

A.

Dataflow pipelines are tied to Dataflow, and cannot be run on any other runner


B.

Dataflow pipelines can consume data from other Google Cloud services


C.

Dataflow pipelines can be programmed in Java


D.

Dataflow pipelines use a unified programming model, so can work both with streaming and batch data sources

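As background for the runner-portability point raised in the options above: a Beam pipeline is written once and the runner is selected at launch time through PipelineOptions, for example --runner=DirectRunner for local execution or --runner=DataflowRunner (plus the usual project, region, and staging flags) for Dataflow. Below is a minimal sketch with a placeholder transform.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class PortablePipeline {
  public static void main(String[] args) {
    // The runner is chosen from the command line, e.g. --runner=DirectRunner or --runner=DataflowRunner.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    Pipeline pipeline = Pipeline.create(options);

    pipeline
        .apply(Create.of("batch", "and", "streaming"))
        .apply(MapElements.into(TypeDescriptors.strings())
            .via((String word) -> word.toUpperCase()));

    pipeline.run().waitUntilFinish();
  }
}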
