
Pass the Google Cloud Certified Professional-Data-Engineer exam with CertsForce questions and answers

Viewing page 1 of 8 (questions 1-10)
Question # 1:

You want to use a BigQuery table as a data sink. In which writing mode(s) can you use BigQuery as a sink?

Options:

A.

Both batch and streaming


B.

BigQuery cannot be used as a sink


C.

Only batch


D.

Only streaming


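Illustrative only (assuming the Dataflow/Beam context the question implies): a minimal Apache Beam Python sketch showing that the same WriteToBigQuery sink is used whether the pipeline runs in batch or streaming mode. Bucket, project, dataset, and schema names are placeholders, not from the question.

```python
# Minimal Beam sketch: one WriteToBigQuery sink serves both batch and streaming runs.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run():
    # Flip streaming=True (and use an unbounded source such as Pub/Sub)
    # to run the same sink in streaming mode.
    options = PipelineOptions(streaming=False)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.csv")
            | "Parse" >> beam.Map(lambda line: {"payload": line})
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "my-project:my_dataset.my_table",
                schema="payload:STRING",
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )

if __name__ == "__main__":
    run()
```
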
Question # 2:

Which of the following is NOT a valid use case to select HDD (hard disk drives) as the storage for Google Cloud Bigtable?

Options:

A.

You expect to store at least 10 TB of data.


B.

You will mostly run batch workloads with scans and writes, rather than frequently executing random reads of a small number of rows.


C.

You need to integrate with Google BigQuery.


D.

You will not use the data to back a user-facing or latency-sensitive application.


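Illustrative only: a rough sketch with the google-cloud-bigtable admin client showing where the HDD storage choice is made, at cluster creation, for large, scan-heavy batch workloads. Instance, cluster, and zone IDs are placeholders.

```python
# Rough sketch: HDD vs. SSD is chosen per cluster when the Bigtable instance is created.
from google.cloud import bigtable
from google.cloud.bigtable import enums

client = bigtable.Client(project="my-project", admin=True)

instance = client.instance(
    "batch-scans",
    display_name="Batch scan workloads",
    instance_type=enums.Instance.Type.PRODUCTION,
)
cluster = instance.cluster(
    "batch-scans-c1",
    location_id="us-central1-b",
    serve_nodes=3,
    default_storage_type=enums.StorageType.HDD,  # HDD: cheaper, suited to scans and writes
)
operation = instance.create(clusters=[cluster])
operation.result(timeout=300)  # wait for the long-running create to complete
```
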
Question # 3:

You are developing a software application using Google's Dataflow SDK, and you want to use conditionals, for loops, and other complex programming structures to create a branching pipeline. Which component will be used for the data processing operation?

Options:

A.

PCollection


B.

Transform


C.

Pipeline


D.

Sink API


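As a small illustration of where processing logic lives in the Dataflow/Beam model: a Transform (here a ParDo with tagged outputs) carries the per-element code and can branch the pipeline. Element sizes and tag names are invented for the example.

```python
# Sketch: a ParDo transform with tagged outputs branches one input into two paths.
import apache_beam as beam

class SplitBySize(beam.DoFn):
    def process(self, element):
        # Conditional logic inside the transform decides which branch gets the element.
        if len(element) > 100:
            yield beam.pvalue.TaggedOutput("large", element)
        else:
            yield element  # main output ("small")

with beam.Pipeline() as p:
    branches = (
        p
        | "Create" >> beam.Create(["x" * 150, "short line"])
        | "Split" >> beam.ParDo(SplitBySize()).with_outputs("large", main="small")
    )
    branches.small | "PrintSmall" >> beam.Map(lambda e: print("small:", e))
    branches.large | "PrintLarge" >> beam.Map(lambda e: print("large:", e))
```
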
Question # 4:

Cloud Bigtable is Google's ______ Big Data database service.

Options:

A.

Relational


B.

mySQL


C.

NoSQL


D.

SQL Server


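For a concrete sense of what the NoSQL (wide-column) model looks like, a minimal sketch with the Python Bigtable client: rows are addressed by row key and hold cells in column families, with no SQL schema. Instance, table, and column-family names are placeholders, and the column family is assumed to already exist.

```python
# Minimal sketch of Bigtable's key/value, wide-column access pattern.
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("user_events")

row = table.direct_row(b"user#123")
row.set_cell("stats", "clicks", "42")  # family "stats" is assumed to exist
row.commit()

fetched = table.read_row(b"user#123")
print(fetched.cells["stats"][b"clicks"][0].value.decode())
```
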
Question # 5:

Your company has data assets across multiple Cloud Storage buckets and BigQuery datasets containing raw and processed data. The requirement is to establish a unified data governance framework that allows for centralized metadata discovery, data quality monitoring, and consistent security policy application across these various data stores without physically moving or duplicating the data. You need to implement a solution to achieve this federated governance. What should you do?

Options:

A.

Deploy a centralized Cloud SQL database to store metadata extracted from BigQuery and Cloud Storage using custom scripts.

Integrate the database with Looker Studio for data discovery and visualization.

Implement a custom policy engine using Cloud Run functions triggered by changes in IAM policies to enforce consistent security across projects.


B.

Create a Looker Studio dashboard on BigQuery INFORMATION_SCHEMA views to visualize and monitor data quality.

Manage security using IAM policies at the project level, supplemented by BigQuery authorized views for granular access control.


C.

Export metadata out of Dataplex Universal Catalog by running a metadata export job.

Implement Dataproc Metastore to manage table schemas and Apache Hive metastore for metadata discovery.

Manage security using a combination of BigQuery row-level security and Cloud Storage policies.


D.

Use Dataplex to organize the BigQuery datasets and Cloud Storage buckets into lakes and zones.

Use Dataplex for automated metadata discovery, centralized security policy management, data profiling, and data quality tasks.


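Purely as a sketch of the Dataplex approach described in option D, assuming the generated google-cloud-dataplex Python client (DataplexServiceClient with create_lake, create_zone, and create_asset); verify message and enum names against the installed library. Project, region, and resource IDs are placeholders.

```python
# Assumed sketch: organize existing storage into a Dataplex lake/zone and attach
# a Cloud Storage bucket as an asset so it is discovered and governed in place.
from google.cloud import dataplex_v1

client = dataplex_v1.DataplexServiceClient()
parent = "projects/my-project/locations/us-central1"

lake = client.create_lake(
    parent=parent,
    lake_id="analytics-lake",
    lake=dataplex_v1.Lake(display_name="Analytics lake"),
).result()

zone = client.create_zone(
    parent=lake.name,
    zone_id="raw-zone",
    zone=dataplex_v1.Zone(
        type_=dataplex_v1.Zone.Type.RAW,
        resource_spec=dataplex_v1.Zone.ResourceSpec(
            location_type=dataplex_v1.Zone.ResourceSpec.LocationType.SINGLE_REGION
        ),
    ),
).result()

client.create_asset(
    parent=zone.name,
    asset_id="raw-bucket",
    asset=dataplex_v1.Asset(
        resource_spec=dataplex_v1.Asset.ResourceSpec(
            name="projects/my-project/buckets/my-raw-bucket",
            type_=dataplex_v1.Asset.ResourceSpec.Type.STORAGE_BUCKET,
        )
    ),
).result()
```
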
Question # 6:

You work for a shipping company that uses handheld scanners to read shipping labels. Your company has strict data privacy standards, but the scanners currently transmit recipients’ personally identifiable information (PII) to analytics systems, which violates user privacy rules. You want to quickly build a scalable solution using cloud-native managed services to prevent exposure of PII to the analytics systems. What should you do?

Options:

A.

Create an authorized view in BigQuery to restrict access to tables with sensitive data.


B.

Install a third-party data validation tool on Compute Engine virtual machines to check the incoming data for sensitive information.


C.

Use Stackdriver logging to analyze the data passed through the total pipeline to identify transactions that may contain sensitive information.


D.

Build a Cloud Function that reads the topics and makes a call to the Cloud Data Loss Prevention API. Use the tagging and confidence levels to either pass or quarantine the data in a bucket for review.


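A hedged sketch of the shape option D describes: a Pub/Sub-triggered Cloud Function that calls the Cloud DLP API and quarantines records containing PII in a Cloud Storage bucket for review. The bucket name, infoTypes, and topic wiring are illustrative assumptions.

```python
# Sketch of a Pub/Sub-triggered Cloud Function that screens messages with Cloud DLP.
import base64
import json

from google.cloud import dlp_v2, storage

PROJECT = "my-project"
QUARANTINE_BUCKET = "my-quarantine-bucket"  # assumed bucket for flagged records

dlp = dlp_v2.DlpServiceClient()
gcs = storage.Client()

def inspect_scan(event, context):
    """Background Cloud Function entry point (Pub/Sub trigger)."""
    payload = base64.b64decode(event["data"]).decode("utf-8")

    response = dlp.inspect_content(
        request={
            "parent": f"projects/{PROJECT}",
            "inspect_config": {
                "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
                "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
            },
            "item": {"value": payload},
        }
    )

    if response.result.findings:
        # PII detected: quarantine the raw record for manual review.
        gcs.bucket(QUARANTINE_BUCKET).blob(context.event_id).upload_from_string(payload)
    else:
        # No findings: record is safe to pass on to the analytics pipeline (not shown).
        print(json.dumps({"status": "clean", "id": context.event_id}))
```
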
Question # 7:

You want to migrate an Apache Spark 3 batch job from on-premises to Google Cloud. You need to minimally change the job so that the job reads from Cloud Storage and writes the result to BigQuery. Your job is optimized for Spark, where each executor has 8 vCPU and 16 GB memory, and you want to be able to choose similar settings. You want to minimize installation and management effort to run your job. What should you do?

Options:

A.

Execute the job in a new Dataproc cluster.


B.

Execute as a Dataproc Serverless job.


C.

Execute the job as part of a deployment in a new Google Kubernetes Engine cluster.


D.

Execute the job from a new Compute Engine VM.


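An illustrative sketch of submitting the job as a Dataproc Serverless batch with the Python client, carrying over the executor shape from the question (8 vCPU, 16 GB per executor) through Spark properties. Project, region, bucket, and batch IDs are placeholders; Dataproc Serverless runtimes bundle the BigQuery connector, so no extra installation is needed.

```python
# Sketch: submit the existing PySpark job as a Dataproc Serverless batch,
# preserving the 8 vCPU / 16 GB executor shape via Spark properties.
from google.cloud import dataproc_v1

region = "us-central1"
client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

batch = dataproc_v1.Batch(
    pyspark_batch=dataproc_v1.PySparkBatch(
        main_python_file_uri="gs://my-bucket/jobs/spark_etl.py",
    ),
    runtime_config=dataproc_v1.RuntimeConfig(
        properties={
            "spark.executor.cores": "8",
            "spark.executor.memory": "16g",
        }
    ),
)

operation = client.create_batch(
    parent=f"projects/my-project/locations/{region}",
    batch=batch,
    batch_id="spark-etl-001",
)
print(operation.result().state)  # waits for the batch workload to complete
```
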
Question # 8:

You have a variety of files in Cloud Storage that your data science team wants to use in their models. Currently, users do not have a method to explore, cleanse, and validate the data in Cloud Storage. You are looking for a low-code solution that your data science team can use to quickly cleanse and explore data within Cloud Storage. What should you do?

Options:

A.

Load the data into BigQuery and use SQL to transform the data as necessary. Provide the data science team access to staging tables to explore the raw data.


B.

Provide the data science team access to Dataflow to create a pipeline to prepare and validate the raw data and load data into BigQuery for data exploration.


C.

Provide the data science team access to Dataprep to prepare, validate, and explore the data within Cloud Storage.


D.

Create an external table in BigQuery and use SQL to transform the data as necessary. Provide the data science team access to the external tables to explore the raw data.


Question # 9:

You work for an advertising company, and you’ve developed a Spark ML model to predict click-through rates at advertisement blocks. You’ve been developing everything at your on-premises data center, and now your company is migrating to Google Cloud. Your data warehouse will be migrated to BigQuery. You periodically retrain your Spark ML models, so you need to migrate existing training pipelines to Google Cloud. What should you do?

Options:

A.

Use Cloud ML Engine for training existing Spark ML models


B.

Rewrite your models on TensorFlow, and start using Cloud ML Engine


C.

Use Cloud Dataproc for training existing Spark ML models, but start reading data directly from BigQuery


D.

Spin up a Spark cluster on Compute Engine, and train Spark ML models on the data exported from BigQuery


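A hedged PySpark sketch of option C: keep the existing Spark ML training code, run it on Dataproc, and read the training data directly from BigQuery through the spark-bigquery connector instead of exporting it. Table, column, and model path names are placeholders.

```python
# Sketch: Spark ML training on Dataproc, reading features straight from BigQuery.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("ctr-training").getOrCreate()

# spark-bigquery connector: no export step, the table is read in place.
df = (
    spark.read.format("bigquery")
    .option("table", "my-project.ads.click_events")
    .load()
)

assembled = VectorAssembler(
    inputCols=["ad_position", "user_age", "past_ctr"], outputCol="features"
).transform(df)

model = LogisticRegression(featuresCol="features", labelCol="clicked").fit(assembled)
model.write().overwrite().save("gs://my-bucket/models/ctr")
```
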
Question # 10:

You have terabytes of customer behavioral data streaming from Google Analytics into BigQuery daily. Your customers' information, such as their preferences, is hosted on a Cloud SQL for MySQL database, and your CRM database is hosted on a Cloud SQL for PostgreSQL instance. The marketing team wants to use your customers' information from the two databases and the customer behavioral data to create marketing campaigns for yearly active customers. You need to ensure that the marketing team can run the campaigns over 100 times a day on typical days and up to 300 times during sales. At the same time, you want to keep the load on the Cloud SQL databases to a minimum. What should you do?

Options:

A.

Create BigQuery connections to both Cloud SQL databases. Use BigQuery federated queries on the two databases and the Google Analytics data on BigQuery to run these queries.


B.

Create streams in Datastream to replicate the required tables from both Cloud SQL databases to BigQuery for these queries.


C.

Create a Dataproc cluster with Trino to establish connections to both Cloud SQL databases and BigQuery, to execute the queries.


D.

Create a job on Apache Spark with Dataproc Serverless to query both Cloud SQL databases and the Google Analytics data on BigQuery for these queries.


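A hedged sketch of option A: one BigQuery federated query that pulls preferences from Cloud SQL through an existing BigQuery connection (EXTERNAL_QUERY) and joins them with the behavioral data already in BigQuery, keeping the load on Cloud SQL to a single lightweight read per run. The connection ID, tables, and columns are placeholders.

```python
# Sketch: federated query joining Cloud SQL data (via EXTERNAL_QUERY) with
# Google Analytics behavioral data already stored in BigQuery.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

sql = """
SELECT b.customer_id, p.preferences, COUNT(*) AS events
FROM `my-project.analytics.ga_events` AS b
JOIN EXTERNAL_QUERY(
  'my-project.us.mysql-prefs-connection',
  'SELECT customer_id, preferences FROM customer_preferences'
) AS p
ON b.customer_id = p.customer_id
WHERE b.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR)
GROUP BY b.customer_id, p.preferences
"""

for row in client.query(sql).result():
    print(row.customer_id, row.events)
```
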