
Pass the Google Cloud Certified Professional-Data-Engineer questions and answers with CertsForce

Viewing page 3 of 6 (questions 21-30)
Questions # 21:

You maintain ETL pipelines. You notice that a streaming pipeline running on Dataflow is taking a long time to process incoming data, which causes output delays. You also notice that Dataflow automatically optimized the pipeline graph and fused the steps into one. You want to identify where the potential bottleneck is occurring. What should you do?

Options:

A.

Insert a Reshuffle operation after each processing step, and monitor the execution details in the Dataflow console.


B.

Log debug information in each ParDo function, and analyze the logs at execution time.


C.

Insert output sinks after each key processing step, and observe the writing throughput of each block.


D.

Verify that the Dataflow service accounts have appropriate permissions to write the processed data to the output sinks.


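For reference, a minimal Apache Beam (Python) sketch of the fusion-breaking approach described in option A. The Pub/Sub topic, BigQuery table, and transform bodies are hypothetical placeholders; the point is that a Reshuffle between steps prevents Dataflow from fusing them, so each stage reports its own wall time and throughput in the job's execution details.

```python
# Hedged sketch: Reshuffle between steps breaks Dataflow step fusion so each
# stage shows up separately in the Dataflow console's execution details.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def parse_event(msg: bytes) -> dict:
    return {"raw": msg.decode("utf-8")}      # placeholder parsing (assumption)

def enrich_event(event: dict) -> dict:
    return {**event, "enriched": True}       # placeholder enrichment (assumption)

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "Parse" >> beam.Map(parse_event)
        | "BreakFusion1" >> beam.Reshuffle()
        | "Enrich" >> beam.Map(enrich_event)
        | "BreakFusion2" >> beam.Reshuffle()
        | "Write" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",   # hypothetical existing table
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```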
Questions # 22:

You migrated your on-premises Apache Hadoop Distributed File System (HDFS) data lake to Cloud Storage. The data science team needs to process the data by using Apache Spark and SQL. Security policies need to be enforced at the column level. You need a cost-effective solution that can scale into a data mesh. What should you do?

Options:

A.

1. Load the data to BigQuery tables. 2. Create a taxonomy of policy tags in Data Catalog. 3. Add policy tags to columns. 4. Process with the Spark-BigQuery connector or BigQuery SQL.


B.

1. Deploy a long-lived Dataproc cluster with Apache Hive and Ranger enabled. 2. Configure Ranger for column-level security. 3. Process with Dataproc Spark or Hive SQL.


C.

1. Apply an Identity and Access Management (IAM) policy at the file level in Cloud Storage. 2. Define a BigQuery external table for SQL processing. 3. Use Dataproc Spark to process the Cloud Storage files.


D.

1. Define a BigLake table. 2. Create a taxonomy of policy tags in Data Catalog. 3. Add policy tags to columns. 4. Process with the Spark-BigQuery connector or BigQuery SQL.


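For context on the Spark-BigQuery connector path that options A and D mention, a minimal PySpark sketch; the table and column names are hypothetical, and it assumes the connector is available on the cluster. When a Data Catalog policy-tag taxonomy is attached to columns, BigQuery enforces column-level access server-side, so reads of protected columns fail for callers without fine-grained access.

```python
# Hedged sketch: reading a BigQuery/BigLake table from Spark through the
# Spark-BigQuery connector; column-level security is enforced by BigQuery.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-column-security-demo").getOrCreate()

df = (
    spark.read.format("bigquery")
    .option("table", "my-project.lake.customers")   # hypothetical table
    .load()
)

# Select only columns the caller is permitted to read (assumption).
df.select("customer_id", "region").show()
```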
Questions # 23:

You need to create a near real-time inventory dashboard that reads the main inventory tables in your BigQuery data warehouse. Historical inventory data is stored as inventory balances by item and location. You have several thousand updates to inventory every hour. You want to maximize performance of the dashboard and ensure that the data is accurate. What should you do?

Options:

A.

Leverage BigQuery UPDATE statements to update the inventory balances as they are changing.


B.

Partition the inventory balance table by item to reduce the amount of data scanned with each inventory update.


C.

Use BigQuery streaming inserts to stream changes into a daily inventory movement table. Calculate balances in a view that joins it to the historical inventory balance table. Update the inventory balance table nightly.


D.

Use the BigQuery bulk loader to batch load inventory changes into a daily inventory movement table. Calculate balances in a view that joins it to the historical inventory balance table. Update the inventory balance table nightly.


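To make the "daily movement table plus view" pattern in options C and D concrete, a sketch using the BigQuery Python client; the project, dataset, table, and column names are hypothetical.

```python
# Hedged sketch: a view that adds daily inventory movements to the historical
# balance table so the dashboard reads current balances without frequent UPDATEs.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")        # hypothetical project

view_sql = """
CREATE OR REPLACE VIEW `my-project.inventory.current_balance` AS
SELECT
  item_id,
  location_id,
  SUM(qty) AS balance
FROM (
  SELECT item_id, location_id, balance AS qty
  FROM `my-project.inventory.historical_balance`
  UNION ALL
  SELECT item_id, location_id, quantity_delta AS qty
  FROM `my-project.inventory.daily_movement`
)
GROUP BY item_id, location_id
"""
client.query(view_sql).result()                        # create or replace the view
```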
Questions # 24:

You plan to deploy Cloud SQL using MySQL. You need to ensure high availability in the event of a zone failure. What should you do?

Options:

A.

Create a Cloud SQL instance in one zone, and create a failover replica in another zone within the same region.


B.

Create a Cloud SQL instance in one zone, and create a read replica in another zone within the same region.


C.

Create a Cloud SQL instance in one zone, and configure an external read replica in a zone in a different region.


D.

Create a Cloud SQL instance in a region, and configure automatic backup to a Cloud Storage bucket in the same region.


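For reference, on the current Cloud SQL Admin API the standby-in-another-zone setup that option A describes as a failover replica is requested with the REGIONAL availability type. A sketch with hypothetical project, instance, and machine-tier names (assumes google-api-python-client and Application Default Credentials):

```python
# Hedged sketch: creating a MySQL instance whose availabilityType=REGIONAL
# keeps a standby in a second zone of the same region for automatic failover.
from googleapiclient import discovery

service = discovery.build("sqladmin", "v1beta4")

instance_body = {
    "name": "orders-mysql",                  # hypothetical instance name
    "databaseVersion": "MYSQL_8_0",
    "region": "us-central1",
    "settings": {
        "tier": "db-custom-2-7680",          # hypothetical machine tier
        "availabilityType": "REGIONAL",      # standby in another zone
        "backupConfiguration": {
            "enabled": True,                 # automated backups (commonly paired with HA)
            "binaryLogEnabled": True,
        },
    },
}

operation = service.instances().insert(project="my-project", body=instance_body).execute()
print(operation["name"])                     # long-running operation to poll
```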
Questions # 25:

Your team is building a data lake platform on Google Cloud. As part of the data foundation design, you are planning to store all the raw data in Cloud Storage. You expect to ingest approximately 25 GB of data a day, and your billing department is worried about the increasing cost of storing old data. The current business requirements are:

• The old data can be deleted anytime

• You plan to use the visualization layer for current and historical reporting

• The old data should be available instantly when accessed

• There should not be any charges for data retrieval.

What should you do to optimize for cost?

Options:

A.

Create the bucket with the Autoclass storage class feature.


B.

Create an Object Lifecycle Management policy to modify the storage class for data older than 30 days to Nearline, 90 days to Coldline, and 365 days to Archive storage class. Delete old data as needed.


C.

Create an Object Lifecycle Management policy to modify the storage class for data older than 30 days to Coldline, 90 days to Nearline, and 365 days to Archive storage class. Delete old data as needed.


D.

Create an Object Lifecycle Management policy to modify the storage class for data older than 30 days to Nearline, 45 days to Coldline, and 60 days to Archive storage class. Delete old data as needed.


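For reference, the class transitions that options B, C, and D describe are configured with Object Lifecycle Management rules; a sketch with the Cloud Storage Python client, using option B's thresholds and hypothetical project and bucket names. (Autoclass, option A, is instead enabled as a bucket property and manages storage classes without rules.)

```python
# Hedged sketch: lifecycle rules that downgrade the storage class as objects age.
from google.cloud import storage

client = storage.Client(project="my-project")             # hypothetical project
bucket = client.get_bucket("raw-data-lake")                # hypothetical bucket

bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
bucket.add_lifecycle_set_storage_class_rule("ARCHIVE", age=365)
bucket.patch()                                             # persist the configuration

for rule in bucket.lifecycle_rules:
    print(rule)
```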
Questions # 26:

You are implementing a chatbot to help an online retailer streamline their customer service. The chatbot must be able to respond to both text and voice inquiries. You are looking for a low-code or no-code option, and you want to be able to easily train the chatbot to provide answers to keywords. What should you do?

Options:

A.

Use the Speech-to-Text API to build a Python application in App Engine.


B.

Use the Speech-to-Text API to build a Python application in a Compute Engine instance.


C.

Use Dialogflow for simple queries and the Speech-to-Text API for complex queries.


D.

Use Dialogflow to implement the chatbot, defining the intents based on the most common queries collected.


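Dialogflow agents and their intents are defined in the console without code; for context only, this is roughly how a backend sends a customer utterance to an existing agent (project ID, session ID, and text are hypothetical). Voice inquiries can be routed through Dialogflow's built-in speech handling or transcribed first.

```python
# Hedged sketch: detect the intent of a text inquiry with a Dialogflow ES agent.
from google.cloud import dialogflow

session_client = dialogflow.SessionsClient()
session = session_client.session_path("my-project", "session-123")   # hypothetical

text_input = dialogflow.TextInput(text="Where is my order?", language_code="en-US")
query_input = dialogflow.QueryInput(text=text_input)

response = session_client.detect_intent(
    request={"session": session, "query_input": query_input}
)

print("Matched intent:", response.query_result.intent.display_name)
print("Reply:", response.query_result.fulfillment_text)
```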
Questions # 27:

You want to optimize your queries for cost and performance. How should you structure your data?

Options:

A.

Partition table data by create_date, location_id, and device_version.


B.

Partition table data by create_date; cluster table data by location_id and device_version.


C.

Cluster table data by create_date, location_id, and device_version.


D.

Cluster table data by create_date; partition by location_id and device_version.


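For context on the options' partition-versus-cluster wording: BigQuery partitions a table on a single date/timestamp or integer column and clusters on up to four columns. A sketch of the corresponding DDL run through the Python client; the table names are hypothetical and create_date is assumed to be a TIMESTAMP.

```python
# Hedged sketch: a table partitioned by date and clustered by two more columns.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")    # hypothetical project

ddl = """
CREATE TABLE `my-project.telemetry.device_events`
PARTITION BY DATE(create_date)
CLUSTER BY location_id, device_version AS
SELECT * FROM `my-project.telemetry.device_events_raw`
"""
client.query(ddl).result()                        # run the DDL statement
```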
Questions # 28:

You have a data analyst team member who needs to analyze data by using BigQuery. The data analyst wants to create a data pipeline that would load 200 CSV files with an average size of 15MB from a Cloud Storage bucket into BigQuery daily. The data needs to be ingested and transformed before being accessed in BigQuery for analysis. You need to recommend a fully managed, no-code solution for the data analyst. What should you do?

Options:

A.

Create a Cloud Run function and schedule it to run daily using Cloud Scheduler to load the data into BigQuery.


B.

Use the BigQuery Data Transfer Service to load files from Cloud Storage into BigQuery, create a BigQuery job that transforms the data by using BigQuery SQL, and schedule it to run daily.


C.

Build a custom Apache Beam pipeline, run it on Dataflow to load the files from Cloud Storage into BigQuery, and schedule it to run daily by using Cloud Composer.


D.

Create a pipeline by using BigQuery pipelines and schedule it to load the data into BigQuery daily.


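For context on what a Cloud Storage transfer in the BigQuery Data Transfer Service involves (normally configured in the console with no code), a sketch with the Python client. The project, bucket, dataset, table, and schedule are hypothetical, and the parameter keys for the Cloud Storage data source are stated here as assumptions.

```python
# Hedged sketch: a scheduled Cloud Storage -> BigQuery transfer created with
# the BigQuery Data Transfer Service client.
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="analytics",                      # hypothetical dataset
    display_name="daily-csv-load",
    data_source_id="google_cloud_storage",
    params={
        "data_path_template": "gs://my-bucket/exports/*.csv",  # hypothetical path
        "destination_table_name_template": "raw_events",
        "file_format": "CSV",
        "skip_leading_rows": "1",
    },
    schedule="every 24 hours",
)

created = client.create_transfer_config(
    parent="projects/my-project",                            # hypothetical project
    transfer_config=transfer_config,
)
print("Created transfer:", created.name)
```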
Questions # 29:

Each analytics team in your organization is running BigQuery jobs in their own projects. You want to enable each team to monitor slot usage within their projects. What should you do?

Options:

A.

Create a Stackdriver Monitoring dashboard based on the BigQuery metric query/scanned_bytes


B.

Create a Stackdriver Monitoring dashboard based on the BigQuery metric slots/allocated_for_project


C.

Create a log export for each project, capture the BigQuery job execution logs, create a custom metric based on the totalSlotMs, and create a Stackdriver Monitoring dashboard based on the custom metric


D.

Create an aggregated log export at the organization level, capture the BigQuery job execution logs, create a custom metric based on the totalSlotMs, and create a Stackdriver Monitoring dashboard based on the custom metric


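To make the metric in option B concrete, a sketch that reads per-project slot allocation with the Cloud Monitoring Python client. The project ID is hypothetical, and the fully qualified metric type is assumed to be the slots/allocated_for_project metric under the bigquery.googleapis.com prefix; the same metric can back a dashboard chart in each team's project.

```python
# Hedged sketch: pull the last hour of BigQuery slot allocation for one project.
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project"                    # hypothetical project

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
)

results = client.list_time_series(
    request={
        "name": project_name,
        "filter": 'metric.type = "bigquery.googleapis.com/slots/allocated_for_project"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    for point in series.points:
        print(point.interval.end_time, point.value.double_value)  # gauge value (assumed double)
```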
Questions # 30:

MJTelco’s Google Cloud Dataflow pipeline is now ready to start receiving data from the 50,000 installations. You want to allow Cloud Dataflow to scale its compute power up as required. Which Cloud Dataflow pipeline configuration setting should you update?

Options:

A.

The zone


B.

The number of workers


C.

The disk size per worker


D.

The maximum number of workers


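For context, two of the listed settings correspond to pipeline options in the Beam Python SDK: num_workers fixes the starting worker count, while max_num_workers is the ceiling the Dataflow autoscaler can scale out to. A sketch with hypothetical project, bucket, and topic names:

```python
# Hedged sketch: a streaming pipeline whose autoscaling may grow the worker
# pool up to max_num_workers as load from the installations increases.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                       # hypothetical project
    region="us-central1",
    temp_location="gs://my-bucket/tmp",         # hypothetical bucket
    streaming=True,
    autoscaling_algorithm="THROUGHPUT_BASED",   # let Dataflow add/remove workers
    max_num_workers=100,                        # autoscaling ceiling
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/telemetry"      # hypothetical topic
        )
        | "Decode" >> beam.Map(lambda msg: msg.decode("utf-8"))  # placeholder step
    )
```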