Pass the Google Cloud Certified Professional-Data-Engineer Questions and Answers with CertsForce

Viewing page 6 of 7
Viewing questions 51-60
Question # 51:

Your software uses a simple JSON format for all messages. These messages are published to Google Cloud Pub/Sub, then processed with Google Cloud Dataflow to create a real-time dashboard for the CFO. During testing, you notice that some messages are missing in the dashboard. You check the logs, and all messages are being published to Cloud Pub/Sub successfully. What should you do next?

Options:

A.

Check the dashboard application to see if it is not displaying correctly.


B.

Run a fixed dataset through the Cloud Dataflow pipeline and analyze the output.


C.

Use Google Stackdriver Monitoring on Cloud Pub/Sub to find the missing messages.


D.

Switch Cloud Dataflow to pull messages from Cloud Pub/Sub instead of Cloud Pub/Sub pushing messages to Cloud Dataflow.


Expert Solution
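A minimal sketch of what option B describes: run a small fixed dataset through the pipeline on the local runner and assert on the output. The parse_message transform and the message fields are hypothetical stand-ins for the real pipeline's logic.

```python
# Run a fixed dataset through a Beam pipeline locally and verify the output.
import json

import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to

def parse_message(raw):
    # Hypothetical stand-in for the pipeline's JSON-parsing step.
    msg = json.loads(raw)
    return (msg["metric"], msg["value"])

fixed_input = [
    '{"metric": "revenue", "value": 100}',
    '{"metric": "revenue", "value": 250}',
]

with TestPipeline() as p:  # executes on the local DirectRunner
    output = p | beam.Create(fixed_input) | beam.Map(parse_message)
    # If the pipeline drops or mangles a message, this assertion fails,
    # which isolates the problem to a pipeline stage rather than Pub/Sub.
    assert_that(output, equal_to([("revenue", 100), ("revenue", 250)]))
```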
Question # 52:

Your weather app queries a database every 15 minutes to get the current temperature. The frontend is powered by Google App Engine and serves millions of users. How should you design the frontend to respond to a database failure?

Options:

A.

Issue a command to restart the database servers.


B.

Retry the query with exponential backoff, up to a cap of 15 minutes.


C.

Retry the query every second until it comes back online to minimize staleness of data.


D.

Reduce the query frequency to once every hour until the database comes back online.


Expert Solution
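A minimal sketch of option B's retry strategy, assuming a hypothetical query_database callable: the delay doubles on each failure, with jitter so millions of clients do not retry in lockstep, and is capped at the app's 15-minute refresh interval.

```python
import random
import time

MAX_BACKOFF_SECONDS = 15 * 60  # cap the backoff at the 15-minute refresh interval

def query_with_backoff(query_database):
    delay = 1.0
    while True:
        try:
            return query_database()
        except ConnectionError:
            # Jitter spreads retries so clients don't all hammer the
            # recovering database at the same instant.
            time.sleep(delay + random.uniform(0, 1))
            delay = min(delay * 2, MAX_BACKOFF_SECONDS)
```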
Question # 53:

You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users’ privacy?

Options:

A.

Grant the consultant the Viewer role on the project.


B.

Grant the consultant the Cloud Dataflow Developer role on the project.


C.

Create a service account and allow the consultant to log on with it.


D.

Create an anonymized sample of the data for the consultant to work with in a different project.


Expert Solution
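For option D's idea, a rough sketch of producing a de-identified sample before copying it to a separate project. The file name and field names are hypothetical, and unsalted hashing is only pseudonymization; a production process would need stronger de-identification (for example, salted hashing or Cloud DLP).

```python
import csv
import hashlib

def anonymize_row(row):
    # Replace the stable identifier with a one-way hash and drop
    # direct identifiers entirely.
    row["user_id"] = hashlib.sha256(row["user_id"].encode()).hexdigest()
    row.pop("email", None)
    row.pop("name", None)
    return row

with open("users.csv") as src, open("users_sample.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    fields = [f for f in reader.fieldnames if f not in ("email", "name")]
    writer = csv.DictWriter(dst, fieldnames=fields)
    writer.writeheader()
    for row in reader:
        writer.writerow(anonymize_row(row))
```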
Question # 54:

You are designing the database schema for a machine learning-based food ordering service that will predict what users want to eat. Here is some of the information you need to store:

The user profile: What the user likes and doesn’t like to eat

The user account information: Name, address, preferred meal times

The order information: When orders are made, from where, to whom

The database will be used to store all the transactional data of the product. You want to optimize the data schema. Which Google Cloud Platform product should you use?

Options:

A.

BigQuery


B.

Cloud SQL


C.

Cloud Bigtable


D.

Cloud Datastore


Expert Solution
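To make the schema trade-off concrete, here is a minimal sketch (kind name and fields are hypothetical) of storing the user profile from the question as a single flexible-schema entity with the google-cloud-datastore client; list-valued properties hold variable-length likes and dislikes without schema changes.

```python
from google.cloud import datastore

client = datastore.Client()

profile = datastore.Entity(key=client.key("UserProfile", "user-123"))
profile.update({
    "name": "Alex Example",
    "address": "123 Main St",
    "preferred_meal_times": ["12:00", "19:30"],  # list-valued property
    "likes": ["sushi", "tacos"],
    "dislikes": ["olives"],
})
client.put(profile)  # no table schema to migrate when fields change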
Question # 55:

You want to use a database of information about tissue samples to classify future tissue samples as either normal or mutated. You are evaluating an unsupervised anomaly detection method for classifying the tissue samples. Which two characteristics support this method? (Choose two.)

Options:

A.

There are very few occurrences of mutations relative to normal samples.


B.

There are roughly equal occurrences of both normal and mutated samples in the database.


C.

You expect future mutations to have different features from the mutated samples in the database.


D.

You expect future mutations to have similar features to the mutated samples in the database.


E.

You already have labels for which samples are mutated and which are normal in the database.


Expert Solution
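As a concrete (non-GCP, synthetic-data) illustration of unsupervised anomaly detection: an isolation forest is fit on unlabeled samples and flags the rare, differently distributed ones, which is why the rarity of mutations matters for this method.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(990, 4))   # bulk of the samples
mutated = rng.normal(5.0, 1.0, size=(10, 4))   # rare and distributed differently
samples = np.vstack([normal, mutated])

# contamination encodes the assumption that anomalies are rare.
model = IsolationForest(contamination=0.01, random_state=0).fit(samples)
labels = model.predict(samples)  # -1 = anomaly, 1 = normal
print("flagged as anomalies:", int((labels == -1).sum()))
```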
Question # 56:

Your company has recently grown rapidly and is now ingesting data at a significantly higher rate than before. You manage the daily batch MapReduce analytics jobs in Apache Hadoop. However, the recent increase in data volume means the batch jobs are falling behind. You have been asked to recommend ways the development team could increase the responsiveness of the analytics without increasing costs. What should you recommend they do?

Options:

A.

Rewrite the job in Pig.


B.

Rewrite the job in Apache Spark.


C.

Increase the size of the Hadoop cluster.


D.

Decrease the size of the Hadoop cluster but also rewrite the job in Hive.


Expert Solution
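For a sense of what "rewrite the job in Apache Spark" (option B) looks like, here is a minimal PySpark sketch of a daily aggregation; the input path, schema, and output path are hypothetical. Spark keeps intermediate data in memory across stages, avoiding MapReduce's per-stage disk writes on the same cluster.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("daily-analytics").getOrCreate()

# Read the day's logs, aggregate, and write a report.
logs = spark.read.json("hdfs:///data/logs/2024-01-01/")
daily_counts = logs.groupBy("event_type").count()
daily_counts.write.mode("overwrite").parquet("hdfs:///reports/2024-01-01/")
```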
Question # 57:

You are deploying a new storage system for your mobile application, which is a media streaming service. You decide the best fit is Google Cloud Datastore. You have entities with multiple properties, some of which can take on multiple values. For example, in the entity ‘Movie’ the properties ‘actors’ and ‘tags’ have multiple values but the property ‘date_released’ does not. A typical query would ask for all movies with actor=<actorname> ordered by date_released, or all movies with tag=Comedy ordered by date_released. How should you avoid a combinatorial explosion in the number of indexes?

Options:

A.

Option A


B.

Option B


C.

Option C


D.

Option D


Expert Solution
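The original answer choices were not preserved, but the combinatorial explosion itself is easy to illustrate: a composite index that includes two multi-valued properties gets one index entry per combination of their values, while separate composite indexes (one per multi-valued property, each with date_released) grow only linearly. The counts below are made up.

```python
from itertools import product

actors = [f"actor_{i}" for i in range(5)]  # one movie with 5 actors
tags = [f"tag_{i}" for i in range(8)]      # and 8 tags

# One composite index over (actors, tags, date_released):
combined_entries = len(list(product(actors, tags)))   # 5 * 8 = 40 entries

# Two indexes, (actors, date_released) and (tags, date_released):
separate_entries = len(actors) + len(tags)            # 5 + 8 = 13 entries

print(combined_entries, separate_entries)
```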
Question # 58:

You are choosing a NoSQL database to handle telemetry data submitted from millions of Internet-of-Things (IoT) devices. The volume of data is growing at 100 TB per year, and each data entry has about 100 attributes. The data processing pipeline does not require atomicity, consistency, isolation, and durability (ACID). However, high availability and low latency are required.

You need to analyze the data by querying against individual fields. Which three databases meet your requirements? (Choose three.)

Options:

A.

Redis


B.

HBase


C.

MySQL


D.

MongoDB


E.

Cassandra


F.

HDFS with Hive


Expert Solution
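As an illustration of "querying against individual fields" in one of the candidate stores, here is a minimal pymongo sketch; the connection string, collection, and attribute names are hypothetical. A secondary index on a single attribute lets the query avoid scanning all ~100 fields per document.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
telemetry = client.iot.telemetry

# Index one of the ~100 attributes so per-field queries stay low-latency.
telemetry.create_index("device_temperature")

hot = telemetry.find({"device_temperature": {"$gt": 90.0}}).limit(10)
for reading in hot:
    print(reading["device_id"], reading["device_temperature"])
```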
Question # 59:

Your company produces 20,000 files every hour. Each data file is formatted as a comma-separated values (CSV) file that is less than 4 KB. All files must be ingested on Google Cloud Platform before they can be processed. Your company site has a 200 ms latency to Google Cloud, and your Internet connection bandwidth is limited to 50 Mbps. You currently deploy a secure FTP (SFTP) server on a virtual machine in Google Compute Engine as the data ingestion point. A local SFTP client runs on a dedicated machine to transmit the CSV files as-is. The goal is to make reports with data from the previous day available to the executives by 10:00 a.m. each day. This design is barely able to keep up with the current volume, even though the bandwidth utilization is rather low.

You are told that due to seasonality, your company expects the number of files to double for the next three months. Which two actions should you take? (Choose two.)

Options:

A.

Introduce data compression for each file to increase the rate of file transfer.


B.

Contact your internet service provider (ISP) to increase your maximum bandwidth to at least 100 Mbps.


C.

Redesign the data ingestion process to use the gsutil tool to send the CSV files to a storage bucket in parallel.


D.

Assemble 1,000 files into a tape archive (TAR) file. Transmit the TAR files instead, and disassemble the CSV files in the cloud upon receiving them.


E.

Create an S3-compatible storage endpoint in your network, and use Google Cloud Storage Transfer Service to transfer on-premises data to the designated storage bucket.


Expert Solution
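A minimal sketch of option C's direction: push many small files to Cloud Storage concurrently so the 200 ms round trip is overlapped rather than paid per file (gsutil -m cp does the same from the command line). The bucket name and local path are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

from google.cloud import storage

bucket = storage.Client().bucket("example-ingest-bucket")

def upload(path: Path) -> None:
    bucket.blob(f"incoming/{path.name}").upload_from_filename(str(path))

files = list(Path("/data/csv").glob("*.csv"))

# 64 concurrent uploads hide the per-file round-trip latency; serial SFTP
# pays ~200 ms per tiny file, which is why bandwidth utilization stays low.
with ThreadPoolExecutor(max_workers=64) as pool:
    list(pool.map(upload, files))  # list() surfaces any upload errors
```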
Question # 60:

You work for a manufacturing plant that batches application log files together into a single log file once a day at 2:00 AM. You have written a Google Cloud Dataflow job to process that log file. You need to make sure the log file is processed once per day, as inexpensively as possible. What should you do?

Options:

A.

Change the processing job to use Google Cloud Dataproc instead.


B.

Manually start the Cloud Dataflow job each morning when you get into the office.


C.

Create a cron job with Google App Engine Cron Service to run the Cloud Dataflow job.


D.

Configure the Cloud Dataflow job as a streaming job so that it processes the log data immediately.


Expert Solution
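For option C's direction, a rough sketch of the handler a scheduled trigger could invoke to launch the Dataflow job once per day; the project ID, bucket, template path, and parameters are hypothetical, and the job is assumed to be staged as a Dataflow template.

```python
from googleapiclient.discovery import build

def launch_daily_job():
    dataflow = build("dataflow", "v1b3")
    request = dataflow.projects().templates().launch(
        projectId="example-project",
        gcsPath="gs://example-bucket/templates/process-daily-log",
        body={
            "jobName": "process-daily-log",
            "parameters": {"input": "gs://example-bucket/logs/latest.log"},
        },
    )
    return request.execute()  # returns the launched job's metadata
```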