Pass the Google Cloud Certified Professional-Data-Engineer Questions and Answers with CertsForce

Question #41:

MJTelco is building a custom interface to share data. They have these requirements:

    They need to do aggregations over their petabyte-scale datasets.

    They need to scan specific time range rows with a very fast response time (milliseconds).

Which combination of Google Cloud Platform products should you recommend?

Options:

A.

Cloud Datastore and Cloud Bigtable


B.

Cloud Bigtable and Cloud SQL


C.

BigQuery and Cloud Bigtable


D.

BigQuery and Cloud Storage


Question #42:

MJTelco needs you to create a schema in Google Cloud Bigtable that will allow for historical analysis of the last 2 years of records. Each record arrives every 15 minutes and contains a unique device identifier and a data record. The most common query is for all the data for a given device for a given day. Which schema should you use?

Options:

A.

Rowkey: date#device_id
Column data: data_point


B.

Rowkey: date
Column data: device_id, data_point


C.

Rowkey: device_id
Column data: date, data_point


D.

Rowkey: data_point
Column data: device_id, date


E.

Rowkey: date#data_point
Column data: device_id
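
For reference, here is a minimal sketch of how a composite date#device_id rowkey (option A's layout) could be written and then read back with a single prefix scan using the Cloud Bigtable HBase client. The project, instance, table, column family, device ID, and value are all hypothetical placeholders; this illustrates the access pattern, not a definitive schema recommendation.

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import com.google.cloud.bigtable.hbase.BigtableConfiguration;

public class DeviceDayRowkeyExample {
  public static void main(String[] args) throws IOException {
    // Hypothetical project, instance, and table names.
    try (Connection connection = BigtableConfiguration.connect("my-project", "my-instance");
         Table table = connection.getTable(TableName.valueOf("device_records"))) {

      // Write one record: rowkey = date#device_id.
      String rowKey = "20240115#device-4711";
      Put put = new Put(Bytes.toBytes(rowKey));
      put.addColumn(Bytes.toBytes("data"), Bytes.toBytes("data_point"), Bytes.toBytes("42.7"));
      table.put(put);

      // The most common query (all data for one device on one day) becomes one prefix scan.
      Scan scan = new Scan().setRowPrefixFilter(Bytes.toBytes("20240115#device-4711"));
      try (ResultScanner rows = table.getScanner(scan)) {
        for (Result row : rows) {
          System.out.println(Bytes.toString(row.getRow()));
        }
      }
    }
  }
}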


Question #43:

You need to compose visualizations for operations teams with the following requirements:

Which approach meets the requirements?

Options:

A.

Load the data into Google Sheets, use formulas to calculate a metric, and use filters/sorting to show only suboptimal links in a table.


B.

Load the data into Google BigQuery tables, write Google Apps Script that queries the data, calculates the metric, and shows only suboptimal rows in a table in Google Sheets.


C.

Load the data into Google Cloud Datastore tables, write a Google App Engine Application that queries all rows, applies a function to derive the metric, and then renders results in a table using the Google charts and visualization API.


D.

Load the data into Google BigQuery tables, write a Google Data Studio 360 report that connects to your data, calculates a metric, and then uses a filter expression to show only suboptimal rows in a table.


Question #44:

MJTelco’s Google Cloud Dataflow pipeline is now ready to start receiving data from the 50,000 installations. You want to allow Cloud Dataflow to scale its compute power up as required. Which Cloud Dataflow pipeline configuration setting should you update?

Options:

A.

The zone


B.

The number of workers


C.

The disk size per worker


D.

The maximum number of workers
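
For context, a minimal sketch of how the autoscaling ceiling is typically configured when launching a Beam pipeline on the Dataflow runner from Java. The project ID, region, and worker cap shown are hypothetical placeholders.

import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class AutoscalingConfigExample {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
    options.setRunner(DataflowRunner.class);
    options.setProject("my-project");   // hypothetical project ID
    options.setRegion("us-central1");   // hypothetical region

    // With autoscaling enabled, Dataflow grows the worker pool on demand, up to this ceiling.
    options.setAutoscalingAlgorithm(AutoscalingAlgorithmType.THROUGHPUT_BASED);
    options.setMaxNumWorkers(100);      // the maximum-workers setting the question refers to
    // Note: setNumWorkers(...) only sets the initial pool size; it is not the scaling ceiling.

    Pipeline pipeline = Pipeline.create(options);
    // ... build the pipeline transforms here ...
    pipeline.run();
  }
}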


Question #45:

You create a new report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. It is company policy to ensure employees can view only the data associated with their region, so you create and populate a table for each region. You need to enforce the regional access policy to the data.

Which two actions should you take? (Choose two.)

Options:

A.

Ensure all the tables are included in a global dataset.


B.

Ensure each table is included in a dataset for a region.


C.

Adjust the settings for each table to allow a related region-based security group view access.


D.

Adjust the settings for each view to allow a related region-based security group view access.


E.

Adjust the settings for each dataset to allow a related region-based security group view access.
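
For illustration, a minimal sketch (using the google-cloud-bigquery Java client) of granting a region-based security group read access at the dataset level, which is the kind of control options B and E describe. The dataset name and group email are hypothetical.

import com.google.cloud.bigquery.Acl;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;
import java.util.ArrayList;
import java.util.List;

public class RegionalDatasetAccessExample {
  public static void main(String[] args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Hypothetical layout: one dataset per region, one security group per region.
    Dataset dataset = bigquery.getDataset("sales_emea");
    List<Acl> acls = new ArrayList<>(dataset.getAcl());

    // Grant the regional security group read access to the whole regional dataset.
    acls.add(Acl.of(new Acl.Group("emea-analysts@example.com"), Acl.Role.READER));

    bigquery.update(dataset.toBuilder().setAcl(acls).build());
  }
}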


Question #46:

Your software uses a simple JSON format for all messages. These messages are published to Google Cloud Pub/Sub, then processed with Google Cloud Dataflow to create a real-time dashboard for the CFO. During testing, you notice that some messages are missing in the dashboard. You check the logs, and all messages are being published to Cloud Pub/Sub successfully. What should you do next?

Options:

A.

Check the dashboard application to see if it is not displaying correctly.


B.

Run a fixed dataset through the Cloud Dataflow pipeline and analyze the output.


C.

Use Google Stackdriver Monitoring on Cloud Pub/Sub to find the missing messages.


D.

Switch Cloud Dataflow to pull messages from Cloud Pub/Sub instead of Cloud Pub/Sub pushing messages to Cloud Dataflow.
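
For illustration, a minimal sketch of running a fixed, known dataset through a pipeline step in a unit test, in the spirit of option B. The KeepValidMessagesFn DoFn and its validity check are hypothetical stand-ins for the real processing code; the point is that a deterministic input makes it possible to tell whether messages are dropped inside the pipeline or downstream in the dashboard.

import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Rule;
import org.junit.Test;

public class DashboardPipelineTest {

  // Stand-in for the real parsing/processing step; the production DoFn would go here.
  static class KeepValidMessagesFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(@Element String message, OutputReceiver<String> out) {
      if (message.contains("\"amount\"")) {   // hypothetical validity check
        out.output(message);
      }
    }
  }

  @Rule public final transient TestPipeline pipeline = TestPipeline.create();

  @Test
  public void allFixedMessagesReachTheOutput() {
    PCollection<String> input = pipeline.apply(Create.of(
        "{\"id\": 1, \"amount\": 10.0}",
        "{\"id\": 2, \"amount\": 20.0}",
        "{\"id\": 3}"));                      // deliberately malformed message

    PCollection<String> output = input.apply(ParDo.of(new KeepValidMessagesFn()));

    // If this assertion fails, the drop happens inside the pipeline, not in the dashboard.
    PAssert.that(output).containsInAnyOrder(
        "{\"id\": 1, \"amount\": 10.0}",
        "{\"id\": 2, \"amount\": 20.0}");

    pipeline.run().waitUntilFinish();
  }
}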


Question #47:

Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour.

The data scientists have written the following code to read the data for new key features in the logs.

BigQueryIO.Read
    .named("ReadLogData")
    .from("clouddataflow-readonly:samples.log_data")

You want to improve the performance of this data read. What should you do?

Options:

A.

Specify the TableReference object in the code.


B.

Use the .fromQuery operation to read specific fields from the table.


C.

Use both the Google BigQuery TableSchema and TableFieldSchema classes.


D.

Call a transform that returns TableRow objects, where each element in the PCollection represents a single row in the table.
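
For reference, a minimal sketch of a .fromQuery read in the same legacy Dataflow SDK 1.x style as the snippet above, selecting only the fields the analysis needs instead of scanning every column. The field names in the SELECT list are hypothetical.

import com.google.api.services.bigquery.model.TableRow;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.values.PCollection;

public class ReadLogDataWithQuery {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Read only the fields needed downstream rather than the entire table.
    PCollection<TableRow> logData = p.apply(
        BigQueryIO.Read
            .named("ReadLogData")
            .fromQuery("SELECT timestamp, device_id, latency "   // hypothetical field names
                + "FROM [clouddataflow-readonly:samples.log_data]"));

    // ... downstream transforms ...
    p.run();
  }
}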


Question #48:

You are building a new real-time data warehouse for your company and will use Google BigQuery streaming inserts. There is no guarantee that data will only be sent in once, but you do have a unique ID for each row of data and an event timestamp. You want to ensure that duplicates are not included while interactively querying data. Which query type should you use?

Options:

A.

Include ORDER BY DESC on the timestamp column and LIMIT to 1.


B.

Use GROUP BY on the unique ID column and timestamp column and SUM on the values.


C.

Use the LAG window function with PARTITION by unique ID along with WHERE LAG IS NOT NULL.


D.

Use the ROW_NUMBER window function with PARTITION by unique ID along with WHERE row equals 1.
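
For illustration, a minimal sketch of the ROW_NUMBER-based deduplicating query described in option D, issued through the google-cloud-bigquery Java client. The table reference and column names (unique_id, event_timestamp) are hypothetical.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class DeduplicatedQueryExample {
  public static void main(String[] args) throws InterruptedException {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Keep only the newest row per unique_id; duplicates get row_num > 1 and are filtered out.
    String sql =
        "SELECT * EXCEPT(row_num) FROM ("
            + "  SELECT *, ROW_NUMBER() OVER ("
            + "    PARTITION BY unique_id ORDER BY event_timestamp DESC) AS row_num"
            + "  FROM `my-project.warehouse.events`)"
            + " WHERE row_num = 1";

    TableResult result =
        bigquery.query(QueryJobConfiguration.newBuilder(sql).setUseLegacySql(false).build());
    for (FieldValueList row : result.iterateAll()) {
      System.out.println(row);
    }
  }
}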


Question #49:

You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users’ privacy?

Options:

A.

Grant the consultant the Viewer role on the project.


B.

Grant the consultant the Cloud Dataflow Developer role on the project.


C.

Create a service account and allow the consultant to log on with it.


D.

Create an anonymized sample of the data for the consultant to work with in a different project.


Question #50:

Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use? (Choose three.)

Options:

A.

Supervised learning to determine which transactions are most likely to be fraudulent.


B.

Unsupervised learning to determine which transactions are most likely to be fraudulent.


C.

Clustering to divide the transactions into N categories based on feature similarity.


D.

Supervised learning to predict the location of a transaction.


E.

Reinforcement learning to predict the location of a transaction.


F.

Unsupervised learning to predict the location of a transaction.

