
Pass the Google Cloud Certified Professional-Data-Engineer questions and answers with CertsForce

Viewing page 5 of 6 (questions 41-50)
Question # 41:

Your company built a TensorFlow neural-network model with a large number of neurons and layers. The model fits the training data well but performs poorly when tested against new data. What method can you employ to address this?

Options:

A.

Threading


B.

Serialization


C.

Dropout Methods


D.

Dimensionality Reduction


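Dropout is the classic regularization technique for a network that memorizes its training data, so a brief sketch may help. This is a minimal Keras example; the layer sizes, input width, and the 0.5 rate are illustrative assumptions, not part of the question:

```python
import tensorflow as tf

# Dropout randomly zeroes a fraction of activations during training only,
# which limits co-adaptation between neurons and reduces overfitting.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # active in training, a no-op at inference
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```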
Question # 42:

Your company is streaming real-time sensor data from its factory floor into Bigtable and has noticed extremely poor performance. How should the row key be redesigned to improve Bigtable performance on queries that populate real-time dashboards?

Options:

A.

Use a row key of the form <timestamp>.


B.

Use a row key of the form <sensorid>.


C.

Use a row key of the form <timestamp>#<sensorid>.


D.

Use a row key of the form <sensorid>#<timestamp>.


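For illustration, a sketch of writing with a field-promoted row key of the form <sensorid>#<timestamp>. Leading with the sensor ID spreads writes across tablets instead of hotspotting on a monotonically increasing timestamp prefix, while the timestamp suffix keeps each sensor's rows in time order for dashboard range scans. The project, instance, table, and column-family names are hypothetical:

```python
import time

from google.cloud import bigtable  # pip install google-cloud-bigtable

def write_reading(sensor_id: str, value: float) -> None:
    # "my-project", "sensors", and "readings" are placeholder names.
    client = bigtable.Client(project="my-project")
    table = client.instance("sensors").table("readings")
    # Field promotion: sensor ID first, millisecond timestamp second.
    row_key = f"{sensor_id}#{int(time.time() * 1000)}".encode()
    row = table.direct_row(row_key)
    row.set_cell("metrics", b"value", str(value).encode())
    row.commit()
```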
Question # 43:

You have spent a few days loading data from comma-separated values (CSV) files into the Google BigQuery table CLICK_STREAM. The column DT stores the epoch time of click events. For convenience, you chose a simple schema where every field is treated as the STRING type. Now you want to compute web session durations for users who visit your site, and you want to change the column's data type to TIMESTAMP. You want to minimize the migration effort without making future queries computationally expensive. What should you do?

Options:

A.

Delete the table CLICK_STREAM, and then re-create it such that the column DT is of the TIMESTAMP type. Reload the data.


B.

Add a column TS of the TIMESTAMP type to the table CLICK_STREAM, and populate the numeric values from the column DT for each row. Reference the column TS instead of the column DT from now on.


C.

Create a view CLICK_STREAM_V, where strings from the column DT are cast into TIMESTAMP values. Reference the view CLICK_STREAM_V instead of the table CLICK_STREAM from now on.


D.

Add two columns to the table CLICK_STREAM: TS of the TIMESTAMP type and IS_NEW of the BOOLEAN type. Reload all data in append mode. For each appended row, set the value of IS_NEW to true. For future queries, reference the column TS instead of the column DT, with the WHERE clause ensuring that the value of IS_NEW must be true.


E.

Construct a query to return every row of the table CLICK_STREAM, while using the built-in function to cast strings from the column DT into TIMESTAMP values. Run the query into a destination table NEW_CLICK_STREAM, in which the column TS is the TIMESTAMP type. Reference the table NEW_CLICK_STREAM instead of the table CLICK_STREAM from now on. In the future, new data is loaded into the table NEW_CLICK_STREAM.


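As a sketch of the view approach in option C, assuming a hypothetical table path my-project.web.CLICK_STREAM and that DT holds epoch seconds as strings:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()
view = bigquery.Table("my-project.web.CLICK_STREAM_V")
# The view swaps the STRING column for a parsed TIMESTAMP of the same name.
view.view_query = """
    SELECT * EXCEPT (DT),
           TIMESTAMP_SECONDS(CAST(DT AS INT64)) AS DT
    FROM `my-project.web.CLICK_STREAM`
"""
client.create_table(view)
```

Note the trade-off the question hints at: a view requires no reload, but the cast re-runs on every query against it.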
Question # 44:

Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use? (Choose three.)

Options:

A.

Supervised learning to determine which transactions are most likely to be fraudulent.


B.

Unsupervised learning to determine which transactions are most likely to be fraudulent.


C.

Clustering to divide the transactions into N categories based on feature similarity.


D.

Supervised learning to predict the location of a transaction.


E.

Reinforcement learning to predict the location of a transaction.


F.

Unsupervised learning to predict the location of a transaction.


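A toy sketch of the clustering application in option C, using scikit-learn's KMeans. The two synthetic features (an encoded transaction type and an amount) are stand-ins for the real columns:

```python
import numpy as np
from sklearn.cluster import KMeans  # pip install scikit-learn

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(0, 4, 1000).astype(float),  # encoded transaction type
    rng.lognormal(3.0, 1.0, 1000),           # transaction amount
])
# Group the 1,000 fake transactions into N=5 clusters by feature similarity.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # cluster sizes
```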
Question # 45:

You want to use a database of information about tissue samples to classify future tissue samples as either normal or mutated. You are evaluating an unsupervised anomaly detection method for classifying the tissue samples. Which two characteristics support this method? (Choose two.)

Options:

A.

There are very few occurrences of mutations relative to normal samples.


B.

There are roughly equal occurrences of both normal and mutated samples in the database.


C.

You expect future mutations to have different features from the mutated samples in the database.


D.

You expect future mutations to have similar features to the mutated samples in the database.


E.

You already have labels for which samples are mutated and which are normal in the database.


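To see how an unsupervised detector behaves when anomalies are rare, here is a toy IsolationForest sketch; the synthetic features and the 1% contamination figure are assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest  # pip install scikit-learn

rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(990, 4)),  # "normal" tissue samples
    rng.normal(4.0, 1.0, size=(10, 4)),   # ~1% outliers standing in for mutations
])
# No labels are used: the forest isolates points that look unlike the bulk.
clf = IsolationForest(contamination=0.01, random_state=0).fit(X)
print(int((clf.predict(X) == -1).sum()), "samples flagged as anomalous")
```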
Question # 46:

Your company’s on-premises Apache Hadoop servers are approaching end-of-life, and IT has decided to migrate the cluster to Google Cloud Dataproc. A like-for-like migration of the cluster would require 50 TB of Google Persistent Disk per node. The CIO is concerned about the cost of using that much block storage. You want to minimize the storage cost of the migration. What should you do?

Options:

A.

Put the data into Google Cloud Storage.


B.

Use preemptible virtual machines (VMs) for the Cloud Dataproc cluster.


C.

Tune the Cloud Dataproc cluster so that there is just enough disk for all data.


D.

Migrate some of the cold data into Google Cloud Storage, and keep only the hot data in Persistent Disk.


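A brief PySpark sketch of the Cloud Storage pattern in option A. On Dataproc the Cloud Storage connector is preinstalled, so jobs read gs:// paths directly and cluster disks only need to hold shuffle and scratch data; the bucket, path, and column name below are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gcs-read").getOrCreate()
# Read straight from Cloud Storage instead of HDFS on Persistent Disk.
df = spark.read.parquet("gs://my-bucket/warehouse/events/")
df.groupBy("event_type").count().show()
```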
Question # 47:

You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users’ privacy?

Options:

A.

Grant the consultant the Viewer role on the project.


B.

Grant the consultant the Cloud Dataflow Developer role on the project.


C.

Create a service account and allow the consultant to log on with it.


D.

Create an anonymized sample of the data for the consultant to work with in a different project.


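A minimal sketch of the anonymization step behind option D: replace direct identifiers with salted hashes before copying a sample into a separate project for the consultant. The salt value and the truncation length are illustrative choices:

```python
import hashlib

SALT = b"replace-with-a-real-secret"  # kept out of the shared project

def pseudonymize(user_id: str) -> str:
    # One-way mapping: the consultant can join on the token but cannot
    # recover the original user ID from it.
    return hashlib.sha256(SALT + user_id.encode()).hexdigest()[:16]

print(pseudonymize("user-12345"))
```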
Question # 48:

You create an important report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. You notice that visualizations are not showing data that is less than 1 hour old. What should you do?

Options:

A.

Disable caching by editing the report settings.


B.

Disable caching in BigQuery by editing table details.


C.

Refresh your browser tab showing the visualizations.


D.

Clear your browser history for the past hour, then reload the tab showing the visualizations.


Question # 49:

You need to compose visualizations for operations teams with the following requirements:

- The report must include telemetry data from all 50,000 installations for the most recent 6 weeks (sampling once every minute).
- The report must not be more than 3 hours delayed from live data.
- The actionable report should show only suboptimal links.
- The most suboptimal links should be sorted to the top.
- Suboptimal links can be grouped and filtered by regional geography.
- User response time to load the report must be less than 5 seconds.

Which approach meets the requirements?

Options:

A.

Load the data into Google Sheets, use formulas to calculate a metric, and use filters/sorting to show only suboptimal links in a table.


B.

Load the data into Google BigQuery tables, write a Google Apps Script that queries the data, calculates the metric, and shows only suboptimal rows in a table in Google Sheets.


C.

Load the data into Google Cloud Datastore tables, write a Google App Engine Application that queries all rows, applies a function to derive the metric, and then renders results in a table using the Google charts and visualization API.


D.

Load the data into Google BigQuery tables, write a Google Data Studio 360 report that connects to your data, calculates a metric, and then uses a filter expression to show only suboptimal rows in a table.


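As a sketch of the kind of BigQuery query a Data Studio report in option D could sit on, with hypothetical table and column names and an assumed 1% packet-loss threshold for "suboptimal":

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()
sql = """
    SELECT link_id, region,
           SAFE_DIVIDE(dropped_packets, sent_packets) AS loss_rate
    FROM `my-project.netops.link_telemetry`
    WHERE ts > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 42 DAY)  -- 6 weeks
      AND SAFE_DIVIDE(dropped_packets, sent_packets) > 0.01         -- suboptimal only
    ORDER BY loss_rate DESC
"""
for row in client.query(sql).result():
    print(row.link_id, row.loss_rate)
```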
Question # 50:

You designed a database for patient records as a pilot project to cover a few hundred patients in three clinics. Your design used a single database table to represent all patients and their visits, and you used self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the project has expanded. The database must now store 100 times more patient records. You can no longer run the reports because they either take too long or they encounter errors from insufficient compute resources. How should you adjust the database design?

Options:

A.

Add capacity (memory and disk space) to the database server by a factor of 200.


B.

Shard the tables into smaller ones based on date ranges, and only generate reports with prespecified date ranges.


C.

Normalize the master patient-record table into the patient table and the visits table, and create other necessary tables to avoid self-join.


D.

Partition the table into smaller tables, with one for each clinic. Run queries against the smaller table pairs, and use unions for consolidated reports.


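A sketch of the normalization in option C: split the wide patient-record table into separate patient and visit tables, so reports join two narrow tables instead of self-joining one. All table and column names are illustrative:

```python
# Hypothetical normalized schema replacing the single self-joined table.
DDL = """
CREATE TABLE patients (
    patient_id BIGINT PRIMARY KEY,
    clinic_id  INT NOT NULL,
    full_name  VARCHAR(200)
);

CREATE TABLE visits (
    visit_id   BIGINT PRIMARY KEY,
    patient_id BIGINT NOT NULL REFERENCES patients (patient_id),
    visit_date DATE NOT NULL,
    notes      TEXT
);
"""
print(DDL)
```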