Pass the Google Cloud Certified Professional Data Engineer Questions and Answers with CertsForce

Question #51:

You create an important report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. You notice that visualizations are not showing data that is less than 1 hour old. What should you do?

Options:

A.

Disable caching by editing the report settings.


B.

Disable caching in BigQuery by editing table details.


C.

Refresh your browser tab showing the visualizations.


D.

Clear your browser history for the past hour, then reload the tab showing the visualizations.


Question #52:

Your company has hired a new data scientist who wants to perform complicated analyses across very large datasets stored in Google Cloud Storage and in a Cassandra cluster on Google Compute Engine. The scientist primarily wants to create labelled data sets for machine learning projects, along with some visualization tasks. She reports that her laptop is not powerful enough to perform her tasks and it is slowing her down. You want to help her perform her tasks. What should you do?

Options:

A.

Run a local version of Jupyter on the laptop.


B.

Grant the user access to Google Cloud Shell.


C.

Host a visualization tool on a VM on Google Compute Engine.


D.

Deploy Google Cloud Datalab to a virtual machine (VM) on Google Compute Engine.


Question #53:

An external customer provides you with a daily dump of data from their database. The data flows into Google Cloud Storage (GCS) as comma-separated values (CSV) files. You want to analyze this data in Google BigQuery, but the data could have rows that are formatted incorrectly or corrupted. How should you build this pipeline?

Options:

A.

Use federated data sources, and check data in the SQL query.


B.

Enable BigQuery monitoring in Google Stackdriver and create an alert.


C.

Import the data into BigQuery using the gcloud CLI and set max_bad_records to 0.


D.

Run a Google Cloud Dataflow batch pipeline to import the data into BigQuery, and push errors to another dead-letter table for analysis.
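
To make option D concrete, here is a minimal sketch of the dead-letter pattern using the Apache Beam Python SDK: well-formed rows go to the main table, while rows that fail parsing are tagged and written to a separate dead-letter table for later analysis. The bucket path, table names, and three-column schema are hypothetical, and both BigQuery tables are assumed to already exist.

```python
import csv

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_line(line):
    """Parse one CSV row; tag malformed rows for the dead-letter table."""
    try:
        fields = next(csv.reader([line]))
        if len(fields) != 3:  # hypothetical three-column schema
            raise ValueError('expected 3 columns, got %d' % len(fields))
        yield {'id': fields[0], 'name': fields[1], 'value': float(fields[2])}
    except Exception as e:  # corrupted or mis-formatted row
        yield beam.pvalue.TaggedOutput(
            'dead_letter', {'raw_line': line, 'error': str(e)})


with beam.Pipeline(options=PipelineOptions()) as p:
    parsed = (
        p
        | 'ReadCSV' >> beam.io.ReadFromText('gs://example-bucket/daily/*.csv')
        | 'Parse' >> beam.FlatMap(parse_line).with_outputs(
            'dead_letter', main='valid'))

    # Both tables are assumed to exist already, hence CREATE_NEVER.
    parsed.valid | 'WriteValid' >> beam.io.WriteToBigQuery(
        'example-project:dataset.inventory',
        create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)

    parsed.dead_letter | 'WriteDeadLetter' >> beam.io.WriteToBigQuery(
        'example-project:dataset.inventory_dead_letter',
        create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
```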


Question #54:

You want to use a database of information about tissue samples to classify future tissue samples as either normal or mutated. You are evaluating an unsupervised anomaly detection method for classifying the tissue samples. Which two characteristics support this method? (Choose two.)

Options:

A.

There are very few occurrences of mutations relative to normal samples.


B.

There are roughly equal occurrences of both normal and mutated samples in the database.


C.

You expect future mutations to have different features from the mutated samples in the database.


D.

You expect future mutations to have similar features to the mutated samples in the database.


E.

You already have labels for which samples are mutated and which are normal in the database.
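
To make options A and C concrete: an unsupervised anomaly detector learns only the structure of normal samples and flags anything that deviates, which works precisely when mutations are rare and may not resemble previously seen ones. Below is a minimal sketch using scikit-learn's IsolationForest as a stand-in; the synthetic feature matrices are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(1000, 8))  # abundant normal samples
mutated = rng.normal(4.0, 1.0, size=(10, 8))   # very few, possibly novel, anomalies

model = IsolationForest(contamination=0.01, random_state=0)
model.fit(normal)  # trained on normal data only; no mutation labels needed

# predict() returns +1 for inliers (normal) and -1 for outliers (mutated)
print(model.predict(mutated))
```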


Question #55:

You are building a model to make clothing recommendations. You know a user’s fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available. How should you use this data to train the model?

Options:

A.

Continuously retrain the model on just the new data.


B.

Continuously retrain the model on a combination of existing data and the new data.


C.

Train on the existing data while using the new data as your test set.


D.

Train on the new data while using the existing data as your test set.
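
A minimal sketch of option B, assuming a periodic batch retrain: each cycle fits a fresh model on the union of the historical data and the newly streamed batch, so the model adapts to shifting preferences without discarding what it learned before. The arrays and the classifier choice are hypothetical stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_hist, y_hist = rng.random((500, 4)), rng.integers(0, 2, 500)  # existing data
X_new, y_new = rng.random((50, 4)), rng.integers(0, 2, 50)      # streamed batch

# Retrain on the combination of existing and new data.
X = np.vstack([X_hist, X_new])
y = np.concatenate([y_hist, y_new])

model = LogisticRegression().fit(X, y)
```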


Question #56:

Your company uses a proprietary system to send inventory data every 6 hours to a data ingestion service in the cloud. Transmitted data includes a payload of several fields and the timestamp of the transmission. If there are any concerns about a transmission, the system re-transmits the data. How should you deduplicate the data most efficiently?

Options:

A.

Assign global unique identifiers (GUID) to each data entry.


B.

Compute the hash value of each data entry, and compare it with all historical data.


C.

Store each data entry as the primary key in a separate database and apply an index.


D.

Maintain a database table to store the hash value and other metadata for each data entry.
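
A minimal sketch of option D's approach: keep a table keyed by a content hash so a re-transmitted entry is detected with a single lookup rather than a scan over all historical data. The in-memory dict stands in for the database table, and the field names (and the choice to hash the payload minus the transmission timestamp) are assumptions.

```python
import hashlib
import json

seen = {}  # hash -> metadata; in production this would be a database table


def ingest(entry):
    # Hash only the payload fields, so a retry carrying a new transmission
    # timestamp still collides with the original entry (an assumption about
    # how re-transmissions are stamped).
    payload = {k: v for k, v in entry.items() if k != 'transmitted_at'}
    digest = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if digest in seen:
        return False  # duplicate re-transmission; drop it
    seen[digest] = {'transmitted_at': entry['transmitted_at']}
    return True


print(ingest({'sku': 'A1', 'count': 7, 'transmitted_at': '2024-01-01T00:00'}))  # True
print(ingest({'sku': 'A1', 'count': 7, 'transmitted_at': '2024-01-01T06:00'}))  # False
```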


Question #57:

You want to use Google Stackdriver Logging to monitor Google BigQuery usage. You need an instant notification to be sent to your monitoring tool when new data is appended to a certain table using an insert job, but you do not want to receive notifications for other tables. What should you do?

Options:

A.

Make a call to the Stackdriver API to list all logs, and apply an advanced filter.


B.

In the Stackdriver Logging admin interface, enable a log sink export to BigQuery.


C.

In the Stackdriver Logging admin interface, enable a log sink export to Google Cloud Pub/Sub, and subscribe to the topic from your monitoring tool.


D.

Using the Stackdriver API, create a project sink with an advanced log filter to export to Pub/Sub, and subscribe to the topic from your monitoring tool.
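
A minimal sketch of option D using the google-cloud-logging client library: create a project-level sink whose advanced filter matches only completed insert (load) jobs against one specific table and exports the matching entries to Pub/Sub, which the monitoring tool subscribes to. The project, table, and topic names are placeholders, and the exact audit-log field paths are assumptions to verify against the BigQuery audit log schema.

```python
from google.cloud import logging as gcloud_logging

# Advanced filter: completed load jobs targeting one specific table only.
ADVANCED_FILTER = (
    'resource.type="bigquery_resource" AND '
    'protoPayload.methodName="jobservice.jobcompleted" AND '
    'protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration'
    '.load.destinationTable.tableId="my_table"')

client = gcloud_logging.Client(project='example-project')
sink = client.sink(
    'bq-insert-notifications',
    filter_=ADVANCED_FILTER,
    destination='pubsub.googleapis.com/projects/example-project'
                '/topics/bq-inserts')
sink.create()  # the monitoring tool then subscribes to the bq-inserts topic
```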

