
Pass the Google Cloud Certified Professional-Data-Engineer questions and answers with CertsForce

Viewing page 2 of 8 (questions 11-20)
Question #11:

The data analyst team at your company uses BigQuery for ad-hoc queries and scheduled SQL pipelines in a Google Cloud project with a slot reservation of 2000 slots. However, with the recent introduction of hundreds of new non-time-sensitive SQL pipelines, the team is encountering frequent quota errors. You examine the logs and notice that approximately 1500 queries are being triggered concurrently during peak time. You need to resolve the concurrency issue. What should you do?

Options:

A.

Update SQL pipelines and ad-hoc queries to run as interactive query jobs.


B.

Increase the slot capacity of the project with baseline as 0 and maximum reservation size as 3000.


C.

Update SQL pipelines to run as batch queries, and run ad-hoc queries as interactive query jobs.


D.

Increase the slot capacity of the project with baseline as 2000 and maximum reservation size as 3000.


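For context on the interactive-versus-batch distinction in this question: the BigQuery Python client lets a query be submitted with BATCH priority so it queues for idle slots instead of competing for interactive concurrency. The query text below is a placeholder; this is only an illustrative sketch, not the graded answer.

from google.cloud import bigquery

# Sketch: submit a non-time-sensitive pipeline query with BATCH priority so it
# waits for idle slots rather than counting against interactive concurrency.
client = bigquery.Client()  # assumes default credentials and project

job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH)
job = client.query(
    "SELECT COUNT(*) FROM `project1.example.table1`",  # placeholder query
    job_config=job_config,
)
rows = job.result()  # blocks until the batch job eventually runs and finishes
print(rows.total_rows)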
Question #12:

You want to use a database of information about tissue samples to classify future tissue samples as either normal or mutated. You are evaluating an unsupervised anomaly detection method for classifying the tissue samples. Which two characteristics support this method? (Choose two.)

Options:

A.

There are very few occurrences of mutations relative to normal samples.


B.

There are roughly equal occurrences of both normal and mutated samples in the database.


C.

You expect future mutations to have different features from the mutated samples in the database.


D.

You expect future mutations to have similar features to the mutated samples in the database.


E.

You already have labels for which samples are mutated and which are normal in the database.


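As background on the technique itself: unsupervised anomaly detection is typically applied when the anomalous class is rare and no labels are available, which is the reasoning this question probes. A minimal sketch with scikit-learn's IsolationForest follows; the synthetic data and the 1% contamination rate are assumptions made purely for illustration.

import numpy as np
from sklearn.ensemble import IsolationForest

# Sketch: fit on unlabeled, mostly-normal feature vectors and flag rare outliers.
rng = np.random.default_rng(0)
samples = rng.normal(size=(10_000, 8))  # hypothetical tissue-sample features

model = IsolationForest(contamination=0.01, random_state=0)  # assume ~1% mutations
model.fit(samples)

predictions = model.predict(samples)  # 1 = looks normal, -1 = possible mutation
print((predictions == -1).sum(), "samples flagged as anomalous")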
Question #13:

What is the HBase Shell for Cloud Bigtable?

Options:

A.

The HBase shell is a GUI based interface that performs administrative tasks, such as creating and deleting tables.


B.

The HBase shell is a command-line tool that performs administrative tasks, such as creating and deleting tables.


C.

The HBase shell is a hypervisor based shell that performs administrative tasks, such as creating and deleting new virtualized instances.


D.

The HBase shell is a command-line tool that performs only user account management functions to grant access to Cloud Bigtable instances.


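For reference, the administrative tasks the HBase shell performs against Cloud Bigtable (creating, listing, and deleting tables) can also be done through the Cloud Bigtable admin client. The sketch below uses the Python client rather than the HBase shell itself, and the project, instance, and table IDs are placeholders.

from google.cloud import bigtable
from google.cloud.bigtable import column_family

# Sketch: the same table-administration tasks the HBase shell handles,
# done via the Cloud Bigtable admin client. All IDs are placeholders.
client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")

table = instance.table("package-events")
table.create(column_families={"cf1": column_family.MaxVersionsGCRule(1)})

print([t.table_id for t in instance.list_tables()])
table.delete()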
Question #14:

Suppose you have a table that includes a nested column called "city" inside a column called "person", but when you try to submit the following query in BigQuery, it gives you an error.

SELECT person FROM `project1.example.table1` WHERE city = "London"

How would you correct the error?

Options:

A.

Add ", UNNEST(person)" before the WHERE clause.


B.

Change "person" to "person.city".


C.

Change "person" to "city.person".


D.

Add ", UNNEST(city)" before the WHERE clause.


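To see why the original statement errors: fields nested inside a repeated RECORD are not directly addressable in a WHERE clause until the array is flattened in the FROM clause. A sketch of a corrected query run through the Python client, assuming person is a repeated RECORD that contains city:

from google.cloud import bigquery

# Sketch: if `person` is a repeated RECORD, its nested `city` field only becomes
# addressable after flattening the array with UNNEST in the FROM clause.
client = bigquery.Client()

sql = """
SELECT person
FROM `project1.example.table1`, UNNEST(person)
WHERE city = 'London'
"""
for row in client.query(sql).result():
    print(row)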
Question #15:

You are collecting IoT sensor data from millions of devices across the world and storing the data in BigQuery. Your access pattern is based on recent data filtered by location_id and device_version with the following query:

You want to optimize your queries for cost and performance. How should you structure your data?

Options:

A.

Partition table data by create_date, location_id and device_version


B.

Partition table data by create_date; cluster table data by location_id and device_version


C.

Cluster table data by create_date, location_id, and device_version


D.

Cluster table data by create_date, partition by location and device_version


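For context, partitioning and clustering are table properties set when the table is defined. The sketch below shows how a table partitioned on create_date and clustered on location_id and device_version would be declared with the BigQuery Python client; the table ID, the reading column, and the field types are assumptions for illustration only.

from google.cloud import bigquery

# Sketch: partition on the date column used for recency filtering and cluster
# on the other filter columns. Table ID and the `reading` column are placeholders.
client = bigquery.Client()

schema = [
    bigquery.SchemaField("create_date", "TIMESTAMP"),
    bigquery.SchemaField("location_id", "STRING"),
    bigquery.SchemaField("device_version", "STRING"),
    bigquery.SchemaField("reading", "FLOAT"),
]

table = bigquery.Table("my-project.iot.sensor_data", schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="create_date"
)
table.clustering_fields = ["location_id", "device_version"]

client.create_table(table)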
Question #16:

All Google Cloud Bigtable client requests go through a front-end server ______ they are sent to a Cloud Bigtable node.

Options:

A.

before


B.

after


C.

only if


D.

once


Question #17:

MJTelco’s Google Cloud Dataflow pipeline is now ready to start receiving data from the 50,000 installations. You want to allow Cloud Dataflow to scale its compute power up as required. Which Cloud Dataflow pipeline configuration setting should you update?

Options:

A.

The zone


B.

The number of workers


C.

The disk size per worker


D.

The maximum number of workers


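For reference, the autoscaling ceiling is configured as a pipeline option rather than inside the pipeline code. A sketch of the relevant settings with the Apache Beam Python SDK; the project, region, bucket, and worker cap values are placeholders, not values from the scenario.

from apache_beam.options.pipeline_options import PipelineOptions

# Sketch: Dataflow adds and removes workers on its own; the pipeline only sets
# the ceiling via max_num_workers. Project, region, and bucket are placeholders.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    autoscaling_algorithm="THROUGHPUT_BASED",
    max_num_workers=200,  # raise this ceiling to let autoscaling add compute power
)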
Question #18:

Flowlogistic’s management has determined that the current Apache Kafka servers cannot handle the data volume for their real-time inventory tracking system. You need to build a new system on Google Cloud Platform (GCP) that will feed the proprietary tracking software. The system must be able to ingest data from a variety of global sources, process and query in real-time, and store the data reliably. Which combination of GCP products should you choose?

Options:

A.

Cloud Pub/Sub, Cloud Dataflow, and Cloud Storage


B.

Cloud Pub/Sub, Cloud Dataflow, and Local SSD


C.

Cloud Pub/Sub, Cloud SQL, and Cloud Storage


D.

Cloud Load Balancing, Cloud Dataflow, and Cloud Storage


Question #19:

Flowlogistic is rolling out their real-time inventory tracking system. The tracking devices will all send package-tracking messages, which will now go to a single Google Cloud Pub/Sub topic instead of the Apache Kafka cluster. A subscriber application will then process the messages for real-time reporting and store them in Google BigQuery for historical analysis. You want to ensure the package data can be analyzed over time.

Which approach should you take?

Options:

A.

Attach the timestamp to each message in the Cloud Pub/Sub subscriber application as it is received.


B.

Attach the timestamp and package ID to the outbound message from each publisher device as it is sent to Cloud Pub/Sub.


C.

Use the NOW() function in BigQuery to record the event’s time.


D.

Use the automatically generated timestamp from Cloud Pub/Sub to order the data.


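As background, Cloud Pub/Sub lets a publisher attach arbitrary string attributes to every message, which is how an event timestamp and package ID generated at the device can travel alongside the payload for later time-based analysis. A sketch using the Python publisher client; the project, topic, and attribute values are placeholders.

import json
import time
from google.cloud import pubsub_v1

# Sketch: attach the event timestamp and package ID at the publisher so downstream
# analysis can order events by when they happened, not when they were received.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "package-tracking")  # placeholders

payload = json.dumps({"status": "in_transit"}).encode("utf-8")
future = publisher.publish(
    topic_path,
    payload,
    package_id="PKG-000123",                      # placeholder package ID
    event_timestamp=str(int(time.time() * 1000)), # millisecond event time as a string
)
print("published message", future.result())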
Question #20:

Flowlogistic’s CEO wants to gain rapid insight into their customer base so his sales team can be better informed in the field. This team is not very technical, so they’ve purchased a visualization tool to simplify the creation of BigQuery reports. However, they’ve been overwhelmed by all the data in the table, and are spending a lot of money on queries trying to find the data they need. You want to solve their problem in the most cost-effective way. What should you do?

Options:

A.

Export the data into a Google Sheet for visualization.


B.

Create an additional table with only the necessary columns.


C.

Create a view on the table to present to the visualization tool.


D.

Create identity and access management (IAM) roles on the appropriate columns, so that only those columns appear in a query.


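For context on views in this scenario: a view is defined once and then queried like a table, so users of a visualization tool only ever see the columns it exposes. A sketch of creating one with the BigQuery Python client; the dataset, view name, and column list are placeholders chosen for illustration.

from google.cloud import bigquery

# Sketch: expose only the columns the sales team needs through a view.
# Dataset, table, and column names below are placeholders.
client = bigquery.Client()

view = bigquery.Table("my-project.sales.customer_summary")
view.view_query = """
SELECT customer_id, region, last_order_date
FROM `my-project.sales.customers`
"""
client.create_table(view)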