Google Professional-Data-Engineer Exam Questions Free Practice Test

Viewing page 7 out of 8 pages

Viewing questions 61-70 out of questions

Questions # 61:

You want to use Google Stackdriver Logging to monitor Google BigQuery usage. You need an instant notification to be sent to your monitoring tool when new data is appended to a certain table using an insert job, but you do not want to receive notifications for other tables. What should you do?

Options:

Make a call to the Stackdriver API to list all logs, and apply an advanced filter.

In the Stackdriver Logging admin interface, enable a log sink export to BigQuery.

In the Stackdriver Logging admin interface, enable a log sink export to Google Cloud Pub/Sub, and subscribe to the topic from your monitoring tool.

Using the Stackdriver API, create a project sink with an advanced log filter to export to Pub/Sub, and subscribe to the topic from your monitoring tool.
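The last option combines an advanced log filter with a Pub/Sub sink. A minimal sketch of the two strings such a sink needs, assuming the legacy BigQuery audit log (AuditData) format: the field paths, the `jobservice.jobcompleted` method name, and the project, dataset, table, and topic names are assumptions to verify against the log entries your project actually emits. Only the Pub/Sub destination format is the documented sink convention.

```python
# Hypothetical names; substitute your own project and topic.
PROJECT_ID = "my-project"
TOPIC_ID = "bq-table-appends"

# Assumed field prefix from the legacy BigQuery audit log (AuditData) format;
# check it against real audit log entries before relying on it.
JOB_CONFIG = "protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration"


def build_sink_filter(dataset_id: str, table_id: str, job_kind: str = "load") -> str:
    """Advanced log filter matching completed jobs of `job_kind` that write to one table."""
    if job_kind not in ("load", "query"):
        raise ValueError("job_kind must be 'load' or 'query'")
    dest = f"{JOB_CONFIG}.{job_kind}.destinationTable"
    clauses = [
        'resource.type="bigquery_resource"',
        'protoPayload.methodName="jobservice.jobcompleted"',
        f'{dest}.datasetId="{dataset_id}"',  # restrict to one dataset...
        f'{dest}.tableId="{table_id}"',      # ...and one table, so other tables never notify
    ]
    return " AND ".join(clauses)


def pubsub_sink_destination(project_id: str, topic_id: str) -> str:
    """Destination string format that Cloud Logging sinks use for a Pub/Sub topic."""
    return f"pubsub.googleapis.com/projects/{project_id}/topics/{topic_id}"


if __name__ == "__main__":
    print(pubsub_sink_destination(PROJECT_ID, TOPIC_ID))
    print(build_sink_filter("sales", "orders"))
```

The two strings map onto `gcloud logging sinks create SINK_NAME DESTINATION --log-filter=FILTER`; the sink's writer identity also needs permission to publish to the topic before the monitoring tool's subscription receives anything.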

Questions # 62:

Your company is in a highly regulated industry. One of your requirements is to ensure individual users have access only to the minimum amount of information required to do their jobs. You want to enforce this requirement with Google BigQuery. Which three approaches can you take? (Choose three.)

Options:

Disable writes to certain tables.

Restrict access to tables by role.

Ensure that the data is encrypted at all times.

Restrict BigQuery API access to approved users.

Segregate data across multiple tables or databases.

Use Google Stackdriver Audit Logging to determine policy violations.
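Restricting access by role and segregating data across tables work together: once sensitive columns live in their own table, a table-level IAM binding can limit who reads them. The sketch below models only table-level bindings in the IAM policy shape (`bindings` entries with a `role` and `members`). The table and group names are hypothetical, and real BigQuery access also inherits grants from the dataset and project, which this check deliberately ignores.

```python
from typing import Dict, List

READ_ROLE = "roles/bigquery.dataViewer"


def table_policy(readers: List[str]) -> Dict:
    """Minimal IAM-style policy granting read access on one table to the given members."""
    return {"bindings": [{"role": READ_ROLE, "members": sorted(readers)}]}


def can_read(policies: Dict[str, Dict], member: str, table: str) -> bool:
    """True if `member` holds the read role on `table`; a missing policy means no access."""
    policy = policies.get(table, {"bindings": []})
    return any(
        b["role"] == READ_ROLE and member in b["members"] for b in policy["bindings"]
    )


# Hypothetical layout: PII is segregated into its own table, readable only by compliance.
policies = {
    "crm.customers_pii": table_policy(["group:compliance@example.com"]),
    "crm.customers_public": table_policy(
        ["group:compliance@example.com", "group:analysts@example.com"]
    ),
}
```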

Questions # 63:

You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users’ privacy?

Options:

Grant the consultant the Viewer role on the project.

Grant the consultant the Cloud Dataflow Developer role on the project.

Create a service account and allow the consultant to log on with it.

Create an anonymized sample of the data for the consultant to work with in a different project.
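If the consultant works from an anonymized sample in a separate project, the sample could be produced along these lines. This sketch assumes a hypothetical row schema with a `user_id` key and direct identifiers `email`, `name`, and `phone`; it drops the direct identifiers and replaces `user_id` with a keyed HMAC-SHA256 pseudonym. Keyed pseudonymization on its own is not full anonymization, because remaining quasi-identifiers can still re-identify users.

```python
import hashlib
import hmac
import random
from typing import Dict, List

# Hypothetical schema: direct identifiers to drop before sharing.
DIRECT_IDENTIFIERS = {"email", "name", "phone"}


def pseudonymize(value: str, secret: bytes) -> str:
    """Keyed hash of an identifier; the mapping cannot be recomputed without `secret`."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def anonymized_sample(rows: List[Dict], k: int, secret: bytes, seed: int = 0) -> List[Dict]:
    """Random sample of up to k rows with direct identifiers removed and user_id pseudonymized."""
    rng = random.Random(seed)  # fixed seed so the shared sample is reproducible
    sample = rng.sample(rows, min(k, len(rows)))
    out = []
    for row in sample:
        clean = {key: v for key, v in row.items() if key not in DIRECT_IDENTIFIERS}
        clean["user_id"] = pseudonymize(str(row["user_id"]), secret)
        out.append(clean)
    return out
```

The secret key stays in the original project, so the consultant's project only ever sees pseudonyms.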

Questions # 64:

Your company’s on-premises Apache Hadoop servers are approaching end-of-life, and IT has decided to migrate the cluster to Google Cloud Dataproc. A like-for-like migration of the cluster would require 50 TB of Google Persistent Disk per node. The CIO is concerned about the cost of using that much block storage. You want to minimize the storage cost of the migration. What should you do?

Options:

Put the data into Google Cloud Storage.

Use preemptible virtual machines (VMs) for the Cloud Dataproc cluster.

Tune the Cloud Dataproc cluster so that there is just enough disk for all data.

Migrate some of the cold data into Google Cloud Storage, and keep only the hot data in Persistent Disk.
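Dataproc clusters come with the Cloud Storage connector installed, so Hadoop and Spark jobs can read and write `gs://` URIs where they previously used `hdfs://` paths; moving data to Cloud Storage is then often a path change for existing jobs rather than a rewrite. A minimal sketch of that path mapping, assuming the HDFS directory tree is copied unchanged into a single bucket (the bucket name is hypothetical):

```python
from urllib.parse import urlparse


def hdfs_to_gcs(path: str, bucket: str) -> str:
    """Map hdfs://namenode:port/dir/file to gs://bucket/dir/file.

    Assumption: the HDFS layout is mirrored as-is under the bucket root.
    """
    parsed = urlparse(path)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an HDFS URI: {path}")
    # The namenode host:port is dropped; only the directory path carries over.
    return f"gs://{bucket}/{parsed.path.lstrip('/')}"
```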

Questions # 65:

You create an important report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. You notice that visualizations are not showing data that is less than 1 hour old. What should you do?

Options:

Disable caching by editing the report settings.

Disable caching in BigQuery by editing table details.

Refresh your browser tab showing the visualizations.

Clear your browser history for the past hour, then reload the tab showing the visualizations.

Questions # 66:

You are creating a model to predict housing prices. Due to budget constraints, you must run it on a single resource-constrained virtual machine. Which learning algorithm should you use?

Options:

Linear regression

Logistic classification

Recurrent neural network

Feedforward neural network
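Linear regression predicts a continuous target such as price and is cheap to train. For a single feature, ordinary least squares has a closed form, slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x), computable in linear time with constant extra memory. A minimal standard-library sketch; the feature (for example floor area) and any data are illustrative, and a real housing model would use more features.

```python
from typing import List, Tuple


def fit_simple_linear_regression(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ slope * x + intercept, O(n) time, O(1) extra memory."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sxy and Sxx: centered cross-product and sum of squares.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

For x = [1, 2, 3, 4] and y = [3, 5, 7, 9], the fit recovers slope 2.0 and intercept 1.0.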

Questions # 67:

Your company uses a proprietary system to send inventory data every 6 hours to a data ingestion service in the cloud. Transmitted data includes a payload of several fields and the timestamp of the transmission. If there are any concerns about a transmission, the system retransmits the data. How should you deduplicate the data most efficiently?

Options:

Assign globally unique identifiers (GUIDs) to each data entry.

Compute the hash value of each data entry, and compare it with all historical data.

Store each data entry as the primary key in a separate database and apply an index.

Maintain a database table to store the hash value and other metadata for each data entry.
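Because a retransmission carries a new transmission timestamp, hashing the whole entry would treat the retry as new data; the hash has to cover only the payload. Storing each hash with a little metadata turns every duplicate check into a single key lookup instead of a comparison against all historical data. A sketch under assumptions: entries are flat dicts, the transmission timestamp lives in a hypothetical `transmitted_at` field, and an in-memory dict stands in for the database table.

```python
import hashlib
import json
from typing import Dict


def payload_hash(entry: Dict) -> str:
    """SHA-256 of the payload only; the transmission timestamp differs between retries."""
    payload = {k: v for k, v in entry.items() if k != "transmitted_at"}
    # Canonical JSON (sorted keys, fixed separators) so field order does not change the hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DedupIndex:
    """In-memory stand-in for a table keyed by payload hash, with first-seen metadata."""

    def __init__(self) -> None:
        self._seen: Dict[str, Dict] = {}

    def accept(self, entry: Dict) -> bool:
        """Return True for new data, False for a retransmitted duplicate."""
        h = payload_hash(entry)
        if h in self._seen:
            self._seen[h]["duplicates"] += 1
            return False
        self._seen[h] = {"first_transmitted_at": entry["transmitted_at"], "duplicates": 0}
        return True
```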

Questions # 68:

You are designing a basket abandonment system for an ecommerce company. The system will send a message to a user based on these rules:

No interaction by the user on the site for 1 hour

Has added more than $30 worth of products to the basket

Has not completed a transaction

You use Google Cloud Dataflow to process the data and decide if a message should be sent. How should you design the pipeline?

Options:

Use a fixed-time window with a duration of 60 minutes.

Use a sliding time window with a duration of 60 minutes.

Use a session window with a gap time duration of 60 minutes.

Use a global window with a time-based trigger with a delay of 60 minutes.
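A session window with a 60-minute gap closes exactly when a user has been inactive for an hour, which is the first rule; the other two rules can then be checked once per closed session. The batch sketch below mimics that gap-based grouping outside Beam (two events exactly one gap apart land in separate sessions, as with Beam's session merging). The event schema (`user`, `ts` in seconds, `type`, `amount`) is hypothetical, and in a real streaming pipeline a session is only emitted once the watermark passes its end.

```python
from collections import defaultdict
from typing import Dict, List

GAP_SECONDS = 60 * 60  # one hour of inactivity closes a session
THRESHOLD = 30.0       # basket value in dollars


def sessionize(events: List[Dict], gap: int = GAP_SECONDS) -> Dict[str, List[List[Dict]]]:
    """Split each user's events into sessions; a gap of `gap` seconds or more starts a new one."""
    by_user: Dict[str, List[Dict]] = defaultdict(list)
    for ev in sorted(events, key=lambda ev: ev["ts"]):
        by_user[ev["user"]].append(ev)
    sessions: Dict[str, List[List[Dict]]] = {}
    for user, evs in by_user.items():
        groups = [[evs[0]]]
        for prev, cur in zip(evs, evs[1:]):
            if cur["ts"] - prev["ts"] >= gap:
                groups.append([])
            groups[-1].append(cur)
        sessions[user] = groups
    return sessions


def abandoned(session: List[Dict], threshold: float = THRESHOLD) -> bool:
    """Basket over the threshold and no completed transaction within the closed session."""
    basket = sum(e.get("amount", 0.0) for e in session if e["type"] == "add_to_cart")
    purchased = any(e["type"] == "purchase" for e in session)
    return basket > threshold and not purchased
```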

Questions # 69:

You need to analyze user clickstream data to personalize content recommendations. The data arrives continuously and needs to be processed with low latency, including transformations such as sessionization (grouping clicks by user within a time window) and aggregation of user activity. You need to identify a scalable solution to handle millions of events each second and be resilient to late-arriving data. What should you do?

Options:

A. Use Firebase Realtime Database for ingestion and storage, and Cloud Run functions for processing and analytics.

B. Use Cloud Storage for ingestion, Dataproc with Apache Spark for batch processing, and BigQuery for storage and analytics.

C. Use Pub/Sub for ingestion, Dataflow with Apache Beam for processing, and BigQuery for storage and analytics.

D. Use Cloud Data Fusion for ingestion and transformation, and Cloud SQL for storage and analytics.

Answer: C

Explanation

Comprehensive and Detailed Explanation:

This question requires a solution that excels at large-scale, stateful stream processing with sophisticated windowing and handling of out-of-order data.

Option C is the correct answer because each component covers one of the requirements:

Pub/Sub provides globally available, horizontally scalable ingestion for continuous event data.

Dataflow runs Apache Beam pipelines, and the Beam model is designed for stateful stream processing. It has built-in windowing strategies (including session windows for sessionization), triggers, and allowed lateness for handling late-arriving data. Because Dataflow is fully managed and autoscales its workers, it can scale to millions of events per second.
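Late-data handling in Beam rests on the watermark (the runner's estimate of how far event time has progressed) and on each window's allowed lateness. The sketch below is a simplified model using fixed windows, not Beam's implementation: an element whose window end is still ahead of the watermark is on time; one arriving after the watermark passes the window end but within the allowed lateness can still update the result through a late firing, depending on the trigger configuration; anything later is dropped. Session windows follow the same rule, with the merged session's end in place of the window end. Times are in seconds and all values are illustrative.

```python
from typing import Tuple


def window_bounds(event_ts: int, size: int) -> Tuple[int, int]:
    """Fixed (tumbling) window [start, end) that contains event_ts."""
    start = event_ts - (event_ts % size)
    return start, start + size


def classify(event_ts: int, watermark: int, size: int, allowed_lateness: int) -> str:
    """Simplified watermark model: 'on_time', 'late_accepted', or 'dropped'."""
    _, end = window_bounds(event_ts, size)
    if watermark < end:
        return "on_time"        # watermark has not yet passed the end of the window
    if watermark < end + allowed_lateness:
        return "late_accepted"  # late, but the window state is still kept
    return "dropped"            # past allowed lateness; window state already discarded
```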

BigQuery is the ideal sink for the processed data, enabling large-scale analytics for the recommendation engine.

Option A is incorrect as Firebase and Cloud Run are more suited for application backends and are not designed for complex, stateful data processing pipelines at this scale.

Option B is incorrect because it describes a batch processing pattern. Using Cloud Storage for ingestion and Dataproc for batch processing would introduce high latency, failing the "low latency" requirement.

Option D is incorrect because Cloud Data Fusion is primarily a batch-oriented ETL/ELT tool, and Cloud SQL is not an analytical data warehouse capable of handling this scale of data for analytics.

Reference (Google Cloud Documentation Concepts):

This is another example of the canonical Pub/Sub -> Dataflow -> BigQuery streaming analytics pattern. The Apache Beam Programming Guide (the foundation for Dataflow) covers windowing, including session windows, and triggers for handling late data in depth. These features are critical for accurately processing real-world event streams such as clickstream data and are core strengths of Dataflow.

Questions # 70:

Different teams in your organization store customer and performance data in BigQuery. Each team needs to keep full control of their collected data, be able to query data within their projects, and be able to exchange their data with other teams. You need to implement an organization-wide solution, while minimizing operational tasks and costs. What should you do?

Options:

Create a BigQuery scheduled query to replicate all customer data into team projects.

Enable each team to create materialized views of the data they need to access in their projects.

Ask each team to publish their data in Analytics Hub. Direct the other teams to subscribe to them.

Ask each team to create authorized views of their data. Grant the bigquery.jobUser role to each team.
