Pass the Google Cloud Certified Professional-Data-Engineer Questions and Answers with CertsForce

Viewing page 1 out of 6 pages
Questions # 1:

You want to create a machine learning model using BigQuery ML and create an endpoint for hosting the model using Vertex AI. This will enable the processing of continuous streaming data in near real time from multiple vendors. The data may contain invalid values. What should you do?

Options:

A.

Create a new BigQuery dataset and use streaming inserts to land the data from multiple vendors. Configure your BigQuery ML model to use the "ingestion" dataset as the training data.


B.

Use BigQuery streaming inserts to land the data from multiple vendors in the dataset where your BigQuery ML model is deployed.


C.

Create a Pub/Sub topic and send all vendor data to it. Connect a Cloud Function to the topic to process the data and store it in BigQuery.


D.

Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.
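
As a rough illustration of the Dataflow-based approach described in option D, the following Python (Apache Beam) sketch reads vendor messages from Pub/Sub, drops records with invalid values, and streams the rest into BigQuery. The project, subscription, table, and validation rule are hypothetical placeholders, not part of the question.

    # Hypothetical sketch: Pub/Sub -> validate/sanitize -> BigQuery (streaming).
    import json
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def parse_and_validate(message_bytes):
        """Parse a Pub/Sub message and drop rows with missing or negative values."""
        record = json.loads(message_bytes.decode("utf-8"))
        if record.get("value") is None or record["value"] < 0:
            return []  # discard invalid rows before they reach BigQuery
        return [record]

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/PROJECT/subscriptions/vendor-data-sub")
            | "ValidateRecords" >> beam.FlatMap(parse_and_validate)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "PROJECT:vendor_dataset.events",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )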


Questions # 2:

As your organization expands its usage of GCP, many teams have started to create their own projects. Projects are further multiplied to accommodate different stages of deployments and target audiences. Each project requires unique access control configurations. The central IT team needs to have access to all projects. Furthermore, data from Cloud Storage buckets and BigQuery datasets must be shared for use in other projects in an ad hoc way. You want to simplify access control management by minimizing the number of policies. Which two steps should you take? Choose 2 answers.

Options:

A.

Use Cloud Deployment Manager to automate access provision.


B.

Introduce resource hierarchy to leverage access control policy inheritance.


C.

Create distinct groups for various teams, and specify groups in Cloud IAM policies.


D.

Only use service accounts when sharing data for Cloud Storage buckets and BigQuery datasets.


E.

For each Cloud Storage bucket or BigQuery dataset, decide which projects need access. Find all the active members who have access to these projects, and create a Cloud IAM policy to grant access to all these users.


Questions # 3:

You have data located in BigQuery that is used to generate reports for your company. You have noticed that some weekly executive report fields do not conform to company format standards; for example, report errors include different telephone formats and different country code identifiers. This is a frequent issue, so you need to create a recurring job to normalize the data. You want a quick solution that requires no coding. What should you do?

Options:

A.

Use Cloud Data Fusion and Wrangler to normalize the data, and set up a recurring job.


B.

Use BigQuery and GoogleSQL to normalize the data, and schedule recurring queries in BigQuery.


C.

Create a Spark job and submit it to Dataproc Serverless.


D.

Use Dataflow SQL to create a job that normalizes the data, and after the first run of the job, schedule the pipeline to execute recurrently.
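
For context on the scheduled-query approach in option B, the Python sketch below uses the BigQuery Data Transfer Service client to register a recurring GoogleSQL query that normalizes a phone-number column. The project, dataset, table, column, and schedule are hypothetical placeholders.

    # Hypothetical sketch: recurring BigQuery scheduled query that normalizes
    # phone numbers into a reporting table. All names are placeholders.
    from google.cloud import bigquery_datatransfer

    client = bigquery_datatransfer.DataTransferServiceClient()
    parent = "projects/my-project/locations/us"

    normalize_sql = """
    SELECT * REPLACE (REGEXP_REPLACE(phone, r'[^0-9+]', '') AS phone)
    FROM `my-project.reporting.weekly_report_raw`
    """

    transfer_config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id="reporting",
        display_name="normalize-weekly-report",
        data_source_id="scheduled_query",  # built-in source for scheduled queries
        params={
            "query": normalize_sql,
            "destination_table_name_template": "weekly_report_normalized",
            "write_disposition": "WRITE_TRUNCATE",
        },
        schedule="every monday 06:00",
    )

    client.create_transfer_config(parent=parent, transfer_config=transfer_config)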


Questions # 4:

You maintain ETL pipelines. You notice that a streaming pipeline running on Dataflow is taking a long time to process incoming data, which causes output delays. You also noticed that the pipeline graph was automatically optimized by Dataflow and merged into one step. You want to identify where the potential bottleneck is occurring. What should you do?

Options:

A.

Insert a Reshuffle operation after each processing step, and monitor the execution details in the Dataflow console.


B.

Log debug information in each ParDo function, and analyze the logs at execution time.


C.

Insert output sinks after each key processing step, and observe the writing throughput of each block.


D.

Verify that the Dataflow service accounts have appropriate permissions to write the processed data to the output sinks.
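
As background for option A: Dataflow fuses adjacent steps into a single stage, which hides per-step metrics. Inserting a Reshuffle between steps breaks that fusion so the execution details in the Dataflow console report throughput and wall time for each step separately. A minimal Python (Apache Beam) sketch, where parse_fn and enrich_fn stand in for the pipeline's real transforms:

    # Hypothetical sketch: break Dataflow fusion so each step is reported
    # separately in the console. Topic, table, and functions are placeholders.
    import apache_beam as beam

    def build_pipeline(p, parse_fn, enrich_fn):
        return (
            p
            | "Read" >> beam.io.ReadFromPubSub(topic="projects/PROJECT/topics/events")
            | "Parse" >> beam.Map(parse_fn)
            | "BreakFusion1" >> beam.Reshuffle()  # forces a materialization boundary
            | "Enrich" >> beam.Map(enrich_fn)
            | "BreakFusion2" >> beam.Reshuffle()
            | "Write" >> beam.io.WriteToBigQuery("PROJECT:dataset.events_out")
        )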


Questions # 5:

One of your encryption keys stored in Cloud Key Management Service (Cloud KMS) was exposed. You need to re-encrypt all of your CMEK-protected Cloud Storage data that used that key, and then delete the compromised key. You also want to reduce the risk of objects getting written without customer-managed encryption key (CMEK) protection in the future. What should you do?

Options:

A.

Rotate the Cloud KMS key version. Continue to use the same Cloud Storage bucket.


B.

Create a new Cloud KMS key. Set the default CMEK key on the existing Cloud Storage bucket to the new one.


C.

Create a new Cloud KMS key. Create a new Cloud Storage bucket. Copy all objects from the old bucket to the new bucket while specifying the new Cloud KMS key in the copy command.


D.

Create a new Cloud KMS key. Create a new Cloud Storage bucket configured to use the new key as the default CMEK key. Copy all objects from the old bucket to the new bucket without specifying a key.
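
As a rough illustration of the copy-based re-encryption pattern in options C and D, the Python sketch below creates a bucket whose default CMEK is a new key and copies objects into it without specifying a per-object key, so each copy is encrypted with the bucket default. Project, bucket, and key names are hypothetical placeholders.

    # Hypothetical sketch: copy CMEK-protected objects into a new bucket whose
    # default CMEK is the replacement key. All resource names are placeholders.
    from google.cloud import storage

    NEW_KEY = "projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/new-key"

    client = storage.Client()

    # New bucket with the new key set as its default CMEK.
    new_bucket = client.bucket("my-data-rewrapped")
    new_bucket.default_kms_key_name = NEW_KEY
    client.create_bucket(new_bucket, location="US")

    # Copy every object; with no key specified, each copy picks up the
    # destination bucket's default CMEK.
    old_bucket = client.bucket("my-data")
    for blob in client.list_blobs("my-data"):
        old_bucket.copy_blob(blob, new_bucket, blob.name)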


Questions # 6:

Your infrastructure includes a set of YouTube channels. You have been tasked with creating a process for sending the YouTube channel data to Google Cloud for analysis. You want to design a solution that allows your worldwide marketing teams to perform ANSI SQL and other types of analysis on up-to-date YouTube channel log data. How should you set up the log data transfer into Google Cloud?

Options:

A.

Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.


B.

Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Regional bucket as a final destination.


C.

Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.


D.

Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Regional storage bucket as a final destination.


Questions # 7:

Government regulations in the banking industry mandate the protection of clients' personally identifiable information (PII). Your company requires PII to be access controlled, encrypted, and compliant with major data protection standards. In addition to using Cloud Data Loss Prevention (Cloud DLP), you want to follow Google-recommended practices and use service accounts to control access to PII. What should you do?

Options:

A.

Assign the required Identity and Access Management (IAM) roles to every employee, and create a single service account to access protected resources.


B.

Use one service account to access a Cloud SQL database, and use separate service accounts for each human user.


C.

Use Cloud Storage to comply with major data protection standards. Use one service account shared by all users.


D.

Use Cloud Storage to comply with major data protection standards. Use multiple service accounts attached to IAM groups to grant the appropriate access to each group.


Questions # 8:

You are building an application to share financial market data with consumers, who will receive data feeds. Data is collected from the markets in real time. Consumers will receive the data in the following ways:

    Real-time event stream

    ANSI SQL access to real-time stream and historical data

    Batch historical exports

Which solution should you use?

Options:

A.

Cloud Dataflow, Cloud SQL, Cloud Spanner


B.

Cloud Pub/Sub, Cloud Storage, BigQuery


C.

Cloud Dataproc, Cloud Dataflow, BigQuery


D.

Cloud Pub/Sub, Cloud Dataproc, Cloud SQL
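
For reference, a minimal Python sketch of the real-time ingest leg that the Pub/Sub-based options assume: a publisher pushing market ticks to a topic. The project, topic, and payload fields are hypothetical; downstream, the stream could be written to BigQuery for ANSI SQL access and exported to Cloud Storage for batch consumers.

    # Hypothetical sketch: publish real-time market ticks to a Pub/Sub topic.
    import json
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "market-ticks")

    tick = {"symbol": "ABC", "price": 101.25, "ts": "2024-01-01T00:00:00Z"}
    future = publisher.publish(topic_path, data=json.dumps(tick).encode("utf-8"))
    print(future.result())  # message ID once the publish is acknowledged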


Questions # 9:

You have a variety of files in Cloud Storage that your data science team wants to use in their models. Currently, users do not have a method to explore, cleanse, and validate the data in Cloud Storage. You are looking for a low-code solution that can be used by your data science team to quickly cleanse and explore data within Cloud Storage. What should you do?

Options:

A.

Load the data into BigQuery and use SQL to transform the data as necessary. Provide the data science team access to staging tables to explore the raw data.


B.

Provide the data science team access to Dataflow to create a pipeline to prepare and validate the raw data and load data into BigQuery for data exploration.


C.

Provide the data science team access to Dataprep to prepare, validate, and explore the data within Cloud Storage.


D.

Create an external table in BigQuery and use SQL to transform the data as necessary. Provide the data science team access to the external tables to explore the raw data.
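
As a rough illustration of the external-table approach in option D, the Python sketch below defines a BigQuery external table over CSV files in Cloud Storage so the data can be queried in place. The project, dataset, table, and URI are hypothetical placeholders.

    # Hypothetical sketch: BigQuery external table over raw CSV files in
    # Cloud Storage. All names are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()

    external_config = bigquery.ExternalConfig("CSV")
    external_config.source_uris = ["gs://my-raw-data/files/*.csv"]
    external_config.autodetect = True  # infer the schema from the files
    external_config.options.skip_leading_rows = 1

    table = bigquery.Table("my-project.staging.raw_files_external")
    table.external_data_configuration = external_config
    client.create_table(table, exists_ok=True)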


Questions # 10:

You are operating a Cloud Dataflow streaming pipeline. The pipeline aggregates events from a Cloud Pub/Sub subscription source, within a window, and sinks the resulting aggregation to a Cloud Storage bucket. The source has consistent throughput. You want to monitor and alert on the behavior of the pipeline with Cloud Stackdriver to ensure that it is processing data. Which Stackdriver alerts should you create?

Options:

A.

An alert based on a decrease of subscription/num_undelivered_messages for the source and a rate of change increase of instance/storage/used_bytes for the destination


B.

An alert based on an increase of subscription/num_undelivered_messages for the source and a rate of change decrease of instance/storage/used_bytes for the destination


C.

An alert based on a decrease of instance/storage/used_bytes for the source and a rate of change increase of subscription/num_undelivered_messages for the destination


D.

An alert based on an increase of instance/storage/used_bytes for the source and a rate of change decrease of subscription/num_undelivered_messages for the destination
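
For context, the Python sketch below (google-cloud-monitoring) creates the kind of alerting policy the options describe for the source side: it fires when the subscription's num_undelivered_messages backlog stays above a threshold, a sign that the pipeline has stopped consuming. The project ID, threshold, and display names are hypothetical placeholders.

    # Hypothetical sketch: alert when the Pub/Sub subscription backlog grows.
    # Project, threshold, and names are placeholders.
    from google.cloud import monitoring_v3

    client = monitoring_v3.AlertPolicyServiceClient()
    project = "projects/my-project"

    policy = monitoring_v3.AlertPolicy(
        display_name="Pipeline stopped consuming Pub/Sub",
        combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
        conditions=[
            monitoring_v3.AlertPolicy.Condition(
                display_name="Backlog above threshold",
                condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                    filter=(
                        'resource.type = "pubsub_subscription" AND '
                        'metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages"'
                    ),
                    comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                    threshold_value=1000,
                    duration={"seconds": 300},
                ),
            )
        ],
    )

    client.create_alert_policy(name=project, alert_policy=policy)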

