Pass the Google Cloud Certified Professional-Data-Engineer questions and answers with CertsForce

Viewing page 2 of 6 (questions 11-20)
Question # 11:

You work for a mid-sized enterprise that needs to move its operational system transaction data from an on-premises database to GCP. The database is about 20 TB in size. Which database should you choose?

Options:

A.

Cloud SQL


B.

Cloud Bigtable


C.

Cloud Spanner


D.

Cloud Datastore


Question # 12:

The marketing team at your organization provides regular updates of a segment of your customer dataset. The marketing team has given you a CSV with 1 million records that must be updated in BigQuery. When you use the UPDATE statement in BigQuery, you receive a quotaExceeded error. What should you do?

Options:

A.

Reduce the number of records updated each day to stay within the BigQuery UPDATE DML statement limit.


B.

Increase the BigQuery UPDATE DML statement limit in the Quota management section of the Google Cloud Platform Console.


C.

Split the source CSV file into smaller CSV files in Cloud Storage to reduce the number of BigQuery UPDATE DML statements per BigQuery job.


D.

Import the new records from the CSV file into a new BigQuery table. Create a BigQuery job that merges the new records with the existing records and writes the results to a new BigQuery table.
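
If the staging-table approach in option D is taken, a MERGE-based variant of it can be sketched with the google-cloud-bigquery Python client as below; the bucket path, dataset, table, and column names are hypothetical placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes default credentials and project

# Load the 1M-record CSV into a staging table: this is a load job, not a DML statement.
load_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.load_table_from_uri(
    "gs://marketing-updates/customers.csv",      # hypothetical CSV location
    "my-project.crm.customer_updates_staging",   # hypothetical staging table
    job_config=load_config,
).result()

# Apply all updates as a single MERGE, i.e. one DML statement per run.
merge_sql = """
MERGE `my-project.crm.customers` AS t
USING `my-project.crm.customer_updates_staging` AS s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN
  UPDATE SET t.segment = s.segment
WHEN NOT MATCHED THEN
  INSERT (customer_id, segment) VALUES (s.customer_id, s.segment)
"""
client.query(merge_sql).result()
```

One load job plus one MERGE stays well inside BigQuery's quotas, regardless of how many rows the CSV contains.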


Question # 13:

You want to migrate an on-premises Hadoop system to Cloud Dataproc. Hive is the primary tool in use, and the data format is Optimized Row Columnar (ORC). All ORC files have been successfully copied to a Cloud Storage bucket. You need to replicate some data to the cluster’s local Hadoop Distributed File System (HDFS) to maximize performance. What are two ways to start using Hive in Cloud Dataproc? (Choose two.)

Options:

A.

Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to HDFS. Mount the Hive tables locally.


B.

Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to any node of the Dataproc cluster. Mount the Hive tables locally.


C.

Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to the master node of the Dataproc cluster. Then run the Hadoop utility to copy them to HDFS. Mount the Hive tables from HDFS.


D.

Leverage Cloud Storage connector for Hadoop to mount the ORC files as external Hive tables. Replicate external Hive tables to the native ones.


E.

Load the ORC files into BigQuery. Leverage BigQuery connector for Hadoop to mount the BigQuery tables as external Hive tables. Replicate external Hive tables to the native ones.
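
As a rough illustration of the external-table route in option D, the sketch below submits a Hive job to an existing Dataproc cluster with the google-cloud-dataproc Python client; the region, cluster name, bucket, and table schema are assumptions.

```python
from google.cloud import dataproc_v1

region = "us-central1"  # hypothetical region
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# An external table over the ORC files in Cloud Storage (served by the Cloud
# Storage connector), then a native table materialized in the cluster's Hive
# warehouse (HDFS by default on Dataproc) for local reads.
hive_queries = """
CREATE EXTERNAL TABLE events_ext (event_id STRING, event_ts TIMESTAMP, payload STRING)
STORED AS ORC
LOCATION 'gs://my-orc-bucket/events/';

CREATE TABLE events STORED AS ORC AS SELECT * FROM events_ext;
"""

job = {
    "placement": {"cluster_name": "hive-cluster"},  # hypothetical cluster
    "hive_job": {"query_list": {"queries": [hive_queries]}},
}
job_client.submit_job_as_operation(
    request={"project_id": "my-project", "region": region, "job": job}
).result()
```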


Question # 14:

You have a requirement to insert minute-resolution data from 50,000 sensors into a BigQuery table. You expect significant growth in data volume and need the data to be available within 1 minute of ingestion for real-time analysis of aggregated trends. What should you do?

Options:

A.

Use bq load to load a batch of sensor data every 60 seconds.


B.

Use a Cloud Dataflow pipeline to stream data into the BigQuery table.


C.

Use the INSERT statement to insert a batch of data every 60 seconds.


D.

Use the MERGE statement to apply updates in batch every 60 seconds.
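
For the streaming route in option B, a minimal Apache Beam (Python SDK) pipeline run on Dataflow could look roughly like this; the project, bucket, Pub/Sub topic, and table names are placeholders, and the sensor payload is assumed to be JSON that already matches the table schema.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    streaming=True,
    runner="DataflowRunner",
    project="my-project",                # hypothetical project
    region="us-central1",
    temp_location="gs://my-bucket/tmp",  # hypothetical staging bucket
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadSensorData" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/sensor-readings")
        | "ParseJson" >> beam.Map(json.loads)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:iot.sensor_readings",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```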


Question # 15:

You migrated your on-premises Apache Hadoop Distributed File System (HDFS) data lake to Cloud Storage. The data scientist team needs to process the data by using Apache Spark and SQL. Security policies need to be enforced at the column level. You need a cost-effective solution that can scale into a data mesh. What should you do?

Options:

A.

1. Load the data to BigQuery tables.

2. Create a taxonomy of policy tags in Data Catalog.

3. Add policy tags to columns.

4. Process with the Spark-BigQuery connector or BigQuery SQL.


B.

1. Deploy a long-living Dataproc cluster with Apache Hive and Ranger enabled.

2. Configure Ranger for column level security.

3. Process with Dataproc Spark or Hive SQL.


C.

1. Apply an Identity and Access Management (IAM) policy at the file level in Cloud Storage.

2. Define a BigQuery external table for SQL processing.

3. Use Dataproc Spark to process the Cloud Storage files.


D.

1. Define a BigLake table.

2. Create a taxonomy of policy tags in Data Catalog.

3. Add policy tags to columns.

4. Process with the Spark-BigQuery connector or BigQuery SQL.
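
Whichever of the BigQuery-based options is preferred, column-level security comes down to attaching Data Catalog policy tags to columns. A small sketch with the google-cloud-bigquery client follows; the table, column, and policy-tag resource names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.lake.customers")  # hypothetical table

# Resource name of an existing policy tag in a Data Catalog taxonomy (hypothetical IDs).
pii_tag = bigquery.PolicyTagList(
    names=["projects/my-project/locations/us/taxonomies/1234567890/policyTags/9876543210"]
)

# Rebuild the schema, attaching the policy tag to the sensitive column.
new_schema = []
for field in table.schema:
    if field.name == "email":  # hypothetical sensitive column
        field = bigquery.SchemaField(
            field.name, field.field_type, mode=field.mode, policy_tags=pii_tag
        )
    new_schema.append(field)

table.schema = new_schema
client.update_table(table, ["schema"])
```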


Question # 16:

You need to move 2 PB of historical data from an on-premises storage appliance to Cloud Storage within six months, and your outbound network capacity is constrained to 20 Mb/sec. How should you migrate this data to Cloud Storage?

Options:

A.

Use Transfer Appliance to copy the data to Cloud Storage


B.

Use gsutil cp -J to compress the content being uploaded to Cloud Storage


C.

Create a private URL for the historical data, and then use Storage Transfer Service to copy the data to Cloud Storage


D.

Use trickle or ionice along with gsutil cp to limit the amount of bandwidth gsutil utilizes to less than 20 Mb/sec so it does not interfere with the production traffic
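
The deciding factor here is simple transfer arithmetic; a quick back-of-the-envelope check:

```python
# How long would 2 PB take over a 20 Mb/s link?
data_bits = 2 * 1000**5 * 8         # 2 PB (decimal) expressed in bits
link_bps = 20 * 1_000_000           # 20 megabits per second
seconds = data_bits / link_bps
print(seconds / (3600 * 24 * 365))  # roughly 25 years
```

At roughly 25 years of continuous transfer, neither compression nor bandwidth shaping closes the gap to a six-month window, which is the usual argument for an offline transfer.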


Question # 17:

An aerospace company uses a proprietary data format to store its flight data. You need to connect this new data source to BigQuery and stream the data into BigQuery. You want to efficiently import the data into BigQuery while consuming as few resources as possible. What should you do?

Options:

A.

Use a standard Dataflow pipeline to store the raw data in BigQuery and then transform the format later when the data is used.


B.

Write a shell script that triggers a Cloud Function that performs periodic ETL batch jobs on the new data source


C.

Use Apache Hive to write a Dataproc job that streams the data into BigQuery in CSV format


D.

Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format
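
To make the custom-connector idea in option D concrete, the sketch below shows the shape of a Beam (Python) decode step feeding BigQuery; decode_flight_record, the table name, and the sample fields are placeholders for the vendor-specific format.

```python
import apache_beam as beam


def decode_flight_record(raw_bytes):
    """Hypothetical stand-in for the vendor-specific parser; a real version
    would unpack the proprietary binary layout into a dict whose keys match
    the destination BigQuery schema."""
    return {"tail_number": "N123AB", "recorded_at": "2024-01-01T00:00:00Z"}


class DecodeFlightRecords(beam.DoFn):
    def process(self, raw_bytes):
        yield decode_flight_record(raw_bytes)


def write_flight_data(pcoll):
    """Attach the decode step and the BigQuery sink to an existing streaming
    PCollection (for example, one read from Pub/Sub)."""
    return (
        pcoll
        | "Decode" >> beam.ParDo(DecodeFlightRecords())
        | "ToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:telemetry.flight_data",  # hypothetical table
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```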


Question # 18:

Data Analysts in your company have the Cloud IAM Owner role assigned to them in their projects to allow them to work with multiple GCP products. Your organization requires that all BigQuery data access logs be retained for 6 months. You need to ensure that only audit personnel in your company can access the data access logs for all projects. What should you do?

Options:

A.

Enable data access logs in each Data Analyst’s project. Restrict access to Stackdriver Logging via Cloud IAM roles.


B.

Export the data access logs via a project-level export sink to a Cloud Storage bucket in the Data Analysts’ projects. Restrict access to the Cloud Storage bucket.


C.

Export the data access logs via a project-level export sink to a Cloud Storage bucket in a newly created project for audit logs. Restrict access to the project with the exported logs.


D.

Export the data access logs via an aggregated export sink to a Cloud Storage bucket in a newly created project for audit logs. Restrict access to the project that contains the exported logs.
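
For the aggregated-sink approach in option D, a sketch against the Cloud Logging v2 API (Python client) is shown below; the organization ID, bucket, and sink name are placeholders, and restricting the audit project itself is a separate IAM step.

```python
from google.cloud.logging_v2.services.config_service_v2 import ConfigServiceV2Client

client = ConfigServiceV2Client()

# An organization-level sink with include_children=True captures BigQuery data
# access logs from every project under the organization.
sink = {
    "name": "bq-data-access-to-gcs",                             # hypothetical sink name
    "destination": "storage.googleapis.com/audit-logs-archive",  # hypothetical bucket
    "filter": (
        'log_id("cloudaudit.googleapis.com/data_access") '
        'AND protoPayload.serviceName="bigquery.googleapis.com"'
    ),
    "include_children": True,
}

client.create_sink(
    request={
        "parent": "organizations/123456789012",  # hypothetical organization ID
        "sink": sink,
        "unique_writer_identity": True,
    }
)
```

A retention or lifecycle rule on the destination bucket then covers the six-month requirement, and IAM on the audit project controls who can read the exported logs.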


Question # 19:

You have an Oracle database deployed in a VM as part of a Virtual Private Cloud (VPC) network. You want to replicate and continuously synchronize 50 tables to BigQuery. You want to minimize the need to manage infrastructure. What should you do?

Options:

A.

Create a Datastream service from Oracle to BigQuery, use a private connectivity configuration to the same VPC network, and a connection profile to BigQuery.


B.

Create a Pub/Sub subscription that writes to BigQuery directly. Deploy the Debezium Oracle connector to capture changes in the Oracle database, and sink them to the Pub/Sub topic.


C.

Deploy Apache Kafka in the same VPC network, use Kafka Connect Oracle Change Data Capture (CDC), and Dataflow to stream the Kafka topic to BigQuery.


D.

Deploy Apache Kafka in the same VPC network, use Kafka Connect Oracle change data capture (CDC), and the Kafka Connect Google BigQuery Sink Connector.


Question # 20:

You are choosing a NoSQL database to handle telemetry data submitted from millions of Internet-of-Things (IoT) devices. The volume of data is growing at 100 TB per year, and each data entry has about 100 attributes. The data processing pipeline does not require atomicity, consistency, isolation, and durability (ACID). However, high availability and low latency are required.

You need to analyze the data by querying against individual fields. Which three databases meet your requirements? (Choose three.)

Options:

A.

Redis


B.

HBase


C.

MySQL


D.

MongoDB


E.

Cassandra


F.

HDFS with Hive

