Amazon Web Services MLA-C01 Exam Questions Free Practice Test

Questions # 21:

A company is building an Amazon SageMaker AI pipeline for an ML model. The pipeline uses distributed processing and distributed training.

An ML engineer needs to encrypt network communication between instances that run distributed jobs. The ML engineer configures the distributed jobs to run in a private VPC.

What should the ML engineer do to meet the encryption requirement?

Options:

A. Enable network isolation.

B. Configure traffic encryption by using security groups.

C. Enable inter-container traffic encryption.

D. Enable VPC flow logs.

Answer: C

Explanation

In Amazon SageMaker, distributed training and distributed processing jobs often involve multiple instances exchanging data over the network. By default, when these jobs run inside a VPC, network traffic remains private but is not automatically encrypted between instances. When compliance or security requirements mandate encryption of in-transit data, additional configuration is required.

The correct solution is to enable inter-container traffic encryption, which ensures that all network communication between containers running on different instances is encrypted using TLS. Amazon SageMaker provides a built-in feature for this purpose. When inter-container traffic encryption is enabled, SageMaker automatically configures secure communication channels between all nodes participating in a distributed job, including training clusters and processing jobs.

Option A (Network isolation) is incorrect because network isolation prevents containers from making outbound network calls and accessing the internet. It does not encrypt traffic between instances.

Option B (Security groups) is incorrect because security groups control network access and traffic flow, not encryption. They can restrict which instances can communicate, but they do not provide data-in-transit encryption.

Option D (VPC flow logs) is incorrect because VPC flow logs are used for monitoring and auditing network traffic, not for encrypting it.

AWS documentation explicitly states that enabling inter-container traffic encryption is the recommended and supported approach for encrypting data exchanged between instances during distributed SageMaker jobs. This feature aligns with enterprise security best practices and regulatory requirements for protecting sensitive ML training data in transit.

Therefore, Option C is the only solution that directly fulfills the encryption requirement for distributed SageMaker workloads.
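As a minimal sketch of where this setting lives, the helpers below build request parameters for the SageMaker CreateTrainingJob and CreateProcessingJob APIs. The function names, instance type, instance count, and all ARNs, subnets, and S3 paths are illustrative assumptions, not values from the question. EnableInterContainerTrafficEncryption is a top-level field for training jobs and sits inside NetworkConfig for processing jobs.

```python
def training_job_request(job_name, role_arn, image_uri,
                         subnets, security_group_ids, output_s3_uri):
    """Build keyword arguments for CreateTrainingJob (hypothetical helper)."""
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        # Two instances, so there is inter-instance traffic to protect.
        # Instance type and sizes are placeholder assumptions.
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 2,
            "VolumeSizeInGB": 50,
        },
        # The job runs in the company's private VPC.
        "VpcConfig": {
            "Subnets": subnets,
            "SecurityGroupIds": security_group_ids,
        },
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # The setting that meets the encryption requirement.
        "EnableInterContainerTrafficEncryption": True,
    }


def processing_network_config(subnets, security_group_ids):
    """Build the NetworkConfig block for CreateProcessingJob (hypothetical helper)."""
    return {
        "EnableInterContainerTrafficEncryption": True,
        "VpcConfig": {
            "Subnets": subnets,
            "SecurityGroupIds": security_group_ids,
        },
    }


# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("sagemaker").create_training_job(**training_job_request(...))
```

The security groups attached to the job must still allow the instances to reach each other. Enabling encryption can also lengthen training time for communication-heavy distributed jobs.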

Questions # 22:

A company is gathering audio, video, and text data in various languages. The company needs to use a large language model (LLM) to summarize the gathered data that is in Spanish.

Which solution will meet these requirements in the LEAST amount of time?

Options:

Train and deploy a model in Amazon SageMaker to convert the data into English text. Train and deploy an LLM in SageMaker to summarize the text.

Use Amazon Transcribe and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Jurassic model to summarize the text.

Use Amazon Rekognition and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Anthropic Claude model to summarize the text.

Use Amazon Comprehend and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Stable Diffusion model to summarize the text.

Questions # 23:

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

Which AWS service or feature can aggregate the data from the various data sources?

Options:

Amazon EMR Spark jobs

Amazon Kinesis Data Streams

Amazon DynamoDB

AWS Lake Formation

Answer: AWS Lake Formation

Explanation

Problem Description:

The dataset includes multiple data sources:

Transaction logs and customer profiles in Amazon S3.

Tables in an on-premises MySQL database.

There is a class imbalance in the dataset and interdependencies among features that need to be addressed.

The solution requires data aggregation from diverse sources for centralized processing.

Why AWS Lake Formation?

AWS Lake Formation is designed to simplify the process of aggregating, cataloging, and securing data from various sources, including S3, relational databases, and other on-premises systems.

It integrates with AWS Glue for data ingestion and ETL (Extract, Transform, Load) workflows, making it a robust choice for aggregating data from Amazon S3 and on-premises MySQL databases.

How It Solves the Problem:

Data Aggregation: Lake Formation collects data from diverse sources, such as S3 and MySQL, and consolidates it into a centralized data lake.

Cataloging and Discovery: AWS Glue crawlers scan the data and record its schema in a searchable Data Catalog, which the ML engineer can query for analysis or modeling.

Data Transformation: Prepares data using Glue jobs to handle preprocessing tasks such as addressing class imbalance (e.g., oversampling, undersampling) and handling interdependencies among features.

Security and Governance: Offers fine-grained access control, ensuring secure and compliant data management.
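The oversampling mentioned under Data Transformation can be illustrated with a small, library-free sketch. This is a generic example of random oversampling, not code specific to AWS Glue. The function name random_oversample and the record layout (a list of dicts with a label key) are assumptions made for illustration.

```python
import random


def random_oversample(rows, label_key, seed=0):
    """Duplicate rows of the smaller classes until every class matches the largest."""
    if not rows:
        return []
    rng = random.Random(seed)  # fixed seed keeps the result reproducible

    # Group the rows by class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)

    target = max(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw with replacement to fill the gap up to the target size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))

    rng.shuffle(balanced)
    return balanced
```

Random undersampling is the mirror image: it samples each larger class down to the size of the smallest one. In both cases, resampling should be applied to the training split only, so that evaluation data keeps the real class distribution.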

Steps to Implement Using AWS Lake Formation:

Step 1: Set up Lake Formation, register the S3 location, and create an AWS Glue connection to the on-premises MySQL database so that its tables can be ingested.

Step 2: Use AWS Glue to create ETL jobs to transform and prepare data for the ML pipeline.

Step 3: Query and access the consolidated data lake using services such as Athena or SageMaker for further ML processing.
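The steps above can be sketched as request parameters for three boto3 calls: lakeformation.register_resource, glue.create_connection, and athena.start_query_execution. This is a sketch under stated assumptions. The helper names, bucket ARN, host, database names, and credentials are hypothetical placeholders, and the MySQL database is assumed to be reachable through an AWS Glue JDBC connection.

```python
def register_s3_location_request(bucket_arn):
    """Step 1a: parameters for lakeformation.register_resource."""
    return {"ResourceArn": bucket_arn, "UseServiceLinkedRole": True}


def mysql_connection_request(name, host, database, username, password, port=3306):
    """Step 1b: parameters for glue.create_connection (JDBC connection to MySQL)."""
    return {
        "ConnectionInput": {
            "Name": name,
            "ConnectionType": "JDBC",
            "ConnectionProperties": {
                "JDBC_CONNECTION_URL": f"jdbc:mysql://{host}:{port}/{database}",
                # Placeholder credentials; a real setup would not hard-code them.
                "USERNAME": username,
                "PASSWORD": password,
            },
        }
    }


def athena_query_request(sql, database, output_s3_uri):
    """Step 3: parameters for athena.start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }


# With boto3 installed and credentials configured, each dict could be unpacked
# into the matching client call, for example:
#   boto3.client("lakeformation").register_resource(**register_s3_location_request(arn))
```

Step 2, the Glue ETL job itself, is omitted here because its script depends on the schemas of the transaction logs, customer profiles, and MySQL tables.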

Why Not Other Options?

Amazon EMR Spark jobs: EMR can process large-scale data, but it is a general-purpose big data platform that requires cluster management and custom code. It does not provide the managed ingestion and cataloging across sources that Lake Formation does.

Amazon Kinesis Data Streams: Kinesis is designed for real-time streaming data, not batch data aggregation across diverse sources.

Amazon DynamoDB: DynamoDB is a NoSQL database and is not suitable for aggregating data from multiple sources like S3 and MySQL.

Conclusion: AWS Lake Formation is the most suitable service for aggregating data from S3 and on-premises MySQL databases into one governed data lake. Once the data is consolidated, downstream steps can address the class imbalance and feature interdependencies.

Questions # 24:

An airline company deploys ML models to one dozen Amazon SageMaker AI inference endpoints. The inference endpoints must be able to handle different types of workloads in a cost-effective way.

Select the correct inference option from the following list to handle each type of workload. Select each inference option one time. (Select FOUR.)

Asynchronous inference

Batch inference

Real-time inference

Serverless inference

Answer:

Provide flight departure, arrival, and delay information, and provide updates for low-latency workloads → Real-time inference

Advertise holiday travel promotional deals to millions of users in multiple markets before holiday seasons for spiky workloads → Serverless inference

Generate quarterly and annual flight reports and insights for trend analysis of large datasets → Batch inference

Generate online image and audio stories for passengers to watch or listen to while waiting at an airport → Asynchronous inference

Explanation

The correct mapping depends on latency requirement, traffic pattern, payload size, processing duration, and whether the workload needs a persistent endpoint.

Real-time inference is the right choice for flight departure, arrival, and delay updates because this is an online user-facing workload that requires low latency. AWS states that SageMaker real-time inference is ideal for online inference workloads with low-latency or high-throughput requirements and uses a persistent fully managed endpoint. That fits flight status information because passengers and airline systems expect immediate responses.

Serverless inference is the best choice for holiday promotional deals because this traffic is spiky, seasonal, and unpredictable. AWS describes SageMaker Serverless Inference as suitable for intermittent or unpredictable traffic patterns. It is cost-effective because SageMaker manages the infrastructure and scales down when there are no requests, so the company does not pay for idle endpoint capacity.

Batch inference is correct for quarterly and annual flight reports because this workload analyzes large datasets offline and does not need an always-running endpoint. AWS says SageMaker batch transform is used to get inferences from large datasets and when a persistent endpoint is not required. Reports and trend analysis are scheduled, non-real-time analytics workloads, so batch inference is the most cost-effective option.

Asynchronous inference is the right choice for generating online image and audio stories. These requests can have larger payloads and longer processing times than normal low-latency API calls. AWS states that SageMaker Asynchronous Inference queues incoming requests and is ideal for large payloads, long processing times, and near-real-time latency requirements. Image and audio generation can take seconds or minutes, so asynchronous inference is more appropriate than real-time inference.
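The decision logic described above can be written as a small rule-of-thumb function. This is a sketch that encodes only the reasoning in this explanation. The Workload fields and the choose_inference_option name are assumptions made for illustration, and the function is not an AWS API.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    offline_large_dataset: bool = False             # no persistent endpoint needed
    large_payload_or_long_processing: bool = False  # requests can be queued
    spiky_or_intermittent_traffic: bool = False     # idle periods between bursts


def choose_inference_option(workload):
    """Map workload traits to a SageMaker inference option (heuristic only)."""
    if workload.offline_large_dataset:
        return "Batch inference"
    if workload.large_payload_or_long_processing:
        return "Asynchronous inference"
    if workload.spiky_or_intermittent_traffic:
        return "Serverless inference"
    # Default: steady, latency-sensitive online traffic.
    return "Real-time inference"


airline_workloads = [
    Workload("flight status updates"),
    Workload("holiday promotions", spiky_or_intermittent_traffic=True),
    Workload("quarterly and annual reports", offline_large_dataset=True),
    Workload("image and audio stories", large_payload_or_long_processing=True),
]

for w in airline_workloads:
    print(f"{w.name}: {choose_inference_option(w)}")
```

The order of the checks matters. A workload that is both offline and large is treated as batch before the other traits are considered, which matches the cost reasoning that a persistent endpoint should be avoided when it is not needed.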

Questions # 25:

An ML engineer is building an ML model in Amazon SageMaker AI. The ML engineer needs to load historical data directly from Amazon S3, Amazon Athena, and Snowflake into SageMaker AI.

Which solution will meet this requirement?

Options:

Use AWS Glue DataBrew to import the data into SageMaker AI.

Build a pipeline in SageMaker Pipelines to process the data. Use AWS DataSync to load the processed data into SageMaker AI.

Create a feature store in SageMaker Feature Store. Use an Apache Spark connector to Feature Store to access the data.

Use SageMaker Data Wrangler to query and import the data.

Questions # 26:

An ML engineer at a credit card company built and deployed an ML model by using Amazon SageMaker AI. The model was trained on transaction data that contained very few fraudulent transactions. After deployment, the model is underperforming.

What should the ML engineer do to improve the model’s performance?

Options:

Retrain the model with a different SageMaker built-in algorithm.

Use random undersampling to reduce the majority class and retrain the model.

Use Synthetic Minority Oversampling Technique (SMOTE) to generate synthetic minority samples and retrain the model.

Use random oversampling to duplicate minority samples and retrain the model.

Questions # 27:

An ML engineer needs to use Amazon SageMaker to fine-tune a large language model (LLM) for text summarization. The ML engineer must follow a low-code no-code (LCNC) approach.

Which solution will meet these requirements?

Options:

Use SageMaker Studio to fine-tune an LLM that is deployed on Amazon EC2 instances.

Use SageMaker Autopilot to fine-tune an LLM that is deployed by a custom API endpoint.

Use SageMaker Autopilot to fine-tune an LLM that is deployed on Amazon EC2 instances.

Use SageMaker Autopilot to fine-tune an LLM that is deployed by SageMaker JumpStart.

Questions # 28:

An ML engineer is using a training job to fine-tune a deep learning model in Amazon SageMaker Studio. The ML engineer previously used the same pre-trained model with a similar dataset. The ML engineer expects vanishing gradient, underutilized GPU, and overfitting problems.

The ML engineer needs to implement a solution to detect these issues and to react in predefined ways when the issues occur. The solution also must provide comprehensive real-time metrics during the training.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Use TensorBoard to monitor the training job. Publish the findings to an Amazon Simple Notification Service (Amazon SNS) topic. Create an AWS Lambda function to consume the findings and to initiate the predefined actions.

Use Amazon CloudWatch default metrics to gain insights about the training job. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

Expand the metrics in Amazon CloudWatch to include the gradients in each training step. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

Use SageMaker Debugger built-in rules to monitor the training job. Configure the rules to initiate the predefined actions.

Questions # 29:

A company has AWS Glue data processing jobs that are orchestrated by an AWS Glue workflow. The AWS Glue jobs can run on a schedule or can be launched manually.

The company is developing pipelines in Amazon SageMaker Pipelines for ML model development. The pipelines will use the output of the AWS Glue jobs during the data processing phase of model development. An ML engineer needs to implement a solution that integrates the AWS Glue jobs with the pipelines.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Use AWS Step Functions for orchestration of the pipelines and the AWS Glue jobs.

Use processing steps in SageMaker Pipelines. Configure inputs that point to the Amazon Resource Names (ARNs) of the AWS Glue jobs.

Use Callback steps in SageMaker Pipelines to start the AWS Glue workflow and to stop the pipelines until the AWS Glue jobs finish running.

Use Amazon EventBridge to invoke the pipelines and the AWS Glue jobs in the desired order.

Questions # 30:

A company is training a deep learning model to detect abnormalities in images. The company has limited GPU resources and a large hyperparameter space to explore. The company needs to test different configurations and avoid wasting computation time on poorly performing models that show weak validation accuracy in early epochs.

Which hyperparameter optimization strategy should the company use?

Options:

Grid search across all possible combinations

Bayesian optimization with early stopping

Manual tuning of each parameter individually

Exhaustive search without early stopping
