Option A is the most complete solution because it provides a fully automated canary strategy with staged traffic shifts, metric-based decisions, and automatic rollback, all using managed AWS services. The requirement emphasizes automation, health-based traffic progression, and rollback without manual intervention if performance degrades.
AWS Step Functions is well suited for orchestrating controlled deployment workflows with deterministic stages, waits, and conditional branches. By shifting traffic in stages and pausing for observation windows, the system can evaluate real-time inference latency and error rates before promoting more traffic to the new model version. Amazon CloudWatch provides the necessary real-time metrics and alarms for latency and error monitoring.
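The staged progression described above can be sketched in plain Python. This is only an illustration of the control flow a state machine would encode, not the actual Step Functions definition: the stage percentages, the function name run_canary, and the is_healthy callback are all hypothetical, and the observation window that a Wait state would provide is reduced to a comment.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stage plan: percentage of traffic sent to the new model version.
STAGES: Tuple[int, ...] = (10, 25, 50, 100)


def run_canary(
    is_healthy: Callable[[int], bool],
    stages: Sequence[int] = STAGES,
) -> Tuple[str, int, List[int]]:
    """Walk through the traffic stages, rolling back on the first unhealthy check.

    Returns (outcome, final_weight, history), where history records every
    traffic weight that was applied, including the final 0 on rollback.
    """
    history: List[int] = []
    for weight in stages:
        history.append(weight)  # shift this share of traffic to the new version
        # A Step Functions Wait state would pause here for the observation window.
        if not is_healthy(weight):
            history.append(0)  # conditional branch: revert all traffic
            return ("ROLLED_BACK", 0, history)
    return ("PROMOTED", stages[-1], history)
```

For example, a health check that starts failing at 50% traffic ends the run with all traffic back on the previous version, while a check that always passes promotes the new version to 100%.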
Invoking a Lambda function to evaluate CloudWatch metrics enables dynamic logic: increase traffic while latency and error rates stay within thresholds, and reduce traffic or roll back if error rates rise or latency exceeds limits. Step Functions can halt the deployment by stopping progression or by triggering rollback steps immediately, meeting the requirement for automated revert without human action.
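A minimal sketch of the decision such a Lambda function might make is shown below. It assumes the latency and error-rate datapoints have already been fetched from CloudWatch; the threshold values, the Thresholds class, and the evaluate function are hypothetical names chosen for illustration, and treating missing data as unhealthy is a design assumption rather than something the question specifies.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical limits; real values depend on the workload's SLOs.
    max_latency_ms: float = 800.0
    max_error_rate: float = 0.02


def evaluate(
    latencies_ms: Sequence[float],
    error_rates: Sequence[float],
    limits: Thresholds = Thresholds(),
) -> str:
    """Return "PROMOTE" if every datapoint is within limits, else "ROLLBACK"."""
    # Assumption: no datapoints in the observation window counts as unhealthy.
    if not latencies_ms or not error_rates:
        return "ROLLBACK"
    if max(latencies_ms) > limits.max_latency_ms:
        return "ROLLBACK"
    if max(error_rates) > limits.max_error_rate:
        return "ROLLBACK"
    return "PROMOTE"
```

The returned string is what a conditional branch in the workflow would switch on: "PROMOTE" advances to the next traffic stage, and "ROLLBACK" routes to the revert steps.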
Amazon EventBridge provides reliable automation triggers when a new model version is released, ensuring the deployment process is event-driven and repeatable.
Option B depends on “external logic,” which introduces operational risk and does not guarantee automatic rollback without custom systems. Option C incorrectly uses SageMaker endpoint variants to represent Bedrock model versions; endpoint variants are a feature of SageMaker-hosted endpoints and are not the intended integration model for Bedrock. Option D is overly indirect and operationally complex, using log pipelines and automation runbooks instead of direct metric-based traffic control.
Therefore, Option A best meets the requirements for automated gradual traffic shifting, real-time monitoring, and automatic rollback for Amazon Bedrock model deployments in a canary strategy.