Option B is the correct solution because it parallelizes independent foundation model inference tasks while maintaining orchestration, observability, and time-bound execution. AWS Generative AI best practices emphasize reducing end-to-end latency by parallelizing independent inference calls rather than scaling individual calls vertically.
In this scenario, each report requires multiple independent analyses such as financial extraction, sentiment analysis, and compliance validation. These tasks do not depend on each other’s output, making them ideal candidates for parallel execution. AWS Step Functions provides a Parallel state that can invoke multiple AWS Lambda functions simultaneously, drastically reducing total processing time compared to sequential execution.
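As a rough illustration, the Parallel state might look like the following Amazon States Language definition, expressed here as a Python dict. The state names, account ID, and Lambda function ARNs are hypothetical placeholders for the three independent analyses, and the retry and timeout values are assumptions chosen to fit the scenario's latency budget.

```python
import json

# Amazon States Language (ASL) definition expressed as a Python dict.
# All ARNs, state names, and numeric values below are illustrative placeholders.
parallel_analysis_definition = {
    "Comment": "Run independent document analyses in parallel",
    "StartAt": "AnalyzeReport",
    "States": {
        "AnalyzeReport": {
            "Type": "Parallel",
            # Fail the whole state if any branch exceeds the latency budget.
            "TimeoutSeconds": 10,
            "Branches": [
                {
                    "StartAt": "FinancialExtraction",
                    "States": {
                        "FinancialExtraction": {
                            "Type": "Task",
                            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:financial-extraction",
                            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2}],
                            "End": True,
                        }
                    },
                },
                {
                    "StartAt": "SentimentAnalysis",
                    "States": {
                        "SentimentAnalysis": {
                            "Type": "Task",
                            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:sentiment-analysis",
                            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2}],
                            "End": True,
                        }
                    },
                },
                {
                    "StartAt": "ComplianceValidation",
                    "States": {
                        "ComplianceValidation": {
                            "Type": "Task",
                            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:compliance-validation",
                            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2}],
                            "End": True,
                        }
                    },
                },
            ],
            "End": True,
        }
    },
}

print(json.dumps(parallel_analysis_definition, indent=2))
```

Because all three branches start simultaneously, total latency is bounded by the slowest branch rather than the sum of all three.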
By invoking Amazon Bedrock from separate Lambda functions in parallel, the system can reduce batch execution time from 45 seconds to well under the 10-second requirement, assuming each inference call remains within acceptable latency bounds. Step Functions also provides built-in error handling, retries, and state tracking, which improves reliability without adding custom orchestration code.
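A minimal sketch of one such Lambda function is shown below, assuming an Anthropic Claude model on Bedrock and a hypothetical `report_text` field in the state input; the model ID, prompt, and token limit are assumptions, and any Bedrock text model with its corresponding request schema could be substituted.

```python
import json
import boto3

# Bedrock runtime client; each branch's Lambda function runs one independent analysis.
bedrock = boto3.client("bedrock-runtime")

# Illustrative model ID; swap in whichever Bedrock model the workload actually uses.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"


def handler(event, context):
    """Run one independent analysis (e.g. sentiment) on the report text."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [
            {
                "role": "user",
                "content": f"Classify the sentiment of this report:\n{event['report_text']}",
            }
        ],
    })
    response = bedrock.invoke_model(modelId=MODEL_ID, body=body)
    result = json.loads(response["body"].read())
    return {"analysis": "sentiment", "output": result["content"][0]["text"]}
```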
Amazon CloudWatch metrics give teams visibility into both overall workflow execution time and per-call model inference latency, enabling performance tuning and operational monitoring. Configuring client-side timeouts on the Bedrock invocations ensures that a slow or failed call fails fast instead of blocking the entire batch.
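One way to implement both ideas is sketched below: a botocore Config sets client-side connect and read timeouts on the Bedrock client, and each invocation's latency is published as a custom CloudWatch metric. The timeout thresholds, the "DocumentAnalysis" namespace, and the metric name are assumptions chosen for illustration.

```python
import time
import boto3
from botocore.config import Config

# Client-side timeouts: fail fast rather than letting one slow invocation
# hold up the whole batch. These thresholds are illustrative assumptions.
bedrock = boto3.client(
    "bedrock-runtime",
    config=Config(connect_timeout=2, read_timeout=8, retries={"max_attempts": 1}),
)
cloudwatch = boto3.client("cloudwatch")


def timed_invoke(model_id: str, body: str) -> dict:
    """Invoke a Bedrock model and publish its latency as a custom CloudWatch metric."""
    start = time.monotonic()
    response = bedrock.invoke_model(modelId=model_id, body=body)
    latency_ms = (time.monotonic() - start) * 1000

    # Hypothetical namespace and metric name; alarms or dashboards can be built on this metric.
    cloudwatch.put_metric_data(
        Namespace="DocumentAnalysis",
        MetricData=[{
            "MetricName": "ModelInferenceLatency",
            "Dimensions": [{"Name": "ModelId", "Value": model_id}],
            "Value": latency_ms,
            "Unit": "Milliseconds",
        }],
    )
    return response
```

Step Functions also emits execution-level metrics automatically, so the custom per-call latency metric complements, rather than replaces, the built-in workflow monitoring.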
Option A still processes tasks sequentially and therefore cannot meet the strict latency requirement. Option C introduces queuing delays and sequential processing within each report, which increases total execution time. Option D relies on container-based sequential processing and adds unnecessary operational overhead for a workload that is event-driven and latency-sensitive.
Therefore, Option B best meets the performance, scalability, and operational efficiency requirements for high-speed batch document analysis using Amazon Bedrock.