Option A is the best solution because it satisfies streaming, token control, and retry requirements while keeping operational overhead low by using fully managed, serverless AWS services. Amazon API Gateway HTTP APIs provide a lightweight, cost-effective front door for APIs and integrate cleanly with AWS Lambda for request processing and security controls.
AWS Lambda response streaming allows the API to begin returning content to the client as soon as partial model output is available, reducing perceived latency and improving user experience for long responses. Using Lambda as the integration layer also provides a centralized place to enforce token-aware request handling, such as rejecting oversized requests, truncating optional context, or applying consistent limits across users and tenants to manage compute usage.
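The token-aware request handling described above can be sketched as a small Lambda handler. This is a minimal illustration, not the company's implementation: the limit values and the characters-per-token heuristic are assumptions, and a production handler would likely use a real tokenizer and per-tenant configuration.

```python
import json

# Assumed limits for illustration; real values would come from
# configuration or per-tenant policy.
MAX_INPUT_TOKENS = 4096
CHARS_PER_TOKEN = 4  # rough heuristic: roughly 4 characters per token


def estimate_tokens(text: str) -> int:
    """Cheap token estimate; a production handler might use a real tokenizer."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def lambda_handler(event, context):
    """Reject oversized prompts before they reach the model service."""
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt", "")

    if estimate_tokens(prompt) > MAX_INPUT_TOKENS:
        return {
            "statusCode": 413,
            "body": json.dumps({"error": "prompt exceeds token limit"}),
        }

    # ...forward the (possibly truncated) request to the managed model
    # service here, then stream its output back to the caller...
    return {"statusCode": 200, "body": json.dumps({"accepted": True})}
```

Centralizing this check in the Lambda integration layer, rather than in the front end, is what makes the limits consistent and tamper-resistant across users and tenants.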
Retry logic is best handled in the client or integration layer for transient failures such as timeouts and throttling. Lambda can implement controlled retries with exponential backoff and jitter, while API Gateway timeouts help bound request lifetimes and prevent hung connections from consuming resources indefinitely. Because the model service is managed, the company avoids infrastructure management and focuses only on request shaping, safety, and resiliency behavior.
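The retry behavior above can be sketched as a small helper with exponential backoff and full jitter. The attempt counts, delays, and retryable error names are illustrative assumptions; real values would be tuned to the model service's throttling and timeout characteristics.

```python
import random
import time

# Assumed tuning values; real settings depend on the model service's limits.
MAX_ATTEMPTS = 4
BASE_DELAY_S = 0.5
MAX_DELAY_S = 8.0

# Assumed set of transient error names worth retrying.
RETRYABLE = ("ThrottlingException", "TimeoutError", "ServiceUnavailable")


def call_with_retries(invoke, classify=lambda exc: type(exc).__name__):
    """Run `invoke` with exponential backoff and full jitter on transient errors.

    `invoke` is any zero-argument callable that raises on failure; `classify`
    maps an exception to an error-name string checked against RETRYABLE.
    """
    for attempt in range(MAX_ATTEMPTS):
        try:
            return invoke()
        except Exception as exc:
            # Give up on non-transient errors or once attempts are exhausted.
            if classify(exc) not in RETRYABLE or attempt == MAX_ATTEMPTS - 1:
                raise
            # Full jitter: sleep a random amount up to the capped backoff,
            # which spreads retries out and avoids synchronized retry storms.
            backoff = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
            time.sleep(random.uniform(0, backoff))
```

Bounding retries this way in the Lambda layer pairs with the API Gateway timeout: the gateway caps the total request lifetime while the function caps how many times a transient failure is re-attempted.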
Option B is not suitable because client-side polling is not true streaming, front-end token enforcement is insecure and inconsistent, and API Gateway does not provide model-aware retry behavior on its own. Option C introduces container hosting and scaling complexity, which increases operational overhead compared to serverless. Option D can work, but REST APIs are generally heavier than HTTP APIs for this pattern and do not reduce overhead compared to Option A.
Therefore, Option A provides the required streaming and resiliency capabilities with the least infrastructure management effort.