The correct answer is A. Create Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless clusters to process the data.
Amazon MSK Serverless lets organizations run Apache Kafka workloads without provisioning or managing the underlying infrastructure. Unlike MSK Provisioned clusters (Option B), where the company must choose and size the brokers itself, serverless clusters scale automatically with variable workloads, and AWS handles operational tasks such as patching, scaling, and monitoring. This meets the company’s requirement to avoid managing or scaling infrastructure while providing near real-time streaming analytics.
Option C, using Amazon Kinesis Data Streams with Application Auto Scaling, provides a managed streaming solution but is not based on open-source technology. While Kinesis can handle near real-time data, the question specifically emphasizes the company’s desire for open-source technology, which points toward Kafka.
Option D, self-hosting Apache Flink on EC2, does use open-source technology, but it leaves the company to manage servers, scaling, patching, and container orchestration itself. This contradicts the requirement not to manage or scale infrastructure and adds operational complexity.
MSK Serverless is compatible with the open-source Apache Kafka APIs, so existing Kafka producers and consumers can stream data in real time while AWS handles scaling and availability. It is well suited to analytics pipelines where data ingestion is continuous and variable and infrastructure overhead must be minimized.
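To make the Kafka API compatibility concrete, here is a minimal sketch of the client settings an ordinary Kafka producer might use against an MSK Serverless cluster. It assumes IAM authentication over port 9098 and uses the keyword names of the open-source kafka-python client. The helper name, the hostname, and the token provider are hypothetical placeholders, not part of the original question.

```python
from typing import Any, Dict

# Assumption: IAM-authenticated client traffic to MSK uses port 9098.
MSK_IAM_PORT = 9098


def build_msk_serverless_config(bootstrap_host: str, token_provider: Any) -> Dict[str, Any]:
    """Return keyword arguments for a Kafka client (hypothetical helper).

    The keys follow kafka-python's KafkaProducer/KafkaConsumer parameters.
    token_provider is assumed to be an object that supplies IAM-signed
    OAuth tokens, for example one built on AWS's MSK IAM SASL signer library.
    """
    return {
        "bootstrap_servers": f"{bootstrap_host}:{MSK_IAM_PORT}",
        "security_protocol": "SASL_SSL",   # TLS plus SASL authentication
        "sasl_mechanism": "OAUTHBEARER",   # carries the IAM-signed token
        "sasl_oauth_token_provider": token_provider,
    }


# Usage sketch (requires kafka-python to be installed; not executed here):
#   producer = KafkaProducer(**build_msk_serverless_config(host, provider))
#   producer.send("analytics-events", b'{"metric": 1}')
```

Nothing in this configuration refers to brokers, instance types, or capacity. The application only needs the cluster's bootstrap endpoint, which reflects the "no infrastructure to manage" requirement.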
By choosing MSK Serverless, the company can deliver near real-time updates to its analytics platform using open-source Kafka without the operational burden of cluster management. This aligns with AWS best practices for deploying and orchestrating ML workflows in a managed, scalable, serverless manner.