Comprehensive and detailed explanation, based on AWS AI documentation:
A multimodal large language model (LLM) can:
Accept both text and image inputs
Understand visual and textual context
Generate coherent written explanations
AWS generative AI guidance positions multimodal LLMs as the best choice for applications requiring cross-modal understanding and text generation.
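To make the cross-modal idea concrete, the sketch below builds a request in the message shape used by Amazon Bedrock's Converse API, where a single user message carries both an image and a text prompt and the model returns a written explanation. The model ID is a placeholder example, and the actual invocation (which needs AWS credentials and Bedrock model access) is left commented out; this is an illustrative sketch, not the only way to call a multimodal model on AWS.

```python
import json

# Placeholder model ID for a vision-capable Bedrock model; substitute any
# multimodal model your account has access to.
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

def build_multimodal_message(image_bytes: bytes, question: str) -> dict:
    """Return a Converse-API-style user message combining image and text input."""
    return {
        "role": "user",
        "content": [
            # Image block: raw bytes plus the image format.
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            # Text block: the question the model should answer about the image.
            {"text": question},
        ],
    }

message = build_multimodal_message(b"<png bytes>", "Explain what this diagram shows.")
print(json.dumps({"modelId": MODEL_ID, "messages": [message]}, default=str)[:80])

# To actually invoke the model (requires AWS credentials and Bedrock access):
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(modelId=MODEL_ID, messages=[message])
# print(response["output"]["message"]["content"][0]["text"])
```

The key point the example illustrates: one request mixes modalities (image plus text in), and the model's output is coherent text, which is exactly the capability the question asks for.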
Why the other options are incorrect:
Computer vision (A) can analyze and classify images but does not generate natural-language explanations.
Diffusion models (C) generate images, not text.
Text-to-speech (D) converts existing text to audio; it does not understand images or produce explanations.
AWS AI document references:
Multimodal Foundation Models on AWS
Building AI Tutors with Generative Models
Text and Image Understanding with LLMs