A:True — the input embedding dimension is split across multiple heads, so each head operates on a lower-dimensional subspace before concatenation.
B:True — having multiple attention heads allows the model to attend to information from different representation subspaces simultaneously.
C:False — each head has its own learned linear transformations for queries, keys, and values.
D:False — after concatenation, the result is passed through a final linear projection, not fed back into the attention module directly.
Exact Extract from HCIP-AI EI Developer V2.5:
"Multi-head attention divides the embedding dimension across heads to learn from multiple subspaces in parallel, then concatenates and linearly projects the result."
[Reference: HCIP-AI EI Developer V2.5 Official Study Guide – Chapter: Transformer Multi-Head Attention]
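To make the points in A–D concrete, here is a minimal NumPy sketch of multi-head attention. It is not taken from the study guide; the shapes, weight initialisation, and function name are illustrative assumptions, but the structure follows the quoted description: the embedding dimension is split across heads, each head has its own Q/K/V projections, and the concatenated heads go through a final output projection.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    """Illustrative multi-head self-attention over a (seq_len, d_model) input."""
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads  # each head operates in a d_model/num_heads subspace (A)

    heads = []
    for _ in range(num_heads):  # heads attend to different subspaces in parallel (B)
        # each head has its own learned W_q, W_k, W_v (C); random stand-ins here
        W_q = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        W_k = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        W_v = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        Q, K, V = x @ W_q, x @ W_k, x @ W_v
        weights = softmax(Q @ K.T / np.sqrt(d_head))  # scaled dot-product attention
        heads.append(weights @ V)

    concat = np.concatenate(heads, axis=-1)  # (seq_len, d_model)
    W_o = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    return concat @ W_o  # final linear projection (D), not a second pass through attention

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # 4 tokens, d_model = 8
print(multi_head_attention(x, num_heads=2, rng=rng).shape)  # (4, 8)
```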