Question 1

Which option best describes transformer architecture?

Accepted Answer

Attention-based encoder/decoder model. This is the right choice: the transformer is an attention-based encoder/decoder architecture and the backbone of modern LLMs. Eliminating the partially true options confirms it.

Question 2

What is the primary purpose of transformer architecture?

Accepted Answer

Attention-based encoder/decoder model. This option is correct: the transformer provides the attention-based encoder/decoder design that serves as the backbone of modern LLMs. Eliminating the partially true options confirms it.

Question 3

Which statement about transformer architecture is most accurate?

Accepted Answer

Attention-based encoder/decoder model. This is the most accurate statement: the transformer is built on attention and is the backbone of modern LLMs. Eliminating the partially true options confirms it.

Question 4

How is transformer architecture best characterized?

Accepted Answer

Attention-based encoder/decoder model. This is the best characterization: an attention-based encoder/decoder design that forms the backbone of modern LLMs. Eliminating the partially true options confirms it.

Question 5

Which option best describes self-attention?

Accepted Answer

Each token attends to all tokens in the input. This is the correct answer: because every token can attend to every other token, self-attention captures long-range dependencies. Eliminating the partially true options confirms it.
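
To make the idea concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It assumes the queries, keys, and values are the token vectors themselves (a real model would first apply learned projections), and the three toy vectors are made up for illustration.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens (assumption: no learned weights)."""
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # One score per token in the input, so every token sees the whole sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all token vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, tokens)) for i in range(d)])
    return outputs, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical toy embeddings
outputs, weights = self_attention(tokens)
```

Each row of `weights` has one entry per input token and sums to 1, so even the first token places some weight on the last one regardless of distance.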

Question 6

What is the primary purpose of self-attention?

Accepted Answer

Each token attends to all tokens in the input. This is the right choice: the purpose of self-attention is to let every token draw on the whole input, which captures long-range dependencies. Eliminating the partially true options confirms it.

Question 7

Which statement about self-attention is most accurate?

Accepted Answer

Each token attends to all tokens in the input. This is the most accurate statement, and it explains why self-attention captures long-range dependencies. Eliminating the partially true options confirms it.

Question 8

How is self-attention best characterized?

Accepted Answer

Each token attends to all tokens in the input. This is the best characterization: attending over the whole input is what lets self-attention capture long-range dependencies. Eliminating the partially true options confirms it.

Question 9

Which option best describes multi-head attention?

Accepted Answer

Multiple attention heads in parallel. This option is correct: the heads run side by side, each attending in a different representation subspace. Eliminating the partially true options confirms it.
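
A rough sketch of the "different subspaces" idea follows. It assumes each head simply takes a contiguous slice of the feature dimension and that there are no learned projection matrices, which a real implementation would have. The names `attention` and `multi_head_attention` and the toy vectors are illustrative.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q_rows, k_rows, v_rows):
    """Scaled dot-product attention over lists of vectors."""
    d = len(q_rows[0])
    out = []
    for q in q_rows:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_rows]
        w = softmax(scores)
        out.append([sum(wj * v[i] for wj, v in zip(w, v_rows))
                    for i in range(len(v_rows[0]))])
    return out

def multi_head_attention(tokens, num_heads):
    """Run one attention per feature slice, then concatenate the head outputs per token."""
    d_model = len(tokens[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    head_outputs = []
    for h in range(num_heads):
        # Each head only sees its own subspace (slice) of every token.
        sub = [t[h * d_head:(h + 1) * d_head] for t in tokens]
        head_outputs.append(attention(sub, sub, sub))
    return [[x for head in head_outputs for x in head[i]] for i in range(len(tokens))]

tokens = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5], [1.0, 1.0, 0.0, 0.0]]
out = multi_head_attention(tokens, num_heads=2)
```

The heads are independent of each other, which is what allows them to be computed in parallel; the loop here is only for readability.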

Question 10

What is the primary purpose of multi-head attention?

Accepted Answer

Multiple attention heads in parallel. This is the correct answer: running several heads in parallel lets the model attend to different subspaces at once. Eliminating the partially true options confirms it.

Question 11

Which statement about multi-head attention is most accurate?

Accepted Answer

Multiple attention heads in parallel. This is the right choice: each head operates in parallel on a different subspace. Eliminating the partially true options confirms it.

Question 12

How is multi-head attention best characterized?

Accepted Answer

Multiple attention heads in parallel. This is the best characterization: parallel heads, each working in a different subspace. Eliminating the partially true options confirms it.

Question 13

Which option best describes positional encoding?

Accepted Answer

Inject position information into tokens. This is the best option: the encoding, either sinusoidal (sin/cos) or learned, tells the model where each token sits in the sequence. Eliminating the partially true options confirms it.
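
As an illustration of the sin/cos variant, the sketch below follows the sinusoidal scheme from the original transformer paper, with even feature indices using sine and odd indices using cosine. The function names and the toy embeddings are assumptions made for this example. Learned encodings would replace the formula with a trainable lookup table.

```python
import math

def sinusoidal_encoding(num_positions, d_model):
    """Return one sin/cos position vector per position."""
    pe = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Feature pairs (2k, 2k+1) share the same frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def add_positions(embeddings):
    """Inject position information by adding the encoding to each token embedding."""
    pe = sinusoidal_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(erow, prow)] for erow, prow in zip(embeddings, pe)]

pe = sinusoidal_encoding(3, 4)
# The same hypothetical embedding at three positions becomes three distinct vectors.
same_token = [[0.2, 0.1, 0.0, -0.3]] * 3
with_positions = add_positions(same_token)
```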

Question 14

What is the primary purpose of positional encoding?

Accepted Answer

Inject position information into tokens. This option is correct: the purpose is to give tokens position information, using either sin/cos or learned encodings. Eliminating the partially true options confirms it.

Question 15

Which statement about positional encoding is most accurate?

Accepted Answer

Inject position information into tokens. This is the most accurate statement; the encodings may be sinusoidal (sin/cos) or learned. Eliminating the partially true options confirms it.

Question 16

How is positional encoding best characterized?

Accepted Answer

Inject position information into tokens. This is the right choice: positional encoding adds position information to each token, using sin/cos or learned encodings. Eliminating the partially true options confirms it.

Question 17

Which option best describes layer normalization?

Accepted Answer

Normalize across features per token. This option is correct: each token's features are normalized independently, which stabilizes training. Eliminating the partially true options confirms it.
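
A minimal sketch of per-token normalization, assuming no learned gain or bias (real layers usually include both) and a small `eps` for numerical safety. The two toy tokens are made up for illustration.

```python
import math

def layer_norm(token, eps=1e-5):
    """Normalize one token's feature vector to roughly zero mean and unit variance."""
    mean = sum(token) / len(token)
    var = sum((x - mean) ** 2 for x in token) / len(token)
    return [(x - mean) / math.sqrt(var + eps) for x in token]

# Statistics are computed per token, never across tokens or across the batch.
batch = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
normalized = [layer_norm(t) for t in batch]
```

Although the second token is ten times larger than the first, both come out on nearly the same scale, which is the stabilizing effect the answer refers to.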

Question 18

What is the primary purpose of layer normalization?

Accepted Answer

Normalize across features per token. This is the best option: the purpose of normalizing each token's features is to stabilize training. Eliminating the partially true options confirms it.

Question 19

Which statement about layer normalization is most accurate?

Accepted Answer

Normalize across features per token. This is the most accurate statement: layer normalization works on the features of each token and stabilizes training. Eliminating the partially true options confirms it.

Question 20

How is layer normalization best characterized?

Accepted Answer

Normalize across features per token. This is the best characterization: a per-token normalization over features that stabilizes training. Eliminating the partially true options confirms it.

Question 21

Which option best describes a feed-forward network?

Accepted Answer

Per-token MLP between attention layers. This is the right choice: the feed-forward network applies the same MLP to each token, which adds expressivity. Eliminating the partially true options confirms it.
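
The per-token MLP can be sketched as two linear maps with a nonlinearity in between. The weights below are hypothetical hand-picked values (a trained model would learn them), and ReLU is assumed as the activation.

```python
def relu(x):
    return max(0.0, x)

def feed_forward(token, w1, b1, w2, b2):
    """Two-layer MLP applied to a single token vector.

    w1 and w2 are lists of weight rows, one row per output unit.
    """
    hidden = [relu(sum(x * w for x, w in zip(token, row)) + b) for row, b in zip(w1, b1)]
    return [sum(h * w for h, w in zip(hidden, row)) + b for row, b in zip(w2, b2)]

# Hypothetical weights: expand 2 features to 3 hidden units, then project back to 2.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 1.0, 1.0], [2.0, 0.0, -1.0]]
b2 = [0.5, 0.0]

sequence = [[1.0, -2.0], [0.5, 0.5]]
# The same weights are applied to every token independently.
outputs = [feed_forward(t, w1, b1, w2, b2) for t in sequence]
```

Unlike attention, this step never mixes information between tokens; it only transforms each token's own features.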

Question 22

What is the primary purpose of a feed-forward network?

Accepted Answer

Per-token MLP between attention layers. This option is correct: the purpose of the feed-forward network is to add expressivity by transforming each token between attention layers. Eliminating the partially true options confirms it.

Question 23

Which statement about a feed-forward network is most accurate?

Accepted Answer

Per-token MLP between attention layers. This is the most accurate statement: the feed-forward network is applied per token and adds expressivity. Eliminating the partially true options confirms it.

Question 24

How is a feed-forward network best characterized?

Accepted Answer

Per-token MLP between attention layers. This is the best characterization: a per-token MLP that sits between attention layers and adds expressivity. Eliminating the partially true options confirms it.

Question 25

Which option best describes residual connections?

Accepted Answer

Skip connections around sublayers. This is the correct answer: adding each sublayer's input back to its output helps train deep networks. Eliminating the partially true options confirms it.
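
In code, a skip connection is just "input plus sublayer output". The helper name `residual` and the two toy sublayers are assumptions made for this sketch.

```python
def residual(sublayer, x):
    """Return x + sublayer(x), so the input always has a direct path to the output."""
    return [xi + yi for xi, yi in zip(x, sublayer(x))]

def zero_sublayer(v):
    # A sublayer that contributes nothing: the block reduces to the identity.
    return [0.0] * len(v)

def doubling_sublayer(v):
    return [2.0 * a for a in v]

x = [1.0, 2.0]
passthrough = residual(zero_sublayer, x)
combined = residual(doubling_sublayer, x)
```

Because the block falls back to the identity when the sublayer outputs zeros, stacking many such blocks does not have to degrade the signal.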

Question 26

What is the primary purpose of residual connections?

Accepted Answer

Skip connections around sublayers. This is the right choice: the purpose of routing the input around each sublayer is to make deep networks easier to train. The other options are either incomplete or contextually incorrect.

Question 27

Which statement about residual connections is most accurate?

Accepted Answer

Skip connections around sublayers. This is the most accurate statement: residual connections skip around sublayers and help deep training. The other options are either incomplete or contextually incorrect.

Question 28

How are residual connections best characterized?

Accepted Answer

Skip connections around sublayers. This is the best characterization: skip paths around sublayers that help train deep networks. The other options are either incomplete or contextually incorrect.

Question 29

Which option best describes encoder-only models?

Accepted Answer

Models like BERT for understanding tasks. This option is correct: encoder-only models use bidirectional attention, so each token sees context on both sides. The other options are either incomplete or contextually incorrect.

Question 30

What is the primary purpose of encoder-only models?

Accepted Answer

Models like BERT for understanding tasks. This is the correct answer: encoder-only models are aimed at understanding tasks and rely on bidirectional attention. The other options are either incomplete or contextually incorrect.

Question 31

Which statement about encoder-only models is most accurate?

Accepted Answer

Models like BERT for understanding tasks. This is the most accurate statement: BERT-style models use bidirectional attention and target understanding tasks. The other options are either incomplete or contextually incorrect.

Question 32

How are encoder-only models best characterized?

Accepted Answer

Models like BERT for understanding tasks. This is the best characterization: bidirectional-attention models such as BERT, used for understanding tasks. The other options are either incomplete or contextually incorrect.

Question 33

Which option best describes decoder-only models?

Accepted Answer

Models like GPT for generative tasks. This is the best option: decoder-only models use causal attention, so each token attends only to itself and earlier tokens. The other options are either incomplete or contextually incorrect.
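
The difference between causal and bidirectional attention comes down to a mask. The sketch below assumes uniform raw scores so that only the mask shapes the result; the function names are illustrative.

```python
import math

def attention_weights(scores_row, allowed):
    """Softmax over one row of scores, with disallowed positions masked to -inf."""
    masked = [s if ok else float("-inf") for s, ok in zip(scores_row, allowed)]
    m = max(masked)
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

def causal_mask(n):
    # Position i may attend to positions 0..i only (decoder-only style).
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    # Every position may attend to every position (encoder-only style).
    return [[True] * n for _ in range(n)]

scores = [[0.0] * 3 for _ in range(3)]
causal = [attention_weights(r, a) for r, a in zip(scores, causal_mask(3))]
bidirectional = [attention_weights(r, a) for r, a in zip(scores, bidirectional_mask(3))]
```

Under the causal mask the first token puts all of its weight on itself, while under the bidirectional mask it spreads weight over all three positions.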

Question 34

What is the primary purpose of decoder-only models?

Accepted Answer

Models like GPT for generative tasks. This option is correct: decoder-only models are aimed at generation and rely on causal attention. The other options are either incomplete or contextually incorrect.

Question 35

Which statement about decoder-only models is most accurate?

Accepted Answer

Models like GPT for generative tasks. This is the most accurate statement: GPT-style models use causal attention and target generative tasks. The other options are either incomplete or contextually incorrect.

Question 36

How are decoder-only models best characterized?

Accepted Answer

Models like GPT for generative tasks. This is the best characterization: causal-attention models such as GPT, used for generative tasks. The other options are either incomplete or contextually incorrect.

Question 37

Which option best describes encoder-decoder models?

Accepted Answer

Seq2seq models like T5 and BART. This option is correct: encoder-decoder models map an input sequence to an output sequence, as in translation and summarization. The other options are either incomplete or contextually incorrect.

Question 38

What is the primary purpose of encoder-decoder models?

Accepted Answer

Seq2seq models like T5 and BART. This is the best option: encoder-decoder models are aimed at sequence-to-sequence tasks such as translation and summarization. The other options are either incomplete or contextually incorrect.

Question 39

Which statement about encoder-decoder models is most accurate?

Accepted Answer

Seq2seq models like T5 and BART. This is the most accurate statement: these models handle sequence-to-sequence tasks such as translation and summarization. The other options are either incomplete or contextually incorrect.

Question 40

How are encoder-decoder models best characterized?

Accepted Answer

Seq2seq models like T5 and BART. This is the best characterization: sequence-to-sequence models used for translation and summarization. The other options are either incomplete or contextually incorrect.

Question 41

Which option best describes pretraining?

Accepted Answer

Train on large corpora with self-supervised objectives. This is the right choice: pretraining on unlabeled text builds general representations. The other options are either incomplete or contextually incorrect.
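
"Self-supervised" means the training targets come from the data itself. As one example of such an objective, the sketch below builds next-token prediction pairs from raw text (masked-token prediction is another common choice). Whitespace tokenization and the sample sentence are simplifying assumptions.

```python
def next_token_pairs(tokens):
    """Turn a token sequence into (context, target) training pairs without any manual labels."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

corpus = "the cat sat on the mat".split()  # hypothetical one-sentence corpus
pairs = next_token_pairs(corpus)
```

Every position in the text yields a training example for free, which is why this style of objective scales to very large corpora.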

Question 42

What is the primary purpose of pretraining?

Accepted Answer

Train on large corpora with self-supervised objectives. This option is correct: the purpose of pretraining is to build general representations from large amounts of text. The other options are either incomplete or contextually incorrect.

Question 43

Which statement about pretraining is most accurate?

Accepted Answer

Train on large corpora with self-supervised objectives. This is the most accurate statement: pretraining uses self-supervised objectives on large corpora to build general representations. The other options are either incomplete or contextually incorrect.

Question 44

How is pretraining best characterized?

Accepted Answer

Train on large corpora with self-supervised objectives. This is the best characterization: large-scale self-supervised training that builds general representations. The other options are either incomplete or contextually incorrect.

Question 45

Which option best describes fine-tuning?

Accepted Answer

Adapt a pretrained model to a task. This is the correct answer: fine-tuning continues training on a smaller labeled dataset for the target task. The other options are either incomplete or contextually incorrect.

Question 46

What is the primary purpose of fine-tuning?

Accepted Answer

Adapt a pretrained model to a task. This is the right choice: the purpose of fine-tuning is to specialize a pretrained model, typically with a smaller labeled dataset. The other options are either incomplete or contextually incorrect.

Question 47

Which statement about fine-tuning is most accurate?

Accepted Answer

Adapt a pretrained model to a task. This is the most accurate statement: fine-tuning adapts an existing model using a smaller labeled dataset. The other options are either incomplete or contextually incorrect.

Question 48

How is fine-tuning best characterized?

Accepted Answer

Adapt a pretrained model to a task. This is the best characterization: task-specific adaptation of a pretrained model with a smaller labeled dataset. The other options are either incomplete or contextually incorrect.

Question 49

Which option best describes RLHF?

Accepted Answer

Reinforcement learning from human feedback. This option is correct: RLHF uses human feedback to align models to human preferences. The other options are either incomplete or contextually incorrect.
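
One common way human preferences enter RLHF is through a reward model trained on pairs of responses where a person picked the better one. The sketch below shows a pairwise preference loss of that kind; it is an illustrative formulation rather than the only one, and the function name and reward values are assumptions.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Negative log-sigmoid of the reward gap: small when the preferred response scores higher."""
    gap = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-gap)))

# Hypothetical reward-model scores for a preferred and a rejected response.
agrees_with_human = preference_loss(2.0, 0.0)
disagrees_with_human = preference_loss(0.0, 2.0)
```

The loss is lower when the reward model ranks the human-preferred response higher, so minimizing it pushes the reward model toward human judgments; the language model is then optimized against that reward.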

Question 50

What is the primary purpose of RLHF?

Accepted Answer

Reinforcement learning from human feedback. This is the correct answer: the purpose of RLHF is to align models to human preferences. The other options are either incomplete or contextually incorrect.

AI Advanced MCQ Questions with Answers (Latest 2026)

Q1. Which option best describes transformer architecture?

Q2. What is the primary purpose of transformer architecture?

Q3. Which statement about transformer architecture is most accurate?

Q4. How is transformer architecture best characterized?

Q5. Which option best describes self-attention?

Q6. What is the primary purpose of self-attention?

Q7. Which statement about self-attention is most accurate?

Q8. How is self-attention best characterized?

Q9. Which option best describes multi-head attention?

Q10. What is the primary purpose of multi-head attention?

Q11. Which statement about multi-head attention is most accurate?

Q12. How is multi-head attention best characterized?

Q13. Which option best describes positional encoding?

Q14. What is the primary purpose of positional encoding?

Q15. Which statement about positional encoding is most accurate?

Q16. How is positional encoding best characterized?

Q17. Which option best describes layer normalization?

Q18. What is the primary purpose of layer normalization?

Q19. Which statement about layer normalization is most accurate?

Q20. How is layer normalization best characterized?

Q21. Which option best describes a feed-forward network?

Q22. What is the primary purpose of a feed-forward network?

Q23. Which statement about a feed-forward network is most accurate?

Q24. How is a feed-forward network best characterized?

Q25. Which option best describes residual connections?

Q26. What is the primary purpose of residual connections?

Q27. Which statement about residual connections is most accurate?

Q28. How are residual connections best characterized?

Q29. Which option best describes encoder-only models?

Q30. What is the primary purpose of encoder-only models?

Q31. Which statement about encoder-only models is most accurate?

Q32. How are encoder-only models best characterized?

Q33. Which option best describes decoder-only models?

Q34. What is the primary purpose of decoder-only models?

Q35. Which statement about decoder-only models is most accurate?

Q36. How are decoder-only models best characterized?

Q37. Which option best describes encoder-decoder models?

Q38. What is the primary purpose of encoder-decoder models?

Q39. Which statement about encoder-decoder models is most accurate?

Q40. How are encoder-decoder models best characterized?

Q41. Which option best describes pretraining?

Q42. What is the primary purpose of pretraining?

Q43. Which statement about pretraining is most accurate?

Q44. How is pretraining best characterized?

Q45. Which option best describes fine-tuning?

Q46. What is the primary purpose of fine-tuning?

Q47. Which statement about fine-tuning is most accurate?

Q48. How is fine-tuning best characterized?

Q49. Which option best describes RLHF?

Q50. What is the primary purpose of RLHF?