Question 1

Which statement about positional encoding is most accurate?

Accepted Answer

Encodes order info for attention models. This is the right choice. Self-attention on its own has no notion of token order (it is permutation-equivariant), so position information has to be added to the inputs. The competing choices may sound plausible, but they miss this key condition.

Question 2

How is positional encoding best characterized?

Accepted Answer

Encodes order info for attention models. Without positional encoding, a Transformer would treat a sequence as an unordered set of tokens, because attention alone does not depend on where a token appears.
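As an illustration of how order can be injected, here is a minimal sketch of the fixed sinusoidal scheme used in the original Transformer paper (learned position embeddings are another common option). It assumes plain Python lists as vectors, and the function names are hypothetical.

```python
import math


def sinusoidal_positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal position encodings.

    Dimensions 2k and 2k+1 share one frequency: even dimensions use sin,
    odd dimensions use cos, and wavelengths grow geometrically (base 10000).
    """
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            k = i // 2
            angle = pos / (10000 ** (2 * k / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(token_vectors):
    """Add the encoding for each position element-wise to its token vector."""
    d_model = len(token_vectors[0])
    pe = sinusoidal_positional_encoding(len(token_vectors), d_model)
    return [[x + p for x, p in zip(vec, pos_vec)]
            for vec, pos_vec in zip(token_vectors, pe)]


# The same token placed at positions 0 and 1 now yields different input vectors.
same_token = [0.5, -0.2, 0.1, 0.3]
inputs = add_positions([same_token, same_token])
```

Because each position gets a distinct vector, attention layers downstream can tell identical tokens apart by where they occur.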

Question 3

Which option best describes layer normalization?

Accepted Answer

Normalizes activations across features per sample. Layer normalization computes the mean and variance over the features of each individual sample, so it does not depend on the batch. It is common in Transformers. The competing choices miss this key condition.

Question 4

What is the primary purpose of layer normalization?

Accepted Answer

Normalizes activations across features per sample. Its purpose is to keep each sample's activations at a stable scale, which steadies and speeds up training; it is a standard component of Transformers.

Question 5

Which statement about layer normalization is most accurate?

Accepted Answer

Normalizes activations across features per sample. This is the most accurate statement: the statistics come from a single sample's features rather than from the batch, which is what distinguishes layer normalization from batch normalization.

Question 6

How is layer normalization best characterized?

Accepted Answer

Normalizes activations across features per sample. Layer normalization is best characterized as per-sample normalization over features, which is why it is common in Transformers and works even with a batch size of one.
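A minimal sketch of the computation, assuming a sample is a plain Python list of features; the optional `gamma`/`beta` arguments stand in for the learnable scale and shift, and the names are illustrative.

```python
import math


def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one sample's feature vector to zero mean and unit variance.

    Mean and variance are computed over this sample's features only, so the
    result does not depend on batch size or on any other sample.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased (population) variance
    gamma = gamma if gamma is not None else [1.0] * n
    beta = beta if beta is not None else [0.0] * n
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]


# Each row is normalized independently of the others.
batch = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
normalized = [layer_norm(sample) for sample in batch]
```

Since the second row is just the first scaled by 10, both rows normalize to (almost exactly) the same output.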

Question 7

Which option best describes batch normalization?

Accepted Answer

Normalizes activations across the batch. Batch normalization computes each feature's mean and variance over the samples in a mini-batch and uses them to normalize that feature. It helps optimization in CNNs. The competing choices miss this key condition.

Question 8

What is the primary purpose of batch normalization?

Accepted Answer

Normalizes activations across the batch. Its main purpose is to ease optimization: keeping activations at a consistent scale allows higher learning rates and faster, more stable training, especially in CNNs.

Question 9

Which statement about batch normalization is most accurate?

Accepted Answer

Normalizes activations across the batch. This is the most accurate statement: the statistics are shared across the mini-batch, so during training each sample's output depends on the other samples in the batch (at inference, running averages are used instead).

Question 10

How is batch normalization best characterized?

Accepted Answer

Normalizes activations across the batch. Batch normalization is best characterized as per-feature normalization over the mini-batch, a technique that helps optimization in CNNs.
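To contrast with layer normalization, the sketch below normalizes each feature (column) over the batch (rows). It is a simplified training-mode version only: the learnable scale/shift and the running averages used at inference are deliberately omitted, and the function name is illustrative.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Training-mode batch normalization without learnable scale/shift.

    For each feature (column), compute mean and variance over the batch
    (rows) and normalize that column.
    """
    num_samples = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(num_samples)]
    for j in range(num_features):
        column = [row[j] for row in batch]
        mean = sum(column) / num_samples
        var = sum((v - mean) ** 2 for v in column) / num_samples
        for i in range(num_samples):
            out[i][j] = (batch[i][j] - mean) / math.sqrt(var + eps)
    return out


# Two features on very different scales end up on a comparable scale.
batch = [[1.0, 100.0], [3.0, 300.0], [5.0, 500.0]]
normalized = batch_norm(batch)
```

Note the axis: here statistics run down each column across samples, whereas layer normalization runs across each row.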

Question 11

Which option best describes dropout?

Accepted Answer

Randomly zeroing activations to regularize. During training, dropout sets each activation to zero with some probability, which reduces co-adaptation of neurons. The competing choices miss this key condition.

Question 12

What is the primary purpose of dropout?

Accepted Answer

Randomly zeroing activations to regularize. Its primary purpose is regularization: because no neuron can rely on any particular other neuron being present, the network learns more robust features and overfits less.

Question 13

Which statement about dropout is most accurate?

Accepted Answer

Randomly zeroing activations to regularize. This is the most accurate statement. Note that dropout is applied only during training; at inference, all units are kept.

Question 14

How is dropout best characterized?

Accepted Answer

Randomly zeroing activations to regularize. Dropout is best characterized as a regularization technique that reduces co-adaptation of neurons.
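A minimal sketch of "inverted" dropout, the variant most frameworks use, in which kept activations are rescaled during training so nothing needs to change at inference. The function and its `training`/`rng` parameters are illustrative assumptions, not a specific library's API.

```python
import random


def dropout(activations, p, training=True, rng=None):
    """Inverted dropout: zero each activation with probability p in training.

    Surviving activations are scaled by 1 / (1 - p) so the expected value of
    each unit is unchanged; at inference the input passes through as-is.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# With p = 0.5, each unit is either dropped (0.0) or doubled (2.0).
out = dropout([1.0] * 10000, p=0.5, rng=random.Random(0))
```

Averaged over many units, the output mean stays close to the input mean of 1.0, which is why no rescaling is needed at inference.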

Question 15

Which option best describes a learning rate schedule?

Accepted Answer

Plan that adjusts LR during training. A schedule changes the learning rate (LR) as training progresses instead of keeping it fixed; common examples are cosine decay and warmup. The competing choices miss this key condition.

Question 16

What is the primary purpose of a learning rate schedule?

Accepted Answer

Plan that adjusts LR during training. Its purpose is to use a learning rate suited to each phase of training, for example warming up to avoid early instability and decaying later (e.g. with a cosine curve) to converge more precisely.

Question 17

Which statement about a learning rate schedule is most accurate?

Accepted Answer

Plan that adjusts LR during training. This is the most accurate statement: the learning rate becomes a function of the training step or epoch, as in cosine or warmup schedules.

Question 18

How is a learning rate schedule best characterized?

Accepted Answer

Plan that adjusts LR during training. A learning rate schedule is best characterized as a rule for varying the LR over the course of training, with cosine decay and warmup as common examples.
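The sketch below combines the two examples above into one schedule: linear warmup followed by cosine decay. It is one common pattern, not the only one; the function name and its parameters (`base_lr`, `warmup_steps`, `total_steps`, `min_lr`) are illustrative.

```python
import math


def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay.

    For the first `warmup_steps` steps the rate ramps linearly up to
    `base_lr`; afterwards it follows a half cosine down to `min_lr`,
    reached at `total_steps`.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at min_lr after total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


schedule = [warmup_cosine_lr(s, base_lr=1e-3, warmup_steps=100, total_steps=1000)
            for s in range(1000)]
```

In a training loop, the value for the current step would be written into the optimizer before each update.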

Question 19

Which option best describes the Adam optimizer?

Accepted Answer

Adaptive optimizer combining momentum and RMSProp. Adam keeps a running average of gradients (momentum) and of squared gradients (as in RMSProp), and uses them to set a per-parameter step size. It is a common default optimizer.

Question 20

What is the primary purpose of the Adam optimizer?

Accepted Answer

Adaptive optimizer combining momentum and RMSProp. Its purpose is to minimize the training loss efficiently using per-parameter adaptive step sizes, which makes it a common default that needs relatively little tuning.

Question 21

Which statement about the Adam optimizer is most accurate?

Accepted Answer

Adaptive optimizer combining momentum and RMSProp. This is the most accurate statement: Adam pairs a first-moment estimate (momentum) with a second-moment estimate (RMSProp-style scaling), and bias-corrects both.

Question 22

How is the Adam optimizer best characterized?

Accepted Answer

Adaptive optimizer combining momentum and RMSProp. Adam is best characterized as an adaptive gradient-based optimizer and is a common default for training deep networks.
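A minimal sketch of the Adam update rule on a list of float parameters, using the usual default hyperparameters (beta1 = 0.9, beta2 = 0.999, eps = 1e-8). The class is illustrative, not a particular framework's implementation, and the toy objective is chosen only for demonstration.

```python
import math


class Adam:
    """Minimal Adam optimizer over a list of float parameters.

    m is the first-moment (momentum) estimate and v the second-moment
    (RMSProp-style) estimate; both are bias-corrected before each update.
    """

    def __init__(self, num_params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * num_params
        self.v = [0.0] * num_params
        self.t = 0

    def step(self, params, grads):
        self.t += 1
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g      # momentum
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g  # squared-grad average
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)                 # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
params = [0.0]
opt = Adam(num_params=1, lr=0.1)
for _ in range(500):
    params = opt.step(params, [2 * (params[0] - 3.0)])
```

Dividing by the square root of v_hat is what makes the step size adaptive: parameters with consistently large gradients take proportionally smaller steps.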

Question 23

Which option best describes vanishing gradients?

Accepted Answer

Gradients shrink through deep nets, hurting training. During backpropagation, the gradient is multiplied by many per-layer factors; when these are small, it shrinks toward zero and the early layers learn very slowly. LSTMs and residual connections help.

Question 24

What is the primary effect of vanishing gradients?

Accepted Answer

Gradients shrink through deep nets, hurting training. Vanishing gradients are a problem rather than a technique: their main effect is that early layers receive almost no learning signal. LSTMs and residual connections help.

Question 25

Which statement about vanishing gradients is most accurate?

Accepted Answer

Gradients shrink through deep nets, hurting training. This is the most accurate statement. LSTM gating and residual connections help by giving gradients paths that are not repeatedly shrunk.

Question 26

How are vanishing gradients best characterized?

Accepted Answer

Gradients shrink through deep nets, hurting training. Vanishing gradients are best characterized as a training problem in deep (and recurrent) networks, mitigated by LSTMs and residual connections. The remaining choices fail because they don't satisfy the full definition.
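The shrinking effect can be seen with the chain rule on a toy scalar network: a stack of sigmoid layers h_{k+1} = sigmoid(w * h_k). The sigmoid's derivative is at most 0.25, so with |w| <= 1 each layer multiplies the gradient by at most 0.25. This is an illustrative sketch; the function names and the single-weight setup are assumptions for demonstration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gradient_through_sigmoid_chain(depth, weight=1.0, x=0.0):
    """d(output)/d(input) for the scalar chain h_{k+1} = sigmoid(weight * h_k).

    Each layer multiplies the gradient by weight * sigmoid'(z), and
    sigmoid'(z) <= 0.25, so with |weight| <= 1 the product shrinks
    geometrically with depth.
    """
    grad = 1.0
    h = x
    for _ in range(depth):
        s = sigmoid(weight * h)
        grad *= weight * s * (1.0 - s)  # local derivative of this layer
        h = s
    return grad


for depth in (1, 5, 10, 20):
    print(depth, gradient_through_sigmoid_chain(depth))
```

By depth 20 the gradient is below 1e-12, so almost no learning signal would reach the first layer.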

Question 27

Which option best describes exploding gradients?

Accepted Answer

Gradients grow without bound during backprop. When the per-layer factors in backpropagation are larger than one, gradients can grow exponentially with depth and produce huge, unstable updates. Gradient clipping mitigates this.

Question 28

What is the primary effect of exploding gradients?

Accepted Answer

Gradients grow without bound during backprop. Like vanishing gradients, this is a problem rather than a technique: its main effect is unstable training, with loss spikes or numeric overflow (NaNs). Gradient clipping mitigates it.

Question 29

Which statement about exploding gradients is most accurate?

Accepted Answer

Gradients grow without bound during backprop. This is the most accurate statement. The standard mitigation is to clip gradients to a maximum norm or value before each update.

Question 30

How are exploding gradients best characterized?

Accepted Answer

Gradients grow without bound during backprop. Exploding gradients are best characterized as uncontrolled gradient growth during backpropagation, mitigated by clipping. The remaining choices fail because they don't satisfy the full definition.
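A minimal sketch of gradient clipping by global norm, one common form of the clipping mentioned above (clipping each value to a fixed range is another). The function name is illustrative, and the gradients are plain floats rather than tensors.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradient values so their combined L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    All components are scaled by the same factor, so the gradient's direction
    is preserved; only its length is capped.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm or total_norm == 0.0:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


# A per-layer factor above 1 makes the gradient grow geometrically with depth.
per_layer_factor = 1.5
raw_grad = per_layer_factor ** 50  # roughly 6.4e8 after 50 layers
clipped, norm_before = clip_by_global_norm([raw_grad, -raw_grad], max_norm=1.0)
```

Clipping does not remove the underlying cause, but it keeps a single bad step from wrecking the parameters.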

Question 31

Which option best describes residual connections?

Accepted Answer

Skip connections that add input to output of a block. A residual block computes y = x + F(x), so the block only has to learn the residual F(x). This enables much deeper networks to be trained.

Question 32

What is the primary purpose of residual connections?

Accepted Answer

Skip connections that add input to output of a block. Their primary purpose is to make deep networks trainable: the identity path lets gradients flow directly to earlier layers, which counters vanishing gradients.

Question 33

Which statement about residual connections is most accurate?

Accepted Answer

Skip connections that add input to output of a block. This is the most accurate statement: the block's input is added (not concatenated) to its output, which enables deeper trainable networks.

Question 34

How are residual connections best characterized?

Accepted Answer

Skip connections that add input to output of a block. Residual connections are best characterized as identity shortcuts around a block that enable deeper trainable networks. The remaining choices fail because they don't satisfy the full definition.
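To make y = x + F(x) concrete, the sketch below stacks toy blocks with and without the skip connection. The sub-block F is a hypothetical element-wise scaling by w, chosen only because it is linear: for an input of 1.0 the stack's output then equals its input-to-output derivative, so the numbers show how much gradient survives.

```python
def linear_block(x, w):
    """Hypothetical sub-block F(x): element-wise scaling by w."""
    return [w * v for v in x]


def residual_block(x, w):
    """Residual connection: output = x + F(x)."""
    fx = linear_block(x, w)
    return [a + b for a, b in zip(x, fx)]


def run_stack(x, depth, w, residual):
    """Apply `depth` blocks in sequence, with or without skip connections."""
    for _ in range(depth):
        x = residual_block(x, w) if residual else linear_block(x, w)
    return x


# Plain stack: the signal (and gradient) is multiplied by w = 0.01 fifty times.
plain = run_stack([1.0], depth=50, w=0.01, residual=False)
# Residual stack: each block multiplies by (1 + w), so the signal survives.
res = run_stack([1.0], depth=50, w=0.01, residual=True)
```

Even when F contributes nothing at all (w = 0), the residual stack passes its input through unchanged, which is why very deep residual networks remain trainable.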

Question 35

Which option best describes an embedding layer?

Accepted Answer

Maps discrete tokens to dense vectors. An embedding layer is a trainable lookup table with one vector per token ID. It is foundational for NLP models.

Question 36

What is the primary purpose of an embedding layer?

Accepted Answer

Maps discrete tokens to dense vectors. Its primary purpose is to turn token IDs, which carry no numeric meaning, into learned dense vectors that a network can process and in which related tokens can end up close together.

Question 37

Which statement about an embedding layer is most accurate?

Accepted Answer

Maps discrete tokens to dense vectors. This is the most accurate statement. Looking up a row is equivalent to multiplying a one-hot vector by the embedding matrix, but much cheaper.

Question 38

How is an embedding layer best characterized?

Accepted Answer

Maps discrete tokens to dense vectors. An embedding layer is best characterized as a learned mapping from discrete tokens to dense vectors, which makes it foundational for NLP models.
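A minimal sketch of an embedding layer as a lookup table. The class, the small random initialization (standard deviation 0.02), and the three-word vocabulary are all illustrative assumptions; in a real model the table rows would be updated by training.

```python
import random


class Embedding:
    """Minimal embedding layer: a trainable table with one row per token ID.

    The forward pass is a row lookup, which is equivalent to multiplying a
    one-hot vector by the table but much cheaper.
    """

    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                      for _ in range(vocab_size)]

    def __call__(self, token_ids):
        return [self.table[i] for i in token_ids]


# Hypothetical toy vocabulary for illustration.
vocab = {"the": 0, "cat": 1, "sat": 2}
emb = Embedding(vocab_size=len(vocab), dim=4)
vectors = emb([vocab[w] for w in ["the", "cat", "sat", "the"]])
```

The same token always maps to the same vector, so both occurrences of "the" share one learned representation.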

Question 39

Which option best describes a learning curve?

Accepted Answer

Plot of metric vs epochs/data size. A learning curve plots a metric such as loss or accuracy against training epochs or training-set size, and is used to diagnose training behavior.

Question 40

What is the primary purpose of a learning curve?

Accepted Answer

Plot of metric vs epochs/data size. Its primary purpose is diagnosis: comparing the training and validation curves reveals underfitting, overfitting, or whether more data is likely to help.

Question 41

Which statement about a learning curve is most accurate?

Accepted Answer

Plot of metric vs epochs/data size. This is the most accurate statement: the x-axis is training progress (epochs) or amount of training data, and the y-axis is a performance metric.

Question 42

How is a learning curve best characterized?

Accepted Answer

Plot of metric vs epochs/data size. A learning curve is best characterized as a diagnostic plot of training behavior. The remaining choices fail because they don't satisfy the full definition.
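A rough sketch of reading a learning curve programmatically, assuming per-epoch training and validation losses are already recorded. The thresholds and verdict labels are arbitrary illustrative choices, not standard values, and the loss numbers are made up for the example.

```python
def diagnose_learning_curve(train_loss, val_loss, gap_tol=0.1):
    """Summarize a learning curve from per-epoch train and validation losses.

    Returns the epoch with the lowest validation loss, the final
    validation-minus-training gap, and a rough verdict.
    """
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    final_gap = val_loss[-1] - train_loss[-1]
    if best_epoch < len(val_loss) - 1 and final_gap > gap_tol:
        verdict = "overfitting: validation loss rose after epoch %d" % best_epoch
    elif train_loss[-1] > train_loss[0] * 0.9:
        verdict = "underfitting or stalled: training loss barely decreased"
    else:
        verdict = "still improving or converged"
    return best_epoch, final_gap, verdict


def ascii_curve(values, width=40):
    """Crude text plot: one bar per epoch, length proportional to the value."""
    top = max(values)
    return "\n".join("%3d %s" % (i, "#" * round(width * v / top))
                     for i, v in enumerate(values))


# Illustrative (made-up) losses: training keeps falling, validation turns up.
train = [1.0, 0.7, 0.5, 0.4, 0.3, 0.25, 0.2]
val = [1.05, 0.8, 0.65, 0.6, 0.62, 0.68, 0.75]
print(ascii_curve(val))
best, gap, verdict = diagnose_learning_curve(train, val)
```

In practice the two series would be plotted together with a plotting library; the widening gap after the validation minimum is the classic overfitting signature.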

Question 43

Which option best describes early stopping?

Accepted Answer

Halt training when validation metric stops improving. Training is stopped once the validation metric has not improved for a set number of epochs (the patience), usually keeping the best checkpoint. It is a common regularization technique.

Question 44

What is the primary purpose of early stopping?

Accepted Answer

Halt training when validation metric stops improving. Its primary purpose is to prevent overfitting by ending training near the point where validation performance peaks, before the model starts fitting noise in the training data.

Question 45

Which statement about early stopping is most accurate?

Accepted Answer

Halt training when validation metric stops improving. This is the most accurate statement. The key condition is that early stopping monitors held-out validation data, not the training loss.

Question 46

How is early stopping best characterized?

Accepted Answer

Halt training when validation metric stops improving. Early stopping is best characterized as a common, inexpensive regularization technique. The remaining choices fail because they don't satisfy the full definition.
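A minimal sketch of patience-based early stopping, assuming a lower-is-better validation metric such as loss. The class and its `patience`/`min_delta` parameters are illustrative, and the loss values are made up for the example.

```python
class EarlyStopping:
    """Signal a stop when the validation loss has not improved for `patience` epochs.

    `min_delta` is the smallest decrease that counts as an improvement.
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def should_stop(self, epoch, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.best_epoch = epoch  # a real loop would also save a checkpoint here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Illustrative (made-up) validation losses, one per epoch.
val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.should_stop(epoch, loss):
        stopped_at = epoch
        break
```

Training halts at epoch 5 and the best model (epoch 2) would be restored. The later improvement to 0.5 is never seen, which shows the trade-off in choosing the patience.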

Question 47

Which option best describes transfer learning?

Accepted Answer

Reusing a pretrained model on a new task. A model trained on a large dataset is adapted to a related task, either by using it as a frozen feature extractor or by fine-tuning it. This cuts data and compute needs.

Question 48

What is the primary purpose of transfer learning?

Accepted Answer

Reusing a pretrained model on a new task. Its primary purpose is to cut data and compute needs by starting from features that have already been learned instead of training from scratch.

Question 49

Which statement about transfer learning is most accurate?

Accepted Answer

Reusing a pretrained model on a new task. This is the most accurate statement. Typically either only a new task-specific head is trained, or the whole model is fine-tuned with a small learning rate.

Question 50

How is transfer learning best characterized?

Accepted Answer

Reusing a pretrained model on a new task. Transfer learning is best characterized as reusing learned knowledge for a new task, which cuts data and compute needs. The remaining choices fail because they don't satisfy the full definition.
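The sketch below shows the feature-extractor style of transfer learning: a frozen backbone supplies features, and only a new logistic-regression head is trained on the new task. The "backbone" here is a hypothetical hand-written stand-in (a real one would be a network pretrained on a large dataset), and the small task is synthetic.

```python
import math
import random


def pretrained_features(x):
    """Hypothetical stand-in for a frozen pretrained backbone.

    Its "parameters" are fixed and never updated during training below.
    """
    return [x[0], x[1], x[0] * x[1], 1.0]  # last entry acts as a bias feature


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(data, epochs=200, lr=0.5):
    """Fit only a new logistic-regression head on top of the frozen features."""
    w = [0.0] * 4
    for _ in range(epochs):
        for x, y in data:
            f = pretrained_features(x)  # frozen: only w is updated
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
            for i in range(len(w)):
                w[i] -= lr * (p - y) * f[i]  # log-loss gradient for the head
    return w


def predict(w, x):
    return 1 if sum(wi * fi for wi, fi in zip(w, pretrained_features(x))) > 0 else 0


# Synthetic new task: label is 1 when x0 and x1 have the same sign.
rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
data = [((a, b), 1 if a * b > 0 else 0) for a, b in points]
w = train_head(data)
accuracy = sum(predict(w, x) == y for x, y in data) / len(data)
```

Only four head weights are learned; the useful representation comes from the reused features, which is where the savings in data and compute come from.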

AI Deep Learning Basics MCQ Questions with Answers – Page 2 (Latest 2026)

Q51. Which statement about positional encoding is most accurate?

Q52. How is positional encoding best characterized?

Q53. Which option best describes layer normalization?

Q54. What is the primary purpose of layer normalization?

Q55. Which statement about layer normalization is most accurate?

Q56. How is layer normalization best characterized?

Q57. Which option best describes batch normalization?

Q58. What is the primary purpose of batch normalization?

Q59. Which statement about batch normalization is most accurate?

Q60. How is batch normalization best characterized?

Q61. Which option best describes dropout?

Q62. What is the primary purpose of dropout?

Q63. Which statement about dropout is most accurate?

Q64. How is dropout best characterized?

Q65. Which option best describes a learning rate schedule?

Q66. What is the primary purpose of a learning rate schedule?

Q67. Which statement about a learning rate schedule is most accurate?

Q68. How is a learning rate schedule best characterized?

Q69. Which option best describes the Adam optimizer?

Q70. What is the primary purpose of the Adam optimizer?

Q71. Which statement about the Adam optimizer is most accurate?

Q72. How is the Adam optimizer best characterized?

Q73. Which option best describes vanishing gradients?

Q74. What is the primary effect of vanishing gradients?

Q75. Which statement about vanishing gradients is most accurate?

Q76. How are vanishing gradients best characterized?

Q77. Which option best describes exploding gradients?

Q78. What is the primary effect of exploding gradients?

Q79. Which statement about exploding gradients is most accurate?

Q80. How are exploding gradients best characterized?

Q81. Which option best describes residual connections?

Q82. What is the primary purpose of residual connections?

Q83. Which statement about residual connections is most accurate?

Q84. How are residual connections best characterized?

Q85. Which option best describes an embedding layer?

Q86. What is the primary purpose of an embedding layer?

Q87. Which statement about an embedding layer is most accurate?

Q88. How is an embedding layer best characterized?

Q89. Which option best describes a learning curve?

Q90. What is the primary purpose of a learning curve?

Q91. Which statement about a learning curve is most accurate?

Q92. How is a learning curve best characterized?

Q93. Which option best describes early stopping?

Q94. What is the primary purpose of early stopping?

Q95. Which statement about early stopping is most accurate?

Q96. How is early stopping best characterized?

Q97. Which option best describes transfer learning?

Q98. What is the primary purpose of transfer learning?

Q99. Which statement about transfer learning is most accurate?

Q100. How is transfer learning best characterized?