AI Deep Learning Basics MCQ Questions with Answers – Page 2 (Latest 2026)
Practice AI Deep Learning Basics MCQ questions with detailed explanations and clear answer validation. These MCQs help you revise core concepts, compare close options, and improve accuracy for interviews, certification exams, and technical screening rounds. Use this updated 2026 set to strengthen fundamentals and confidence.
Q51. Which statement about positional encoding is most accurate?
Answer: Encodes order info for attention models.
Self-attention has no built-in notion of token order: permuting the input tokens simply permutes the outputs. Positional encoding injects order information so the model can tell positions apart.
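As an illustration of the answer above, here is a minimal sketch of one common scheme, the fixed sinusoidal encoding. The function name, the base of 10000, and the sizes are assumptions chosen for the example; learned or rotary encodings are equally valid ways to supply order information.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position vectors."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions (2k, 2k+1) share one frequency: sin on even, cos on odd.
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

# Each position gets a distinct vector that is typically added to the token embedding.
pe = sinusoidal_positional_encoding(seq_len=4, d_model=6)
```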
Q52. How is positional encoding best characterized?
Answer: Encodes order info for attention models.
Positional encoding supplies position-dependent signals alongside the token representations. Attention needs this because, on its own, it is insensitive to the order of its inputs.
Q53. Which option best describes layer normalization?
Answer: Normalizes activations across features per sample.
Layer normalization computes the mean and variance over the features of each sample, independently of the other samples in the batch. It is common in Transformers.
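A minimal sketch of the per-sample computation, assuming plain Python lists and omitting the learnable scale and shift that real layers usually include. The function name and the `eps` value are illustrative choices.

```python
import math

def layer_norm(sample, eps=1e-5):
    """Normalize one sample across its own features."""
    mean = sum(sample) / len(sample)
    var = sum((v - mean) ** 2 for v in sample) / len(sample)
    return [(v - mean) / math.sqrt(var + eps) for v in sample]

batch = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
# Each row is normalized using only its own statistics.
normalized = [layer_norm(sample) for sample in batch]
```

Because no batch statistics are involved, the result for a sample does not change with batch size.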
Q54. What is the primary purpose of layer normalization?
Answer: Normalizes activations across features per sample.
By normalizing across features within each sample, layer normalization keeps activation scales stable regardless of batch size. This is one reason it is common in Transformers.
Q55. Which statement about layer normalization is most accurate?
Answer: Normalizes activations across features per sample.
The statistics are taken per sample across the feature dimension, not across the batch. This is the normalization commonly used in Transformers.
Q56. How is layer normalization best characterized?
Answer: Normalizes activations across features per sample.
Its defining trait is per-sample normalization over features, so it behaves the same way in training and at inference. It is common in Transformers.
Q57. Which option best describes batch normalization?
Answer: Normalizes activations across the batch.
Batch normalization computes the mean and variance of each feature over the samples in a mini-batch. It helps optimization, particularly in CNNs.
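For contrast with layer normalization, a minimal training-mode sketch that normalizes each feature over the batch. It assumes a list-of-lists batch and leaves out the learnable scale and shift and the running averages used at inference; the names are illustrative.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) using statistics across the batch."""
    n, n_features = len(batch), len(batch[0])
    out = [[0.0] * n_features for _ in range(n)]
    for j in range(n_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / math.sqrt(var + eps)
    return out

bn_out = batch_norm([[1.0, 10.0], [3.0, 30.0]])
```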
Q58. What is the primary purpose of batch normalization?
Answer: Normalizes activations across the batch.
Normalizing each feature with mini-batch statistics keeps activation scales stable from layer to layer, which helps optimization in CNNs.
Q59. Which statement about batch normalization is most accurate?
Answer: Normalizes activations across the batch.
The statistics are taken across the batch dimension, in contrast to layer normalization, which normalizes across features per sample. It helps optimization in CNNs.
Q60. How is batch normalization best characterized?
Answer: Normalizes activations across the batch.
Its defining trait is the use of batch statistics during training; at inference, running averages are typically used instead. It helps optimization in CNNs.
Q61. Which option best describes dropout?
Answer: Randomly zeroing activations to regularize.
During training, dropout randomly sets a fraction of activations to zero, which reduces co-adaptation of neurons.
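A minimal sketch of the common "inverted" variant, which rescales the surviving activations so their expected value is unchanged. The function name, the `rng` parameter, and the default `p` are assumptions for illustration; `p` must be below 1.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Zero each activation with probability p; rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)  # dropout is disabled at inference
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # seeded only to make the example repeatable
dropped = dropout([1.0] * 1000, p=0.5, rng=rng)
unchanged = dropout([1.0, 2.0, 3.0], training=False)
```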
Q62. What is the primary purpose of dropout?
Answer: Randomly zeroing activations to regularize.
Dropout is a regularizer: randomly zeroing activations discourages neurons from co-adapting, which reduces overfitting.
Q63. Which statement about dropout is most accurate?
Answer: Randomly zeroing activations to regularize.
Dropout zeroes activations at random during training only and is disabled at inference. This reduces co-adaptation of neurons.
Q64. How is dropout best characterized?
Answer: Randomly zeroing activations to regularize.
Dropout is characterized by random masking of activations as a form of regularization, which reduces co-adaptation of neurons.
Q65. Which option best describes a learning rate schedule?
Answer: Plan that adjusts LR during training.
A learning rate schedule changes the learning rate as training progresses rather than keeping it fixed. Examples include warmup and cosine decay.
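The two examples named above are often combined. The following is a minimal sketch of linear warmup followed by cosine decay; the function name and every default value (base rate, step counts) are assumptions for illustration, not recommended settings.

```python
import math

def lr_at_step(step, base_lr=1e-3, warmup_steps=100, total_steps=1000):
    """Linear warmup to base_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * min(1.0, progress)))

# Learning rate sampled at a few points in training.
samples = [lr_at_step(s) for s in (0, 50, 100, 500, 1000)]
```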
Q66. What is the primary purpose of a learning rate schedule?
Answer: Plan that adjusts LR during training.
The purpose is to match the step size to the phase of training, for example a small rate during warmup and a gradually decaying rate afterward (cosine).
Q67. Which statement about a learning rate schedule is most accurate?
Answer: Plan that adjusts LR during training.
The learning rate is treated as a function of the training step or epoch, not as a constant. Cosine decay and warmup are common examples.
Q68. How is a learning rate schedule best characterized?
Answer: Plan that adjusts LR during training.
A schedule is a predefined plan for how the learning rate varies over the course of training. Examples include cosine decay and warmup.
Q69. Which option best describes the Adam optimizer?
Answer: Adaptive optimizer combining momentum and RMSProp.
Adam keeps a running average of gradients (momentum) and a running average of squared gradients (RMSProp-style scaling), and uses both to set a per-parameter step size. It is a common default optimizer.
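A minimal sketch of the update rule for a single scalar parameter, with bias correction. The function name, the toy objective (x - 3)^2, and the hyperparameter values are assumptions for illustration.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a 1-D function given its gradient, using the Adam update."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # momentum: average of gradients
        v = beta2 * v + (1 - beta2) * g * g    # RMSProp-style: average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Gradient of (x - 3)^2 is 2 * (x - 3); the minimum is at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
```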
Q70. What is the primary purpose of the Adam optimizer?
Answer: Adaptive optimizer combining momentum and RMSProp.
Adam adapts the step size for each parameter while smoothing the update direction with momentum. This combination makes it a common default optimizer.
Q71. Which statement about the Adam optimizer is most accurate?
Answer: Adaptive optimizer combining momentum and RMSProp.
Adam combines a momentum term with RMSProp-style scaling by squared gradients, so it is both adaptive and momentum-based. It is a common default optimizer.
Q72. How is the Adam optimizer best characterized?
Answer: Adaptive optimizer combining momentum and RMSProp.
Adam is an adaptive method that merges the ideas of momentum and RMSProp. It is a common default optimizer.
Q73. Which option best describes vanishing gradients?
Answer: Gradients shrink through deep nets, hurting training.
As gradients are propagated backward through many layers, repeated multiplication by small factors can shrink them toward zero, so early layers learn very slowly. LSTMs and residual connections help.
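A toy sketch of the shrinking effect, assuming a chain of sigmoid units with unit weights evaluated at pre-activation 0, where the sigmoid derivative takes its maximum value of 0.25. The setup and names are illustrative and not a full backpropagation implementation.

```python
import math

def sigmoid_grad(z):
    """Derivative of the sigmoid; never larger than 0.25."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def chained_gradient(depth, z=0.0, weight=1.0):
    """Gradient factor after backpropagating through `depth` sigmoid layers."""
    grad = 1.0
    for _ in range(depth):
        grad *= weight * sigmoid_grad(z)
    return grad

shallow = chained_gradient(1)   # a single layer
deep = chained_gradient(10)     # ten layers: the factor shrinks geometrically
```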
Q74. What is the primary effect of vanishing gradients?
Answer: Gradients shrink through deep nets, hurting training.
Vanishing gradients are a training problem, not a technique: the shrinking gradients leave early layers with almost no learning signal. LSTMs and residual connections help.
Q75. Which statement about vanishing gradients is most accurate?
Answer: Gradients shrink through deep nets, hurting training.
The gradient magnitude decreases as it flows backward through a deep network, which hurts training of the earlier layers. LSTMs and residual connections help.
Q76. How are vanishing gradients best characterized?
Answer: Gradients shrink through deep nets, hurting training.
They are characterized by gradients that become progressively smaller with depth, which stalls learning. LSTMs and residual connections help.
Q77. Which option best describes exploding gradients?
Answer: Gradients grow without bound during backprop.
Exploding gradients occur when gradient magnitudes grow uncontrollably as they are propagated backward through many layers or time steps. Gradient clipping mitigates the problem.
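A minimal sketch of the mitigation mentioned above, using clipping by global norm. The function name and the flat list of gradients are assumptions for illustration; frameworks provide their own utilities for this.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)  # already small enough; leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]

clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)    # norm 5 is scaled down to 1
untouched = clip_by_global_norm([0.3, 0.4], max_norm=1.0)  # norm 0.5 is kept as is
```

Clipping changes the magnitude of the update but keeps its direction.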
Q78. What is the primary effect of exploding gradients?
Answer: Gradients grow without bound during backprop.
Exploding gradients are a training problem, not a technique: very large gradients cause huge parameter updates and can make the loss diverge. Gradient clipping mitigates this.
Q79. Which statement about exploding gradients is most accurate?
Answer: Gradients grow without bound during backprop.
The gradient magnitude increases as it flows backward, the opposite of the vanishing-gradient case. Gradient clipping mitigates this.
Q80. How are exploding gradients best characterized?
Answer: Gradients grow without bound during backprop.
They are characterized by gradients that become progressively larger during backpropagation, which destabilizes training. Gradient clipping mitigates this.
Q81. Which option best describes residual connections?
Answer: Skip connections that add input to output of a block.
A residual connection adds a block's input to its output, so the block only has to learn a correction to the identity. This enables deeper trainable networks.
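A minimal sketch of the idea, assuming vectors are plain lists and the block `f` preserves the input length. The names `residual_block`, `zero_block`, and `double_block` are hypothetical.

```python
def residual_block(x, f):
    """Return x + f(x): the block's input is added to its output."""
    fx = f(x)
    return [xi + fi for xi, fi in zip(x, fx)]

def zero_block(x):
    return [0.0 for _ in x]   # a block that contributes nothing

def double_block(x):
    return [2.0 * v for v in x]

# With a zero block the whole unit reduces to the identity mapping.
identity_out = residual_block([1.0, 2.0], zero_block)
added_out = residual_block([1.0, 2.0], double_block)
```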
Q82. What is the primary purpose of residual connections?
Answer: Skip connections that add input to output of a block.
The skip path gives gradients a direct route around each block, which makes very deep networks trainable.
Q83. Which statement about residual connections is most accurate?
Answer: Skip connections that add input to output of a block.
The input is added to the block's output rather than replaced by it. This enables deeper trainable networks.
Q84. How are residual connections best characterized?
Answer: Skip connections that add input to output of a block.
They are characterized by an additive shortcut from a block's input to its output. This enables deeper trainable networks.
Q85. Which option best describes an embedding layer?
Answer: Maps discrete tokens to dense vectors.
An embedding layer is a lookup table that maps each discrete token ID to a dense, learnable vector. It is foundational for NLP models.
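A minimal sketch of the lookup, assuming a randomly initialized table and integer token IDs. The class name, sizes, and seed are illustrative; in a real model the table entries are trained along with the other weights.

```python
import random

class Embedding:
    """Lookup table mapping integer token IDs to dense vectors."""

    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        self.table = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                      for _ in range(vocab_size)]

    def __call__(self, token_ids):
        # One row of the table per token, in sequence order.
        return [self.table[t] for t in token_ids]

emb = Embedding(vocab_size=5, dim=3)
vectors = emb([1, 4, 1])
```

The same token ID always maps to the same vector, regardless of where it appears in the sequence.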
Q86. What is the primary purpose of an embedding layer?
Answer: Maps discrete tokens to dense vectors.
The purpose is to turn discrete symbols, which have no inherent numeric meaning, into dense vectors a network can process. It is foundational for NLP models.
Q87. Which statement about an embedding layer is most accurate?
Answer: Maps discrete tokens to dense vectors.
Each token is represented by a dense vector rather than a sparse one-hot code. It is foundational for NLP models.
Q88. How is an embedding layer best characterized?
Answer: Maps discrete tokens to dense vectors.
An embedding layer is characterized by a table lookup from token IDs to dense vectors. It is foundational for NLP models.
Q89. Which option best describes a learning curve?
Answer: Plot of metric vs epochs/data size.
A learning curve plots a metric such as loss or accuracy against the number of epochs or the amount of training data. It is used to diagnose training behavior.
Q90. What is the primary purpose of a learning curve?
Answer: Plot of metric vs epochs/data size.
The purpose is diagnostic: comparing training and validation curves reveals problems such as overfitting or underfitting.
Q91. Which statement about a learning curve is most accurate?
Answer: Plot of metric vs epochs/data size.
The curve shows how a metric changes with epochs or data size, which helps diagnose training behavior.
Q92. How is a learning curve best characterized?
Answer: Plot of metric vs epochs/data size.
A learning curve is characterized by a metric on one axis and epochs or data size on the other. It is used to diagnose training behavior.
Q93. Which option best describes early stopping?
Answer: Halt training when validation metric stops improving.
Early stopping monitors a validation metric and ends training once it no longer improves. It is a common regularization technique.
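A minimal sketch of the stopping rule, assuming a precomputed list of validation losses and a `patience` parameter (a common but not universal convention). The function name and the numbers are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # no improvement for `patience` epochs in a row
    return len(val_losses) - 1  # never triggered; ran to the end

# Hypothetical validation losses: the best value (0.7) occurs at epoch 2.
stop_epoch = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.9], patience=2)
```

In practice the weights from the best epoch are usually restored after stopping.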
Q94. What is the primary purpose of early stopping?
Answer: Halt training when validation metric stops improving.
The purpose is to stop before the model starts overfitting the training data, which is why it counts as a common regularization technique.
Q95. Which statement about early stopping is most accurate?
Answer: Halt training when validation metric stops improving.
The stopping decision is based on the validation metric, not the training metric. It is a common regularization technique.
Q96. How is early stopping best characterized?
Answer: Halt training when validation metric stops improving.
Early stopping is characterized by ending training as soon as validation performance plateaus or worsens. It is a common regularization technique.
Q97. Which option best describes transfer learning?
Answer: Reusing a pretrained model on a new task.
Transfer learning starts from a model already trained on one task and adapts it to another. This cuts data and compute needs.
Q98. What is the primary purpose of transfer learning?
Answer: Reusing a pretrained model on a new task.
The purpose is to reuse what a pretrained model has already learned, so the new task needs less data and compute.
Q99. Which statement about transfer learning is most accurate?
Answer: Reusing a pretrained model on a new task.
Knowledge captured by the pretrained model is carried over to the new task rather than learned from scratch. This cuts data and compute needs.
Q100. How is transfer learning best characterized?
Answer: Reusing a pretrained model on a new task.
Transfer learning is characterized by reuse of a pretrained model as the starting point for a new task. This cuts data and compute needs.