Question 1

Which statement about a local minimum is most accurate?

Accepted Answer

A point lower than every point in its neighborhood, though not necessarily the lowest point overall. Because the gradient is zero there, a local minimum can trap optimizers. Competing choices sound plausible, but they miss the key condition that the comparison is only local.

Question 2

How is a local minimum best characterized?

Accepted Answer

A point lower than its neighborhood, though not necessarily the global minimum. It can trap optimizers, since a small step in any direction increases the loss. Competing choices sound plausible, but they miss the key condition that the comparison is only local.
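To make the "trap" concrete, here is a minimal sketch of plain gradient descent on a one-dimensional function with two minima. The function f(x) = x^4 - 3x^2 + x, the learning rate, and the starting points are illustrative assumptions, not part of the quiz.

```python
def f(x):
    # Hypothetical loss with two minima: a shallow one near x = 1.13
    # and a deeper (global) one near x = -1.30.
    return x**4 - 3 * x**2 + x


def grad(x):
    # Derivative of f.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x, lr=0.01, steps=2000):
    # Plain gradient descent: repeatedly step against the gradient.
    for _ in range(steps):
        x -= lr * grad(x)
    return x


x_right = gradient_descent(1.0)   # settles in the shallow local minimum
x_left = gradient_descent(-1.0)   # settles in the deeper global minimum

print(x_right, f(x_right))
print(x_left, f(x_left))
```

Both runs stop where the gradient is essentially zero, but only the run started at -1.0 reaches the lower of the two values. The run started at 1.0 stays in the shallower basin.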

Question 3

Which option best describes a saddle point?

Accepted Answer

A point that has both ascending and descending directions. The gradient is zero there, but the surface curves upward along some directions and downward along others. Saddle points are common in high-dimensional surfaces. Competing choices sound plausible, but they miss this mixed-direction condition.

Question 4

What is the primary purpose of a saddle point?

Accepted Answer

A saddle point has both ascending and descending directions. It serves no purpose of its own; this mix of directions is simply its defining property. Saddle points are common in high-dimensional surfaces. Competing choices sound plausible, but they miss this key condition.

Question 5

Which statement about a saddle point is most accurate?

Accepted Answer

A saddle point has both ascending and descending directions. Such points are common in high-dimensional surfaces. Competing choices sound plausible, but they miss this key condition.

Question 6

How is a saddle point best characterized?

Accepted Answer

A point with both ascending and descending directions: moving one way raises the value, while moving another way lowers it. Saddle points are common in high-dimensional surfaces. Competing choices sound plausible, but they miss this key condition.
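A small sketch of the textbook saddle f(x, y) = x^2 - y^2 illustrates the answer above. The function and the step size eps are illustrative assumptions chosen for simplicity.

```python
def f(x, y):
    # Classic saddle surface: curves up along x, down along y.
    return x**2 - y**2


def grad(x, y):
    # Partial derivatives of f with respect to x and y.
    return (2 * x, -2 * y)


eps = 0.1
at_origin = f(0.0, 0.0)

# The gradient vanishes at the origin, so it is a critical point.
gradient_at_origin = grad(0.0, 0.0)

# Stepping along x raises the value (ascending direction).
change_along_x = f(eps, 0.0) - at_origin

# Stepping along y lowers the value (descending direction).
change_along_y = f(0.0, eps) - at_origin

print(gradient_at_origin, change_along_x, change_along_y)
```

The origin is neither a minimum nor a maximum: the gradient is zero, yet one direction goes up and another goes down.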

Question 7

Which option best describes weight initialization?

Accepted Answer

The initial values assigned to the weights before training, chosen by a scheme such as Xavier or He. The choice affects training stability. Competing choices sound plausible, but they miss the key condition.

Question 8

What is the primary purpose of weight initialization?

Accepted Answer

To set the initial values of the weights (using Xavier, He, or a similar scheme) so that training starts from a stable point. Initialization affects training stability. Competing choices sound plausible, but they miss the key condition.

Question 9

Which statement about weight initialization is most accurate?

Accepted Answer

Weight initialization sets the initial values of the weights (Xavier, He, etc.). It affects training stability. Competing choices sound plausible, but they miss the key condition.

Question 10

How is weight initialization best characterized?

Accepted Answer

As the choice of initial values for the weights (Xavier, He, etc.). This choice affects training stability. Competing choices sound plausible, but they miss the key condition.

Question 11

Which option best describes Xavier initialization?

Accepted Answer

Weight variance scaled to the number of inputs plus outputs of the layer (fan-in + fan-out). It works well with tanh and sigmoid activations. Competing choices sound plausible, but they miss the key condition.

Question 12

What is the primary purpose of Xavier initialization?

Accepted Answer

To scale the weight variance to the number of inputs plus outputs of each layer, which keeps signal magnitudes roughly steady from layer to layer. It works well with tanh and sigmoid activations. Competing choices sound plausible, but they miss the key condition.

Question 13

Which statement about Xavier initialization is most accurate?

Accepted Answer

Xavier initialization scales the weight variance to inputs plus outputs (fan-in + fan-out). It is a good fit for tanh and sigmoid activations. Competing choices sound plausible, but they miss the key condition.

Question 14

How is Xavier initialization best characterized?

Accepted Answer

As an initialization whose weight variance is scaled to inputs plus outputs (fan-in + fan-out). It is a good fit for tanh and sigmoid activations. Competing choices sound plausible, but they miss the key condition.
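The scaling can be sketched with the uniform variant of Xavier (Glorot) initialization, where weights are drawn from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)), giving variance 2 / (fan_in + fan_out). The function name, layer sizes, and seed below are hypothetical choices for illustration.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    # Uniform(-limit, limit) has variance limit**2 / 3,
    # which equals 2 / (fan_in + fan_out) for this limit.
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


rng = random.Random(0)           # fixed seed, chosen arbitrarily
fan_in, fan_out = 200, 300       # hypothetical layer sizes
weights = xavier_uniform(fan_in, fan_out, rng)

flat = [w for row in weights for w in row]
sample_variance = sum(w * w for w in flat) / len(flat)
target_variance = 2.0 / (fan_in + fan_out)

print(sample_variance, target_variance)
```

The sample variance should land close to the target, which depends on both the input and output counts of the layer.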

Question 15

Which option best describes He initialization?

Accepted Answer

Weight variance scaled for ReLU networks. He initialization is the recommended choice with ReLU activations. Competing choices sound plausible, but they miss the key condition.

Question 16

What is the primary purpose of He initialization?

Accepted Answer

To scale the weight variance so that it suits ReLU networks. It is the recommended choice with ReLU activations. Competing choices sound plausible, but they miss the key condition.

Question 17

Which statement about He initialization is most accurate?

Accepted Answer

He initialization scales the weight variance for ReLU networks. It is recommended with ReLU activations. Competing choices sound plausible, but they miss the key condition.

Question 18

How is He initialization best characterized?

Accepted Answer

As an initialization whose weight variance is scaled for ReLU networks. It is recommended with ReLU activations. Competing choices sound plausible, but they miss the key condition.
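A minimal sketch of the normal variant of He initialization, which draws weights with variance 2 / fan_in. The function name, layer sizes, and seed are hypothetical and only serve the illustration.

```python
import math
import random


def he_normal(fan_in, fan_out, rng):
    # Zero-mean Gaussian with variance 2 / fan_in.
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]


rng = random.Random(0)           # fixed seed, chosen arbitrarily
fan_in, fan_out = 200, 300       # hypothetical layer sizes
weights = he_normal(fan_in, fan_out, rng)

flat = [w for row in weights for w in row]
sample_variance = sum(w * w for w in flat) / len(flat)
target_variance = 2.0 / fan_in

print(sample_variance, target_variance)
```

Unlike the Xavier scaling, the target variance here depends only on the number of inputs. The factor of 2 compensates for ReLU zeroing out roughly half of its inputs.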

Question 19

Which option best describes batch size?

Accepted Answer

The number of samples used per gradient step. Choosing it involves tradeoffs among speed, memory, and generalization. Competing choices sound plausible, but they miss the key condition.

Question 20

What is the primary purpose of batch size?

Accepted Answer

To set the number of samples used per gradient step. The setting trades off speed, memory, and generalization. Competing choices sound plausible, but they miss the key condition.

Question 21

Which statement about batch size is most accurate?

Accepted Answer

Batch size is the number of samples per gradient step. It involves tradeoffs among speed, memory, and generalization. Competing choices sound plausible, but they miss the key condition.

Question 22

How is batch size best characterized?

Accepted Answer

As the number of samples per gradient step. It involves tradeoffs among speed, memory, and generalization. Competing choices sound plausible, but they miss the key condition.
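The following sketch shows how batch size determines the number of gradient steps per pass over the data. The helper name, the toy dataset of 10 samples, and the batch size of 4 are assumptions for illustration.

```python
import math


def iterate_minibatches(samples, batch_size):
    # Yield consecutive slices of at most batch_size samples.
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


data = list(range(10))   # toy dataset with 10 samples
batch_size = 4

batches = list(iterate_minibatches(data, batch_size))
batch_lengths = [len(b) for b in batches]

# One gradient step per batch, so steps per epoch is ceil(N / batch_size).
steps_per_epoch = math.ceil(len(data) / batch_size)

print(batch_lengths)     # [4, 4, 2]
print(steps_per_epoch)   # 3
```

A larger batch size means fewer steps per epoch but more samples held in memory per step.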

Question 23

Which option best describes an autoencoder?

Accepted Answer

A network that reconstructs its input via a bottleneck. It is useful for representation learning. Competing choices sound plausible, but they miss the key condition.

Question 24

What is the primary purpose of an autoencoder?

Accepted Answer

To reconstruct its input via a bottleneck, which forces the network to learn a compact representation. This makes it useful for representation learning. Competing choices sound plausible, but they miss the key condition.

Question 25

Which statement about an autoencoder is most accurate?

Accepted Answer

An autoencoder is a network that reconstructs its input via a bottleneck. It is useful for representation learning. Competing choices sound plausible, but they miss the key condition.

Question 26

How is an autoencoder best characterized?

Accepted Answer

As a network that reconstructs its input via a bottleneck. It is useful for representation learning. The remaining choices fail because they don't satisfy the full definition.
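A forward pass through a tiny linear autoencoder shows what the bottleneck does. This is only a sketch: the 4-to-2-to-4 shape and the hand-picked weights are assumptions for illustration, whereas a real autoencoder learns its weights by minimizing reconstruction error.

```python
def matvec(matrix, vector):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hand-picked (not learned) weights: the encoder averages each pair of
# inputs into one code value, and the decoder copies each code value back.
ENCODER = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.5]]
DECODER = [[1.0, 0.0],
           [1.0, 0.0],
           [0.0, 1.0],
           [0.0, 1.0]]


def reconstruct(x):
    code = matvec(ENCODER, x)        # 4 values squeezed into 2
    return code, matvec(DECODER, code)


def mse(a, b):
    # Mean squared reconstruction error.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


x_easy = [1.0, 1.0, 3.0, 3.0]
x_hard = [1.0, 2.0, 3.0, 4.0]

code_easy, out_easy = reconstruct(x_easy)
code_hard, out_hard = reconstruct(x_hard)

print(code_easy, mse(x_easy, out_easy))   # [1.0, 3.0] 0.0
print(code_hard, mse(x_hard, out_hard))   # [1.5, 3.5] 0.25
```

The first input fits the structure the 2-value code can express and is reconstructed exactly. The second does not, so some information is lost at the bottleneck.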

Question 27

Which option best describes an embedding?

Accepted Answer

A learned dense vector for a discrete entity. Embeddings capture similarity: related entities end up with nearby vectors. The remaining choices fail because they don't satisfy the full definition.

Question 28

What is the primary purpose of an embedding?

Accepted Answer

To represent a discrete entity as a learned dense vector. The resulting vectors capture similarity between entities. The remaining choices fail because they don't satisfy the full definition.

Question 29

Which statement about an embedding is most accurate?

Accepted Answer

An embedding is a learned dense vector for a discrete entity. It captures similarity. The remaining choices fail because they don't satisfy the full definition.

Question 30

How is an embedding best characterized?

Accepted Answer

As a learned dense vector for a discrete entity. It captures similarity. The remaining choices fail because they don't satisfy the full definition.
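The idea of "captures similarity" can be sketched with a lookup table and cosine similarity. The three words and their vectors below are hand-picked assumptions for illustration; real embeddings are learned during training.

```python
import math

# Hypothetical embedding table: one dense vector per discrete token.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)


cat_dog = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
cat_car = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"])

print(cat_dog, cat_car)
```

With these vectors, "cat" is far closer to "dog" than to "car", which is the kind of structure a trained embedding is expected to exhibit.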

Question 31

Which option best describes dropout?

Accepted Answer

Randomly zeroing out activations during training. This regularizes the network. The remaining choices fail because they don't satisfy the full definition.

Question 32

What is the primary purpose of dropout?

Accepted Answer

To regularize the network by randomly zeroing out activations during training. The remaining choices fail because they don't satisfy the full definition.

Question 33

Which statement about dropout is most accurate?

Accepted Answer

Dropout randomly zeroes out activations during training. It regularizes the network. The remaining choices fail because they don't satisfy the full definition.

Question 34

How is dropout best characterized?

Accepted Answer

As randomly zeroing out activations during training. It regularizes the network. The remaining choices fail because they don't satisfy the full definition.
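A minimal sketch of dropout on a list of activations. It uses the "inverted" variant, which rescales the surviving activations by 1 / (1 - p) so that the expected value is unchanged. That rescaling, the function name, and the parameter names are assumptions beyond what the answer states.

```python
import random


def dropout(activations, p, rng, training=True):
    # At inference time (or with p == 0) dropout is the identity.
    if not training or p == 0.0:
        return list(activations)
    keep_prob = 1.0 - p
    # Zero each activation with probability p; scale survivors by 1/keep_prob
    # (inverted dropout, an assumption of this sketch).
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)            # fixed seed, chosen arbitrarily
activations = [1.0] * 10000

train_out = dropout(activations, p=0.5, rng=rng, training=True)
eval_out = dropout(activations, p=0.5, rng=rng, training=False)

dropped_fraction = train_out.count(0.0) / len(train_out)
train_mean = sum(train_out) / len(train_out)

print(dropped_fraction, train_mean)
```

Roughly half of the activations are zeroed during training, while the mean stays near 1.0 because of the rescaling. In evaluation mode the activations pass through untouched.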

Question 35

Which option best describes L2 regularization?

Accepted Answer

Adding a penalty of lambda * ||w||^2 to the loss. This discourages large weights. The remaining choices fail because they don't satisfy the full definition.

Question 36

What is the primary purpose of L2 regularization?

Accepted Answer

To discourage large weights by adding a penalty of lambda * ||w||^2 to the loss. The remaining choices fail because they don't satisfy the full definition.

Question 37

Which statement about L2 regularization is most accurate?

Accepted Answer

L2 regularization adds a penalty of lambda * ||w||^2 to the loss. It discourages large weights. The remaining choices fail because they don't satisfy the full definition.

Question 38

How is L2 regularization best characterized?

Accepted Answer

As adding a penalty of lambda * ||w||^2 to the loss. It discourages large weights. The remaining choices fail because they don't satisfy the full definition.
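The penalty and its gradient can be sketched in a few lines. The function names and the example weights are hypothetical; `lam` stands for lambda.

```python
def l2_penalty(weights, lam):
    # lam * ||w||^2: lambda times the sum of squared weights.
    return lam * sum(w * w for w in weights)


def l2_penalty_grad(weights, lam):
    # Gradient of the penalty: 2 * lam * w for each weight.
    return [2.0 * lam * w for w in weights]


weights = [3.0, -4.0]
lam = 0.1

penalty = l2_penalty(weights, lam)           # 0.1 * (9 + 16), about 2.5
penalty_grad = l2_penalty_grad(weights, lam)  # about [0.6, -0.8]

print(penalty, penalty_grad)
```

The gradient is proportional to each weight, so larger weights are pushed toward zero harder. This is why the penalty discourages large weights.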

Question 39

Which option best describes L1 regularization?

Accepted Answer

Adding a penalty of lambda * ||w||_1 (lambda times the sum of the absolute values of the weights) to the loss, which encourages sparsity. The result is sparser weights. The remaining choices fail because they don't satisfy the full definition.

Question 40

What is the primary purpose of L1 regularization?

Accepted Answer

To encourage sparsity by adding a penalty of lambda * ||w||_1 to the loss. The result is sparser weights. The remaining choices fail because they don't satisfy the full definition.

Question 41

Which statement about L1 regularization is most accurate?

Accepted Answer

L1 regularization adds a penalty of lambda * ||w||_1 to the loss and encourages sparsity. The result is sparser weights. The remaining choices fail because they don't satisfy the full definition.

Question 42

How is L1 regularization best characterized?

Accepted Answer

As adding a penalty of lambda * ||w||_1 to the loss, which encourages sparsity. The result is sparser weights. The remaining choices fail because they don't satisfy the full definition.
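A short sketch of the L1 penalty, plus one common way its sparsity effect shows up: the soft-thresholding step used by proximal methods, which sets small weights to exactly zero. The function names, the threshold, and the example weights are assumptions for illustration.

```python
import math


def l1_penalty(weights, lam):
    # lam * ||w||_1: lambda times the sum of absolute weights.
    return lam * sum(abs(w) for w in weights)


def soft_threshold(weights, threshold):
    # Shrink each weight toward zero by `threshold`; weights whose
    # magnitude is below the threshold become exactly 0.0.
    return [math.copysign(max(abs(w) - threshold, 0.0), w) for w in weights]


weights = [0.05, -0.3, 1.2]
lam = 0.1

penalty = l1_penalty(weights, lam)      # 0.1 * (0.05 + 0.3 + 1.2)
shrunk = soft_threshold(weights, lam)   # about [0.0, -0.2, 1.1]

print(penalty, shrunk)
```

The smallest weight is zeroed outright while the others shrink by a fixed amount. An L2 penalty, by contrast, shrinks weights proportionally and rarely drives them exactly to zero.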

Question 43

Which option best describes an activation gradient?

Accepted Answer

The gradient of the loss with respect to an activation. It is used in backpropagation. The remaining choices fail because they don't satisfy the full definition.

Question 44

What is the primary purpose of an activation gradient?

Accepted Answer

To carry the gradient of the loss with respect to an activation back through the network. It is used in backpropagation to compute the gradients of earlier layers. The remaining choices fail because they don't satisfy the full definition.

Question 45

Which statement about an activation gradient is most accurate?

Accepted Answer

An activation gradient is the gradient of the loss with respect to an activation. It is used in backpropagation. The remaining choices fail because they don't satisfy the full definition.

Question 46

How is an activation gradient best characterized?

Accepted Answer

As the gradient of the loss with respect to an activation. It is used in backpropagation. The remaining choices fail because they don't satisfy the full definition.
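A single sigmoid neuron with a squared-error loss is enough to show where the activation gradient sits in backpropagation. The neuron, the loss, and all numeric values are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron_gradients(x, w, b, y):
    # Forward pass: pre-activation z, activation a, loss L = 0.5 * (a - y)^2.
    z = w * x + b
    a = sigmoid(z)

    # Backward pass via the chain rule.
    dL_da = a - y                  # activation gradient
    dL_dz = dL_da * a * (1.0 - a)  # through the sigmoid
    dL_dw = dL_dz * x              # down to the weight
    return {"a": a, "dL_da": dL_da, "dL_dz": dL_dz, "dL_dw": dL_dw}


grads = neuron_gradients(x=2.0, w=0.5, b=-1.0, y=1.0)

print(grads["a"])       # 0.5
print(grads["dL_da"])   # -0.5
print(grads["dL_dz"])   # -0.125
print(grads["dL_dw"])   # -0.25
```

The activation gradient dL_da is computed first and then reused to obtain the gradients for everything upstream of the activation.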

Question 47

Which option best describes gradient clipping?

Accepted Answer

Capping the gradient norm to avoid blow-ups. This stabilizes RNN and Transformer training. The remaining choices fail because they don't satisfy the full definition.

Question 48

What is the primary purpose of gradient clipping?

Accepted Answer

To cap the gradient norm and so avoid blow-ups. This stabilizes RNN and Transformer training. The remaining choices fail because they don't satisfy the full definition.

Question 49

Which statement about gradient clipping is most accurate?

Accepted Answer

Gradient clipping caps the gradient norm to avoid blow-ups. It stabilizes RNN and Transformer training. The remaining choices fail because they don't satisfy the full definition.

Question 50

How is gradient clipping best characterized?

Accepted Answer

As capping the gradient norm to avoid blow-ups. It stabilizes RNN and Transformer training. The remaining choices fail because they don't satisfy the full definition.
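Clipping by norm can be sketched as follows. The function name, the example gradient, and the threshold of 1.0 are hypothetical choices for illustration.

```python
import math


def clip_by_norm(grads, max_norm):
    # Compute the Euclidean norm of the gradient vector.
    norm = math.sqrt(sum(g * g for g in grads))
    # Leave small gradients unchanged.
    if norm <= max_norm:
        return list(grads)
    # Rescale so the norm equals max_norm; the direction is preserved.
    scale = max_norm / norm
    return [g * scale for g in grads]


large = clip_by_norm([3.0, 4.0], max_norm=1.0)   # norm 5 -> about [0.6, 0.8]
small = clip_by_norm([0.3, 0.4], max_norm=1.0)   # norm 0.5 -> unchanged

print(large, small)
```

Only the length of the gradient is capped. Its direction stays the same, so the update still points the same way but cannot grow without bound.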

AI Neural Networks Basics MCQ Questions with Answers – Page 2 (Latest 2026)

Q51. Which statement about a local minimum is most accurate?

Q52. How is a local minimum best characterized?

Q53. Which option best describes a saddle point?

Q54. What is the primary purpose of a saddle point?

Q55. Which statement about a saddle point is most accurate?

Q56. How is a saddle point best characterized?

Q57. Which option best describes weight initialization?

Q58. What is the primary purpose of weight initialization?

Q59. Which statement about weight initialization is most accurate?

Q60. How is weight initialization best characterized?

Q61. Which option best describes Xavier initialization?

Q62. What is the primary purpose of Xavier initialization?

Q63. Which statement about Xavier initialization is most accurate?

Q64. How is Xavier initialization best characterized?

Q65. Which option best describes He initialization?

Q66. What is the primary purpose of He initialization?

Q67. Which statement about He initialization is most accurate?

Q68. How is He initialization best characterized?

Q69. Which option best describes batch size?

Q70. What is the primary purpose of batch size?

Q71. Which statement about batch size is most accurate?

Q72. How is batch size best characterized?

Q73. Which option best describes an autoencoder?

Q74. What is the primary purpose of an autoencoder?

Q75. Which statement about an autoencoder is most accurate?

Q76. How is an autoencoder best characterized?

Q77. Which option best describes an embedding?

Q78. What is the primary purpose of an embedding?

Q79. Which statement about an embedding is most accurate?

Q80. How is an embedding best characterized?

Q81. Which option best describes dropout?

Q82. What is the primary purpose of dropout?

Q83. Which statement about dropout is most accurate?

Q84. How is dropout best characterized?

Q85. Which option best describes L2 regularization?

Q86. What is the primary purpose of L2 regularization?

Q87. Which statement about L2 regularization is most accurate?

Q88. How is L2 regularization best characterized?

Q89. Which option best describes L1 regularization?

Q90. What is the primary purpose of L1 regularization?

Q91. Which statement about L1 regularization is most accurate?

Q92. How is L1 regularization best characterized?

Q93. Which option best describes activation gradient?

Q94. What is the primary purpose of activation gradient?

Q95. Which statement about activation gradient is most accurate?

Q96. How is activation gradient best characterized?

Q97. Which option best describes gradient clipping?

Q98. What is the primary purpose of gradient clipping?

Q99. Which statement about gradient clipping is most accurate?

Q100. How is gradient clipping best characterized?