AI Neural Networks Basics MCQ Questions with Answers – Page 2 (Latest 2026)

Practice AI Neural Networks Basics MCQ questions with detailed explanations and clear answer validation. These MCQs help you revise core concepts, compare close options, and improve accuracy for interviews, certification exams, and technical screening rounds. Use this updated 2026 set to strengthen fundamentals and confidence.

Q51. Which statement about a local minimum is most accurate?

Answer: A point lower than its surrounding neighborhood, but not necessarily the lowest point overall.

A local minimum only has to have lower loss than the points immediately around it. The global minimum is the lowest point on the entire loss surface. The gradient is zero at a local minimum and every small step increases the loss, so gradient-based optimizers can get trapped there. The other choices miss this local-versus-global distinction.

Q52. How is a local minimum best characterized?

Answer: A point lower than its surrounding neighborhood, but not necessarily the lowest point overall.

A local minimum is defined relative to its neighborhood, not to the whole loss surface. An optimizer that settles into one can stop improving even though a lower point exists elsewhere.
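
A minimal sketch shows how plain gradient descent can stop in a local minimum. It uses a hand-picked one-dimensional toy loss, f(x) = x^4 - 3x^2 + x, which is an illustrative assumption and not taken from the source. The same update rule reaches different minima depending on where it starts. The function names are illustrative.

```python
def f(x):
    # Toy 1-D loss: local minimum near x = 1.13, global minimum near x = -1.30
    return x**4 - 3 * x**2 + x

def grad_f(x):
    # Derivative of f
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=1000):
    # Plain gradient descent: repeatedly step against the gradient
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

x_right = gradient_descent(2.0)   # starts in the basin of the local minimum
x_left = gradient_descent(-2.0)   # starts in the basin of the global minimum
print(f"start  2.0 -> x = {x_right:.3f}, loss = {f(x_right):.3f}")
print(f"start -2.0 -> x = {x_left:.3f}, loss = {f(x_left):.3f}")
```

The run that starts at 2.0 settles where the gradient is zero, yet its loss is higher than that of the run started at -2.0. The optimizer has no local signal telling it that a better point exists.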

Q53. Which option best describes a saddle point?

Answer: A point with zero gradient that has both ascending and descending directions.

A saddle point is a stationary point where the loss curves upward along some directions and downward along others, like the middle of a horse saddle. Such points are common on the high-dimensional loss surfaces of neural networks.

Q54. What defines a saddle point?

Answer: A point with zero gradient that has both ascending and descending directions.

The defining feature is mixed curvature: from the same stationary point, some directions lead uphill and others lead downhill. A saddle point is not something the network uses. It is a feature of the loss surface that the optimizer has to get past.

Q55. Which statement about a saddle point is most accurate?

Answer: A point with zero gradient that has both ascending and descending directions.

The gradient vanishes at a saddle point, so gradient descent can slow down or stall near it even though a descending direction exists. Saddle points appear frequently in high-dimensional surfaces.

Q56. How is a saddle point best characterized?

Answer: A point with zero gradient that has both ascending and descending directions.

In terms of curvature, the Hessian at a saddle point has both positive and negative eigenvalues. That is exactly what "ascending and descending directions" means. This combination is common in high-dimensional surfaces.
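
A minimal sketch uses the textbook saddle f(x, y) = x^2 - y^2, an illustrative choice. It shows a zero gradient at the origin together with an uphill direction and a downhill direction. It also classifies a stationary point from the signs of its 2x2 Hessian eigenvalues. The helper names are hypothetical.

```python
import math

def f(x, y):
    # Classic saddle surface: curves upward along x and downward along y
    return x**2 - y**2

def grad(x, y):
    # Partial derivatives (df/dx, df/dy); both are zero at the origin
    return (2 * x, -2 * y)

def classify_critical_point(a, b, c):
    # Classify a stationary point from its symmetric Hessian [[a, b], [b, c]]
    # using the signs of the two eigenvalues: mean +/- radius
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b**2)
    lam_max, lam_min = mean + radius, mean - radius
    if lam_min > 0:
        return "local minimum"
    if lam_max < 0:
        return "local maximum"
    if lam_min < 0 < lam_max:
        return "saddle point"
    return "inconclusive"  # a zero eigenvalue: the second-order test cannot decide

print(grad(0.0, 0.0))                      # zero gradient at the origin
print(f(0.1, 0.0), f(0.0, 0.0), f(0.0, 0.1))  # up along x, down along y
print(classify_critical_point(2, 0, -2))   # Hessian of x^2 - y^2
```
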

Q57. Which option best describes weight initialization?

Answer: The choice of starting values for the weights (e.g., Xavier or He initialization).

Weight initialization sets a network's weights before training begins. It usually draws random values from a distribution scaled to the layer size, as the Xavier and He schemes do. The choice affects training stability.

Q58. What is the primary purpose of weight initialization?

Answer: The choice of starting values for the weights (e.g., Xavier or He initialization).

Its purpose is to give training a stable starting point. Weights that are too small make signals and gradients shrink layer after layer, while weights that are too large make them grow. Xavier and He scale the random values to avoid both problems. Random values also break symmetry, so neurons in the same layer can learn different features.

Q59. Which statement about weight initialization is most accurate?

Answer: The choice of starting values for the weights (e.g., Xavier or He initialization).

The most accurate statement is that initialization fixes the weights' starting values, and that the scheme used, such as Xavier or He, affects how stably the network trains.

Q60. How is weight initialization best characterized?

Answer: The choice of starting values for the weights (e.g., Xavier or He initialization).

Weight initialization is best characterized as a one-time setup step performed before the first gradient update. A well-chosen scheme such as Xavier or He keeps training stable.
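
A rough sketch of why the starting scale matters. It assumes a hypothetical 10-layer fully connected tanh network of width 64, fed random Gaussian inputs. All sizes and the helper name are illustrative. The sketch compares a fixed small weight scale with one scaled as 1/sqrt(fan_in).

```python
import math
import random

def final_activation_std(weight_std, depth=10, width=64, batch=16, seed=0):
    # Push a batch of random inputs through `depth` tanh layers whose weights
    # are drawn from N(0, weight_std^2); return the std of the last layer's outputs
    rng = random.Random(seed)
    acts = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(batch)]
    for _ in range(depth):
        W = [[rng.gauss(0.0, weight_std) for _ in range(width)] for _ in range(width)]
        acts = [[math.tanh(sum(w * x for w, x in zip(row, a))) for row in W] for a in acts]
    flat = [v for a in acts for v in a]
    mean = sum(flat) / len(flat)
    return math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat))

width = 64
# Each layer multiplies the signal scale by about 0.01 * sqrt(64) = 0.08
too_small = final_activation_std(0.01, width=width)
# Scaling by 1/sqrt(fan_in) keeps each layer's pre-activation variance close to its input variance
scaled = final_activation_std(1.0 / math.sqrt(width), width=width)
print(f"std 0.01           -> final activation std {too_small:.2e}")
print(f"std 1/sqrt(fan_in) -> final activation std {scaled:.2e}")
```

With the fixed small scale, the final activations end up many orders of magnitude smaller than with the scaled version. Gradients flowing backward would shrink in a similar way, which is the kind of instability that careful initialization avoids.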

Q61. Which option best describes Xavier initialization?

Answer: Weight variance scaled to the layer's inputs and outputs: Var(W) = 2 / (fan_in + fan_out).

Xavier (Glorot) initialization sets the weight variance from both the number of inputs (fan_in) and the number of outputs (fan_out) of a layer. It works well with tanh and sigmoid activations.

Q62. What is the primary purpose of Xavier initialization?

Answer: Weight variance scaled to the layer's inputs and outputs: Var(W) = 2 / (fan_in + fan_out).

Its purpose is to keep the variance of activations in the forward pass, and of gradients in the backward pass, roughly constant from layer to layer, so that signals neither vanish nor explode. It is a good fit for tanh and sigmoid.

Q63. Which statement about Xavier initialization is most accurate?

Answer: Weight variance scaled to the layer's inputs and outputs: Var(W) = 2 / (fan_in + fan_out).

Xavier initialization balances the forward pass (which depends on fan_in) against the backward pass (which depends on fan_out) in a single variance, 2 / (fan_in + fan_out). It is suited to tanh and sigmoid networks.

Q64. How is Xavier initialization best characterized?

Answer: Weight variance scaled to the layer's inputs and outputs: Var(W) = 2 / (fan_in + fan_out).

Xavier initialization is best characterized as variance scaling based on fan_in plus fan_out, intended for activations such as tanh and sigmoid.
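
A minimal sketch of the Glorot/Xavier formulas. It assumes the two common variants, normal and uniform, and computes the target spread for each. It then checks that a uniformly sampled matrix has the intended variance. The 300 x 100 layer size and the function names are arbitrary choices made for illustration.

```python
import math
import random

def xavier_normal_std(fan_in, fan_out):
    # Glorot/Xavier normal: Var(W) = 2 / (fan_in + fan_out)
    return math.sqrt(2.0 / (fan_in + fan_out))

def xavier_uniform_limit(fan_in, fan_out):
    # U(-a, a) has variance a^2 / 3, so a = sqrt(6 / (fan_in + fan_out))
    # gives the same variance 2 / (fan_in + fan_out)
    return math.sqrt(6.0 / (fan_in + fan_out))

def xavier_uniform(fan_in, fan_out, rng):
    # One row per output unit, one column per input unit
    a = xavier_uniform_limit(fan_in, fan_out)
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]

fan_in, fan_out = 300, 100
W = xavier_uniform(fan_in, fan_out, random.Random(0))
flat = [w for row in W for w in row]
empirical_var = sum(w * w for w in flat) / len(flat)  # mean is ~0, so this estimates the variance
target_var = 2.0 / (fan_in + fan_out)                  # 2 / 400 = 0.005
print(f"target {target_var:.5f}, sampled {empirical_var:.5f}")
```
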

Q65. Which option best describes He initialization?

Answer: Weight variance scaled for ReLU networks: Var(W) = 2 / fan_in.

He (Kaiming) initialization sets the weight variance to 2 / fan_in. The factor of 2 compensates for ReLU setting roughly half of its inputs to zero, which is why it is recommended for ReLU networks.

Q66. What is the primary purpose of He initialization?

Answer: Weight variance scaled for ReLU networks: Var(W) = 2 / fan_in.

Its purpose is to preserve the scale of activations through deep ReLU networks. With Xavier scaling on layers where fan_in equals fan_out, each ReLU layer would roughly halve the mean squared activation, and that loss compounds over many layers.

Q67. Which statement about He initialization is most accurate?

Answer: Weight variance scaled for ReLU networks: Var(W) = 2 / fan_in.

He initialization differs from Xavier in two ways: it depends only on fan_in, and it doubles the variance to account for ReLU. It is the recommended choice with ReLU.

Q68. How is He initialization best characterized?

Answer: Weight variance scaled for ReLU networks: Var(W) = 2 / fan_in.

He initialization is best characterized as ReLU-aware variance scaling, Var(W) = 2 / fan_in, and it is recommended with ReLU.
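
A minimal sketch compares He scaling with Xavier scaling for square layers. It assumes a hypothetical 10-layer ReLU network of width 64; the sizes and helper names are illustrative. Both runs share the same random seed, so the He weights are exactly sqrt(2) times the Xavier weights. Any difference in the output therefore comes from the scale alone.

```python
import math
import random

def he_std(fan_in):
    # He/Kaiming normal: Var(W) = 2 / fan_in
    return math.sqrt(2.0 / fan_in)

def relu_mean_square(weight_std, depth=10, width=64, batch=16, seed=0):
    # Mean squared activation after `depth` ReLU layers with N(0, weight_std^2) weights
    rng = random.Random(seed)
    acts = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(batch)]
    for _ in range(depth):
        W = [[rng.gauss(0.0, weight_std) for _ in range(width)] for _ in range(width)]
        acts = [[max(0.0, sum(w * x for w, x in zip(row, a))) for row in W] for a in acts]
    flat = [v for a in acts for v in a]
    return sum(v * v for v in flat) / len(flat)

width, depth = 64, 10
he_ms = relu_mean_square(he_std(width), depth, width)
# Xavier for a square layer (fan_in == fan_out == width): Var(W) = 2 / (2 * width)
xavier_ms = relu_mean_square(math.sqrt(2.0 / (width + width)), depth, width)
print(f"He: {he_ms:.3f}  Xavier: {xavier_ms:.5f}  ratio: {he_ms / xavier_ms:.1f}")
```

ReLU is positively homogeneous, so the sqrt(2) weight ratio compounds to a factor of 2 per layer in mean square, which gives 2^10 = 1024 after 10 layers. In expectation the He run keeps its scale while the Xavier run decays by about half per layer.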

Q69. Which option best describes batch size?

Answer: The number of training samples used for each gradient step.

Batch size is the number of samples whose gradients are averaged before each weight update. Choosing it involves tradeoffs among speed, memory use, and generalization.

Q70. What does batch size control?

Answer: The number of training samples used for each gradient step.

Batch size controls how many samples contribute to each gradient step, and therefore how many steps make up one epoch. Larger batches give less noisy gradient estimates and use hardware more efficiently, but they need more memory. Smaller batches produce noisier updates, which is often reported to help generalization.

Q71. Which statement about batch size is most accurate?

Answer: The number of training samples used for each gradient step.

The most accurate statement is that batch size sets the number of samples per gradient step. It is a tradeoff among speed, memory, and generalization rather than a setting with one universally best value.

Q72. How is batch size best characterized?

Answer: The number of training samples used for each gradient step.

Batch size is best characterized as a training hyperparameter that sets how many samples feed each update. With N training samples, one epoch contains ceil(N / batch size) steps.
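
A minimal sketch of mini-batch SGD. It assumes a hypothetical one-parameter linear regression (y ≈ 3x with a little noise) and shows two things batch size changes: the number of gradient steps per epoch, and how far training gets in a fixed number of epochs. The data, learning rate, epoch count, and function names are all illustrative.

```python
import math
import random

def steps_per_epoch(n_samples, batch_size):
    # One gradient step per mini-batch; the last batch may be smaller
    return math.ceil(n_samples / batch_size)

def minibatch_sgd(xs, ys, batch_size, lr=0.1, epochs=20, seed=0):
    # Fit y ≈ w * x by minimizing mean squared error, updating once per mini-batch
    rng = random.Random(seed)
    order = list(range(len(xs)))
    w, updates = 0.0, 0
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Gradient of mean((w*x - y)^2) over this batch only
            g = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * g
            updates += 1
    return w, updates

data_rng = random.Random(42)
xs = [data_rng.uniform(-1, 1) for _ in range(200)]
ys = [3 * x + data_rng.gauss(0, 0.1) for x in xs]

w_small, n_small = minibatch_sgd(xs, ys, batch_size=8)    # 25 steps per epoch
w_full, n_full = minibatch_sgd(xs, ys, batch_size=200)    # 1 step per epoch (full batch)
print(f"batch 8:   {n_small} updates, w = {w_small:.3f}")
print(f"batch 200: {n_full} updates, w = {w_full:.3f}")
```

With the same 20 epochs, the small-batch run makes 25 times as many updates and gets close to w = 3. The full-batch run computes smoother gradients but takes only 20 steps. In practice the learning rate is usually retuned when the batch size changes, so this only illustrates the step-count side of the tradeoff.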

Q73. Which option best describes an autoencoder?

Answer: A network that reconstructs its input through a bottleneck.

An autoencoder has an encoder that compresses the input into a smaller code (the bottleneck) and a decoder that reconstructs the input from that code. It is useful for representation learning.

Q74. What is the primary purpose of an autoencoder?

Answer: A network that reconstructs its input through a bottleneck.

Its purpose is to learn a compact representation without labels. The input has to pass through the bottleneck, so the network must keep only the information needed to rebuild it.

Q75. Which statement about an autoencoder is most accurate?

Answer: Network that reconstructs its input via a bottleneck.

Network that reconstructs its input via a bottleneck. is the correct answer here. Useful for representation learning. This is the most accurate statement for which statement about an autoencoder is most accurate. Competing choices sound plausible, but they miss the key condition.

Q76. How is an autoencoder best characterized?

Answer: A network that reconstructs its input through a bottleneck.

An autoencoder is best characterized as an encoder-decoder pair trained to reconstruct its input through a bottleneck, which makes it a tool for representation learning.
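
A minimal sketch of the encode, bottleneck, decode, and reconstruction-loss loop. It assumes a tiny linear autoencoder (2 inputs, a 1-unit bottleneck, 2 outputs) trained with plain gradient descent on hypothetical 2-D points that lie on the line x2 = 2 * x1. The data has only one degree of freedom, so a one-number code is enough to rebuild each point. Real autoencoders are usually deeper and nonlinear. All names, sizes, and starting weights here are illustrative.

```python
import random

def make_points(n=50, seed=0):
    # Hypothetical 2-D data on the line x2 = 2 * x1 (one hidden factor t per point)
    rng = random.Random(seed)
    return [(t, 2 * t) for t in (rng.uniform(-1, 1) for _ in range(n))]

def encode(e, x):
    # Encoder: compress a 2-D input into a 1-D code (the bottleneck)
    return e[0] * x[0] + e[1] * x[1]

def decode(d, z):
    # Decoder: expand the 1-D code back into a 2-D reconstruction
    return (d[0] * z, d[1] * z)

def reconstruction_loss(e, d, data):
    # Mean squared error between each input and its reconstruction
    total = 0.0
    for x in data:
        r = decode(d, encode(e, x))
        total += (r[0] - x[0]) ** 2 + (r[1] - x[1]) ** 2
    return total / len(data)

def train(data, e, d, lr=0.05, steps=2000):
    # Full-batch gradient descent on the reconstruction loss
    e, d, n = list(e), list(d), len(data)
    for _ in range(steps):
        ge, gd = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            z = encode(e, x)
            r = [d[0] * z - x[0], d[1] * z - x[1]]  # reconstruction minus input
            dz = 2 * (d[0] * r[0] + d[1] * r[1])     # dLoss/dz for this sample
            for k in range(2):
                gd[k] += 2 * r[k] * z / n
                ge[k] += dz * x[k] / n
        for k in range(2):
            d[k] -= lr * gd[k]
            e[k] -= lr * ge[k]
    return e, d

points = make_points()
e0, d0 = [0.5, 0.1], [0.1, 0.5]   # arbitrary starting weights
e, d = train(points, e0, d0)
print("loss before:", reconstruction_loss(e0, d0, points))
print("loss after: ", reconstruction_loss(e, d, points))
print("code for (0.3, 0.6):", encode(e, (0.3, 0.6)))
```
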

Q77. Which option best describes an embedding?

Answer: A learned dense vector representing a discrete entity.

An embedding maps a discrete entity, such as a word, token, or item ID, to a dense vector of real numbers that is learned during training. Entities that behave similarly end up with similar vectors.

Q78. What is the primary purpose of an embedding?

Answer: A learned dense vector representing a discrete entity.

Its purpose is to give discrete entities a representation that captures similarity. Related entities end up close together in the vector space. One-hot codes cannot do this, because every pair of entities is equally far apart.

Q79. Which statement about an embedding is most accurate?

Answer: A learned dense vector representing a discrete entity.

All three parts of the definition matter: the vector is learned, it is dense, and it belongs to a discrete entity. In practice, an embedding layer is a lookup table with one trainable row per entity.

Q80. How is an embedding best characterized?

Answer: A learned dense vector representing a discrete entity.

An embedding is best characterized as a trainable lookup from a discrete ID to a dense vector whose geometry captures similarity.
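
A minimal sketch of an embedding as a table lookup, with cosine similarity used to compare entities. It assumes a hypothetical three-word vocabulary and hand-picked 3-D vectors that stand in for trained values; in a real model the rows would be learned by gradient descent. The sketch also shows that the lookup equals multiplying a one-hot vector by the table.

```python
import math

vocab = {"cat": 0, "dog": 1, "car": 2}

# Hypothetical embedding table: one dense row per entity.
# Values are hand-picked for illustration; a trained model learns them.
embedding_table = [
    [0.9, 0.8, 0.1],  # cat
    [0.8, 0.9, 0.2],  # dog
    [0.1, 0.2, 0.9],  # car
]

def embed(token):
    # Embedding lookup: entity -> integer ID -> row of the table
    return embedding_table[vocab[token]]

def embed_via_one_hot(token):
    # Equivalent view: a one-hot vector times the table selects the same row
    one_hot = [1.0 if i == vocab[token] else 0.0 for i in range(len(vocab))]
    dim = len(embedding_table[0])
    return [sum(one_hot[i] * embedding_table[i][j] for i in range(len(vocab))) for j in range(dim)]

def cosine(u, v):
    # Cosine similarity: 1.0 means same direction, 0.0 means orthogonal
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print(f"cat-dog: {cosine(embed('cat'), embed('dog')):.3f}")
print(f"cat-car: {cosine(embed('cat'), embed('car')):.3f}")
```
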

Q81. Which option best describes dropout?

Answer: Randomly zeroing activations during training.

Dropout sets each activation to zero with some probability p on every training step, and it is switched off at inference. This regularizes the network.

Q82. What is the primary purpose of dropout?

Answer: Randomly zeroing activations during training.

Its purpose is regularization. Any unit may be dropped on a given step, so the network cannot rely on a few specific units and overfits less.

Q83. Which statement about dropout is most accurate?

Answer: Randomly zeroing activations during training.

The key condition is "during training", because dropout is disabled at inference. In the common inverted-dropout form, the surviving activations are scaled by 1 / (1 - p) during training so that no rescaling is needed at inference.

Q84. How is dropout best characterized?

Answer: Randomly zeroing activations during training.

Dropout is best characterized as a stochastic regularizer that randomly zeroes activations and is applied only while training.
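
A minimal sketch of inverted dropout on a plain Python list of activations. The function name and its training flag are illustrative, not any particular library's API. During training each value is zeroed with probability p and the survivors are scaled by 1 / (1 - p). At inference the input passes through unchanged.

```python
import random

def dropout(activations, p, training, rng=random):
    # Inverted dropout: during training, zero each activation with probability p
    # and scale survivors by 1 / (1 - p) so the expected value is unchanged.
    # At inference (training=False) the input passes through untouched.
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
acts = [1.0] * 10_000
train_out = dropout(acts, p=0.5, training=True, rng=rng)
eval_out = dropout(acts, p=0.5, training=False)
print("zeroed fraction:", train_out.count(0.0) / len(train_out))
print("mean during training:", sum(train_out) / len(train_out))
```

The scaling is a design choice: it keeps the expected activation the same in both modes, so the inference path needs no extra step.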

Q85. Which option best describes L2 regularization?

Answer: Add lambda * ||w||_2^2 (the sum of squared weights) as a penalty to the loss.

L2 regularization adds lambda times the sum of squared weights to the loss, where lambda sets the strength of the penalty. This discourages large weights.

Q86. What is the primary purpose of L2 regularization?

Answer: Add lambda * ||w||_2^2 (the sum of squared weights) as a penalty to the loss.

Its purpose is to reduce overfitting by discouraging large weights. The penalty's gradient is 2 * lambda * w, so every update pulls each weight toward zero in proportion to its size.

Q87. Which statement about L2 regularization is most accurate?

Answer: Add lambda * ||w||_2^2 (the sum of squared weights) as a penalty to the loss.

The pull is proportional to the weight, so L2 regularization shrinks large weights strongly but rarely drives any weight exactly to zero. With plain SGD it is equivalent to weight decay, up to how the coefficient is scaled.

Q88. How is L2 regularization best characterized?

Answer: Add lambda * ||w||_2^2 (the sum of squared weights) as a penalty to the loss.

L2 regularization is best characterized as a squared-norm penalty on the weights, added to the loss to discourage large weights.
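
A minimal sketch of the L2 penalty and its gradient on a plain list of weights. With a zero data-loss gradient, a single SGD step multiplies every weight by (1 - 2 * lr * lambda), which is the weight-decay view of L2. The function names and numbers are illustrative.

```python
def l2_penalty(weights, lam):
    # lam * ||w||_2^2, i.e. lam times the sum of squared weights
    return lam * sum(w * w for w in weights)

def l2_penalty_grad(weights, lam):
    # d/dw of lam * w^2 is 2 * lam * w: the pull toward zero grows with the weight
    return [2 * lam * w for w in weights]

def sgd_step(weights, data_grads, lam, lr):
    # Total gradient = gradient of the data loss + gradient of the L2 penalty
    reg = l2_penalty_grad(weights, lam)
    return [w - lr * (g + r) for w, g, r in zip(weights, data_grads, reg)]

w = [3.0, -4.0]
print(l2_penalty(w, lam=0.1))                        # 0.1 * (9 + 16)
# Zero data gradient: each weight is scaled by 1 - 2 * 0.5 * 0.1 = 0.9
print(sgd_step(w, [0.0, 0.0], lam=0.1, lr=0.5))
```
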

Q89. Which option best describes L1 regularization?

Answer: Add lambda * ||w||_1 (the sum of absolute weights) as a penalty to the loss; this encourages sparsity.

L1 regularization adds lambda times the sum of absolute weight values to the loss. It tends to produce sparser weights.

Q90. What is the primary purpose of L1 regularization?

Answer: Add lambda * ||w|| penalty; encourages sparsity.

Add lambda * ||w|| penalty; encourages sparsity. is the correct answer here. Sparser weights. That is exactly the concept behind what is the primary purpose of l1 regularization in this context. The remaining choices fail because they don’t satisfy the full definition.

Q91. Which statement about L1 regularization is most accurate?

Answer: Add lambda * ||w||_1 (the sum of absolute weights) as a penalty to the loss; this encourages sparsity.

The L2 penalty's pull on a weight weakens as the weight approaches zero. The L1 penalty instead pulls every nonzero weight toward zero by the same amount, lambda times its sign. That constant pull is what drives small weights to exactly zero and makes the weights sparser.

Q92. How is L1 regularization best characterized?

Answer: Add lambda * ||w||_1 (the sum of absolute weights) as a penalty to the loss; this encourages sparsity.

L1 regularization is best characterized as an absolute-value penalty on the weights that yields sparse solutions.
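
A minimal sketch of how the L1 penalty produces exact zeros. It uses the proximal (soft-thresholding) update, which is one common way to apply an L1 penalty; plain subgradient steps with lambda * sign(w) tend to make weights oscillate around zero rather than land on it. For contrast, an L2-style multiplicative shrink is included. The function names and numbers are illustrative.

```python
import math

def l1_penalty(weights, lam):
    # lam * ||w||_1, i.e. lam times the sum of absolute weights
    return lam * sum(abs(w) for w in weights)

def soft_threshold(weights, t):
    # Proximal step for the L1 penalty (t = learning rate * lam): shrink each
    # weight toward zero by t, and set any weight with |w| <= t exactly to zero
    return [math.copysign(max(abs(w) - t, 0.0), w) for w in weights]

def l2_shrink(weights, factor):
    # For contrast: L2 weight decay multiplies weights, so they never reach exactly zero
    return [w * factor for w in weights]

w = [0.8, -0.05, 0.02, -1.2]
print(soft_threshold(w, 0.1))   # the two small weights become exactly zero
print(l2_shrink(w, 0.9))        # the small weights only get smaller
```
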

Q93. Which option best describes an activation gradient?

Answer: The gradient of the loss with respect to an activation.

An activation gradient is the partial derivative of the loss with respect to a layer's activation output. It is used in backpropagation.

Q94. What is the primary purpose of an activation gradient?

Answer: The gradient of the loss with respect to an activation.

During backpropagation, a layer receives its activation gradient from the layer above. It uses that gradient, through the chain rule, to compute its own weight gradients and the activation gradient it passes to the layer below.

Q95. Which statement about the activation gradient is most accurate?

Answer: Gradient of loss w.r.t. activation.

Gradient of loss w.r.t. activation. is the correct answer here. Used in backprop. It fits the requirement in the prompt about which statement about activation gradient is most accurate. The remaining choices fail because they don’t satisfy the full definition.

Q96. How is an activation gradient best characterized?

Answer: The gradient of the loss with respect to an activation.

An activation gradient is best characterized as an intermediate quantity of backpropagation: dL/da for an activation a.
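
A minimal sketch of one backpropagation step. It assumes a hypothetical two-parameter network, y_hat = w2 * tanh(w1 * x), with squared-error loss. The sketch computes the activation gradient dL/dh by hand, passes it through the chain rule to get dL/dw1, and checks both values against central finite differences. All names and values are illustrative.

```python
import math

def loss_from_h(h, w2, y):
    # Loss written as a function of the hidden activation h
    return (w2 * h - y) ** 2

def loss(w1, w2, x, y):
    return loss_from_h(math.tanh(w1 * x), w2, y)

def backward(w1, w2, x, y):
    h = math.tanh(w1 * x)                 # forward pass: hidden activation
    y_hat = w2 * h
    dL_dyhat = 2 * (y_hat - y)
    dL_dh = dL_dyhat * w2                 # activation gradient: dL/dh
    dL_dw2 = dL_dyhat * h
    dL_dw1 = dL_dh * (1 - h * h) * x      # chain rule through tanh'(u) = 1 - tanh(u)^2
    return dL_dh, dL_dw1, dL_dw2

w1, w2, x, y = 0.5, -1.5, 2.0, 0.3
dL_dh, dL_dw1, dL_dw2 = backward(w1, w2, x, y)

# Central finite differences as an independent check
eps = 1e-6
h = math.tanh(w1 * x)
num_dL_dh = (loss_from_h(h + eps, w2, y) - loss_from_h(h - eps, w2, y)) / (2 * eps)
num_dL_dw1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
print(dL_dh, num_dL_dh)
print(dL_dw1, num_dL_dw1)
```
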

Q97. Which option best describes gradient clipping?

Answer: Capping the gradient norm to prevent exploding updates.

Gradient clipping limits the size of the gradient before the optimizer step. The most common form rescales the gradient so that its norm does not exceed a chosen threshold. It stabilizes RNN and Transformer training.

Q98. What is the primary purpose of gradient clipping?

Answer: Capping the gradient norm to prevent exploding updates.

Its purpose is to stop an exploding gradient from producing a single huge parameter update that destabilizes training. It is a standard safeguard when training RNNs and Transformers.

Q99. Which statement about gradient clipping is most accurate?

Answer: Capping the gradient norm to prevent exploding updates.

With norm clipping, a gradient whose norm exceeds the threshold is scaled down until its norm equals the threshold, which preserves its direction. Gradients below the threshold are left unchanged.

Q100. How is gradient clipping best characterized?

Answer: Capping the gradient norm to prevent exploding updates.

Gradient clipping is best characterized as a cap on update size. It limits the gradient norm to avoid blow-ups, particularly in RNN and Transformer training.
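
A minimal sketch of clipping by global norm. It assumes the gradients are given as plain Python lists, one list per parameter tensor, and the function name is illustrative. Frameworks ship their own versions (PyTorch, for example, provides torch.nn.utils.clip_grad_norm_); this sketch only mirrors the basic rule.

```python
import math

def clip_by_global_norm(grads, max_norm):
    # grads: list of gradient vectors, one per parameter tensor (flattened).
    # Returns the clipped gradients and the norm measured before clipping.
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm <= max_norm:
        # Small gradients pass through unchanged
        return [list(vec) for vec in grads], total_norm
    # Shrink every component by the same factor, which keeps the gradient's direction
    scale = max_norm / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm

clipped, norm_before = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
print(norm_before, clipped)   # norm 5.0 is scaled down to norm 1.0
```
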