Question 1

class_weight='balanced' in classifiers helps by:

Accepted Answer

Adjusting class weights inversely to class frequency. Each class receives the weight n_samples / (n_classes * count of that class), so minority classes have more influence during training.
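As a rough illustration of the weighting rule, the sketch below computes balanced weights for a made-up label vector (90 samples of class 0, 10 of class 1) and compares them with the formula written out by hand. The data is hypothetical; compute_class_weight is the scikit-learn utility that implements the same rule.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 90 samples of class 0, 10 of class 1.
y = np.array([0] * 90 + [1] * 10)
classes = np.unique(y)

# 'balanced' assigns n_samples / (n_classes * count_of_class) to each class.
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# The same formula computed by hand for comparison.
manual = len(y) / (len(classes) * np.bincount(y))

print(dict(zip(classes.tolist(), weights.tolist())))
```

The minority class ends up with the larger weight, which is what passing class_weight='balanced' to a classifier applies internally.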

Question 2

Which is a common way to address class imbalance?

Accepted Answer

Resampling or class weighting. Oversampling the minority class, undersampling the majority class, or weighting classes in the loss all improve detection of the minority class.

Question 3

SMOTE is available in:

Accepted Answer

The imbalanced-learn package. SMOTE is not part of scikit-learn itself; it is provided by imbalanced-learn (imported as imblearn) and is often used together with sklearn estimators and pipelines.

Question 4

PCA is primarily used for:

Accepted Answer

Dimensionality reduction. PCA projects the features onto a smaller set of orthogonal components.

Question 5

Which PCA attribute shows variance explained per component?

Accepted Answer

explained_variance_ratio_. It gives the fraction of total variance captured by each principal component, which indicates how much information each component retains.

Question 6

Before PCA, scaling numeric features is usually:

Accepted Answer

Recommended. PCA is scale-sensitive: features measured on large scales can dominate the components unless the data is standardized first.
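A small sketch of the scale sensitivity, using two independent synthetic features on very different scales (the data and seed are assumptions chosen for illustration). Without scaling, the first component is almost entirely the large-scale feature; after StandardScaler, the variance is shared much more evenly.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two independent synthetic features: standard deviation 1 versus 1000.
X = np.column_stack([rng.normal(0, 1, 500), rng.normal(0, 1000, 500)])

# PCA on raw data: the large-scale feature dominates the first component.
raw_ratio = PCA(n_components=2).fit(X).explained_variance_ratio_

# PCA after standardization: both features contribute comparably.
scaled = make_pipeline(StandardScaler(), PCA(n_components=2)).fit(X)
scaled_ratio = scaled.named_steps["pca"].explained_variance_ratio_

print("raw:   ", raw_ratio)
print("scaled:", scaled_ratio)
```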

Question 7

SelectKBest is used for:

Accepted Answer

Feature selection. It keeps the k features with the highest scores under a chosen scoring function.

Question 8

mutual_info_classif estimates:

Accepted Answer

Dependency between each feature and the class label. Mutual information captures nonlinear associations as well as linear ones.

Question 9

VarianceThreshold removes features with:

Accepted Answer

Variance below the threshold. Near-constant features often contribute little predictive power.

Question 10

Data leakage happens when:

Accepted Answer

Information from validation/test data influences training. Leakage inflates evaluation metrics and harms real-world performance.

Question 11

Why fit preprocessors on training data only?

Accepted Answer

To avoid leakage. Statistics from validation/test data must not affect fitted transforms: fit each transform on the training data, then only apply it to held-out data.

Question 12

Pipelines help prevent leakage because they:

Accepted Answer

Apply fit/transform within each training fold correctly. A Pipeline bundles preprocessing with the estimator, so during cross-validation the preprocessing steps are refit on each training fold and only applied to the matching validation fold.
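A minimal sketch of this pattern, assuming a synthetic dataset and the step names "scale" and "model" (both chosen here for illustration). Because the whole pipeline is passed to cross_val_score, the scaler is refit on each training fold rather than on the full dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data used only for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),                   # fit on each training fold only
    ("model", LogisticRegression(max_iter=1000)),  # trained on the scaled training fold
])

# Each of the 5 folds clones the pipeline and fits it on that fold's training part.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Scaling X once up front and then cross-validating only the model would let validation-fold statistics influence the scaler, which is the leakage the pipeline avoids.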

Question 13

SimpleImputer(strategy='mean') is used to:

Accepted Answer

Fill missing numeric values with the column mean. Mean imputation is a basic approach for continuous features.

Question 14

For skewed numeric data, which imputation may be more robust?

Accepted Answer

Median. The median is less affected by extreme outliers than the mean.

Question 15

For categorical missing values, a common SimpleImputer strategy is:

Accepted Answer

most_frequent. Replacing missing entries with the most frequent category is a common choice for categorical features.
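The three SimpleImputer strategies from the preceding questions can be compared on toy data. The arrays below are made up: the numeric column contains one extreme value so that the mean and median differ noticeably.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical numeric column with one missing value and one outlier (100).
num = np.array([[1.0], [2.0], [np.nan], [100.0]])
mean_filled = SimpleImputer(strategy="mean").fit_transform(num)
median_filled = SimpleImputer(strategy="median").fit_transform(num)

# Hypothetical categorical column with one missing value.
cat = np.array([["red"], ["blue"], ["red"], [np.nan]], dtype=object)
mode_filled = SimpleImputer(strategy="most_frequent").fit_transform(cat)

print(mean_filled[2, 0])    # mean of 1, 2, 100
print(median_filled[2, 0])  # median of 1, 2, 100
print(mode_filled[3, 0])    # most frequent category
```

The mean is pulled toward the outlier while the median is not, which is why the median is often preferred for skewed data.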

Question 16

IterativeImputer in sklearn is:

Accepted Answer

An experimental multivariate imputer. It models each feature with missing values as a function of the other features.
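Because the imputer is experimental, it has to be enabled explicitly before it can be imported. The sketch below shows that import and a fit on a tiny made-up array where the second column is roughly twice the first; the exact imputed values depend on the default estimator and are not asserted here.

```python
import numpy as np
# This import enables the experimental API; without it the next import fails.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical data: column 1 is about 2x column 0, with one gap in each column.
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, np.nan],
    [np.nan, 10.0],
])

imputer = IterativeImputer(random_state=0)
X_filled = imputer.fit_transform(X)  # missing entries predicted from the other column
print(X_filled)
```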

Question 17

Which scaler is generally robust to outliers?

Accepted Answer

RobustScaler. It centers on the median and scales by the interquartile range (IQR), which reduces the impact of outliers.

Question 18

RobustScaler centers/scales using:

Accepted Answer

Median and IQR. Both statistics are resistant to extreme values, so outliers have little effect on the fitted center and scale.
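A short sketch on a made-up column with one outlier shows the fitted statistics. With the default settings the center is the median and the scale is the range between the 25th and 75th percentiles.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical column with a single extreme value.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaler = RobustScaler().fit(X)
X_scaled = scaler.transform(X)  # (x - median) / IQR

print(scaler.center_)  # median of the column
print(scaler.scale_)   # interquartile range (75th minus 25th percentile)
print(X_scaled.ravel())
```

The outlier is still far from the rest after scaling, but it did not distort the center or scale used for the other values.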

Question 19

PolynomialFeatures helps linear models by:

Accepted Answer

Adding interaction and nonlinear terms. The expanded feature space lets a linear model fit nonlinear relationships.

Question 20

A sign of overfitting is:

Accepted Answer

Low train error, high test error. The model memorizes training patterns but generalizes poorly.

Question 21

A sign of underfitting is:

Accepted Answer

High train error and high test error. The model's capacity is too low to capture the structure in the data.

Question 22

Regularization is mainly used to:

Accepted Answer

Reduce overfitting. Penalties constrain model complexity for better generalization.

Question 23

Bias-variance tradeoff describes balance between:

Accepted Answer

Underfitting and overfitting. High bias leads to underfitting; high variance leads to overfitting.

Question 24

A learning curve plots performance against:

Accepted Answer

Number of training samples. It helps diagnose bias/variance problems and whether more data would help.

Question 25

A validation curve typically varies:

Accepted Answer

One hyperparameter value. It shows how train and validation scores change across settings of that parameter.

Question 26

Which estimator supports early stopping via incremental optimization options?

Accepted Answer

SGDClassifier. SGD-based estimators offer validation-based stopping controls: with early_stopping=True, a fraction of the training data (validation_fraction) is held out and training stops once the validation score fails to improve for n_iter_no_change consecutive epochs.

Question 27

For logistic behavior in SGDClassifier, commonly use loss=

Accepted Answer

log_loss. loss='log_loss' corresponds to the logistic regression objective. Older scikit-learn versions named this loss 'log'.

Question 28

Perceptron in sklearn is best described as:

Accepted Answer

Linear classifier trained with the perceptron rule. Perceptron is an online linear classification algorithm.

Question 29

PassiveAggressiveClassifier is useful for:

Accepted Answer

Online large-scale learning. It supports fast incremental updates on streaming data.

Question 30

partial_fit enables:

Accepted Answer

Incremental learning on mini-batches. partial_fit updates the model with each new batch without full retraining.

Question 31

Setting warm_start=True generally allows:

Accepted Answer

Reusing the previous fitted state for further training. With warm_start=True, a later call to fit continues from the existing model parameters instead of starting from scratch.

Question 32

sklearn.base.clone() is used to:

Accepted Answer

Create a new estimator with the same hyperparameters only. clone discards learned attributes and preserves the constructor parameters, yielding an unfitted copy.
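The difference between a fitted estimator and its clone can be checked directly. This sketch uses a synthetic dataset and arbitrary hyperparameter values for illustration.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)

fitted = LogisticRegression(C=0.5, max_iter=500).fit(X, y)
fresh = clone(fitted)  # same constructor parameters, no learned state

print(fresh.get_params()["C"])   # hyperparameter carried over
print(hasattr(fitted, "coef_"))  # learned attribute exists on the original
print(hasattr(fresh, "coef_"))   # but not on the clone
```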

Question 33

How do you set nested pipeline parameters?

Accepted Answer

Using step__param syntax (a double underscore between step name and parameter). For example, model__C sets the parameter C of the pipeline step named model.

Question 34

get_params(deep=True) returns:

Accepted Answer

Estimator parameters including nested ones. Deep mode includes the parameters of sub-estimators in composite objects such as pipelines.
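The nested-parameter syntax and deep parameter listing can be seen together in a small sketch. The step names "scale" and "model" are chosen here for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("model", LogisticRegression())])

# <step name>__<parameter name> reaches into the named step.
pipe.set_params(model__C=0.1, scale__with_mean=False)

deep = pipe.get_params(deep=True)      # includes nested keys such as "model__C"
shallow = pipe.get_params(deep=False)  # only the pipeline's own parameters

print(deep["model__C"], deep["scale__with_mean"])
print("model__C" in shallow)
```

The same step__param keys are what a parameter grid for GridSearchCV over a pipeline uses.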

Question 35

For model persistence in sklearn workflows, common tools are:

Accepted Answer

joblib or pickle. Both can save fitted estimators to disk and load them back.

Question 36

Tree-based ensembles often expose global feature importance via:

Accepted Answer

feature_importances_. Impurity-based importances are available in many tree models.

Question 37

Permutation importance is useful because it is:

Accepted Answer

Model-agnostic. It measures the drop in performance when a feature's values are shuffled.

Question 38

CalibratedClassifierCV is used to:

Accepted Answer

Calibrate predicted probabilities. Calibration aligns predicted probabilities with observed frequencies.

Question 39

Brier score evaluates:

Accepted Answer

Probability prediction quality. The Brier score is the mean squared difference between predicted probabilities and actual outcomes; a lower score indicates better probabilistic forecasts.
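As a worked example, the sketch below computes the Brier score for four made-up binary predictions, both with scikit-learn and as a mean squared difference by hand.

```python
import numpy as np
from sklearn.metrics import brier_score_loss

# Hypothetical outcomes and predicted probabilities of the positive class.
y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.1, 0.9, 0.8, 0.3])

score = brier_score_loss(y_true, y_prob)

# Same quantity by hand: (0.01 + 0.01 + 0.04 + 0.09) / 4
manual = np.mean((y_prob - y_true) ** 2)

print(score, manual)
```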

Question 40

To adapt single-output estimators for multiple targets, use:

Accepted Answer

MultiOutputRegressor / MultiOutputClassifier. These wrappers train one estimator per target output.

Question 41

OneVsRestClassifier strategy trains:

Accepted Answer

One binary model per class versus all others. OvR decomposes a multiclass problem into multiple binary problems.

Question 42

OneVsOneClassifier strategy trains roughly:

Accepted Answer

k*(k-1)/2 pairwise models, where k is the number of classes. OvO fits one classifier for each pair of classes.
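The two model counts can be checked on a synthetic 4-class problem. The dataset parameters and the choice of LogisticRegression as the base estimator are assumptions made for this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic 4-class problem used only for illustration.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=4, random_state=0
)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

k = len(ovr.classes_)
print(len(ovr.estimators_))  # one binary model per class: k
print(len(ovo.estimators_))  # one model per pair of classes: k * (k - 1) / 2
```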

Question 43

For target labels in multiclass problems, a common utility is:

Accepted Answer

LabelBinarizer. It converts class labels to a binary indicator format.

Question 44

R² score in regression can be:

Accepted Answer

Negative for poor models. R² below zero means the model is worse than always predicting the mean target.
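A tiny worked example with made-up targets: predicting the mean gives R² of 0, and predictions that run opposite to the targets give a negative value.

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0]

mean_pred = [2.0, 2.0, 2.0]  # always predict the mean of y_true
bad_pred = [3.0, 2.0, 1.0]   # predictions in reverse order

r2_mean = r2_score(y_true, mean_pred)  # 1 - 2/2 = 0
r2_bad = r2_score(y_true, bad_pred)    # 1 - 8/2 = -3

print(r2_mean, r2_bad)
```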

Question 45

MAE compared to MSE is generally:

Accepted Answer

Less sensitive to outliers. Absolute error penalizes large deviations less aggressively than squared error.

Question 46

RMSE emphasizes large errors because it:

Accepted Answer

It squares residuals before taking the root. Squaring magnifies larger errors relative to smaller ones.

Question 47

Median absolute error is especially robust when:

Accepted Answer

The data has outliers. Median-based metrics are resistant to extreme values.
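The three regression error metrics discussed above can be compared on made-up predictions where one error is much larger than the rest. RMSE is computed here as the square root of mean_squared_error.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    median_absolute_error,
)

# Hypothetical case: three errors of 1 and one outlier error of 10.
y_true = np.zeros(4)
y_pred = np.array([1.0, 1.0, 1.0, 10.0])

mae = mean_absolute_error(y_true, y_pred)           # (1 + 1 + 1 + 10) / 4
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # sqrt((1 + 1 + 1 + 100) / 4)
medae = median_absolute_error(y_true, y_pred)       # median of 1, 1, 1, 10

print(medae, mae, rmse)
```

The single outlier leaves the median absolute error unchanged, raises MAE moderately, and raises RMSE the most.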

Question 48

For time-ordered data, preferred cross-validation splitter is:

Accepted Answer

TimeSeriesSplit. It respects temporal order and avoids future-to-past leakage: every training set consists only of observations that come before its test set.
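A short sketch on six time-ordered placeholder samples shows how the training window grows while the test indices always come later. The data is just a range of integers used for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Six time-ordered samples (index = time step).
X = np.arange(6).reshape(-1, 1)

splits = list(TimeSeriesSplit(n_splits=3).split(X))
for train_idx, test_idx in splits:
    # Training indices always precede test indices.
    print("train:", train_idx, "test:", test_idx)
```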

Question 49

When splitting time series data, shuffle should usually be:

Accepted Answer

False. Shuffling can leak future information into training.

Question 50

What improves reproducibility besides random_state?

Accepted Answer

Pinning library versions and data snapshots. Version and data control helps make experiments repeatable.

Classical Prediction Models with Scikit-learn MCQ Questions with Answers – Page 2 (Latest 2026)
