Ziraddin Gulumjanli

Hyperparameter tuning is the step in machine learning where you select the configuration values that control how a model learns. Unlike model parameters, which are learned from data during training, hyperparameters are set before training starts and can significantly affect performance. Common examples include the depth of a decision tree, the number of estimators in a random forest, the regularization strength in logistic regression, or the kernel type in an SVM. The goal is to find the combination of values that maximizes model performance, as measured by a chosen metric such as accuracy, F1-score, or AUC. Grid search evaluates every combination of the values you specify and uses cross-validation to estimate how well each one generalizes to unseen data. Randomized search is a cheaper alternative that samples only a subset of combinations, which is particularly useful for large search spaces.

1. Why Grid Search?

When training machine learning models, hyperparameters control model behavior (e.g., how deep a tree grows, or how strong regularization is). Choosing them arbitrarily can lead to poor performance. GridSearchCV in scikit-learn automates this by:

Trying all combinations of specified hyperparameters.
Performing cross-validation to estimate performance on unseen data.
Returning the best parameter set according to a scoring metric.

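This replaces guesswork with a systematic search. The cost is runtime: the number of model fits equals the product of the option counts for every hyperparameter, multiplied by the number of cross-validation folds.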
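
The three steps above can be sketched by hand to show roughly what the search does internally. This is a simplified illustration, not scikit-learn's actual implementation; the synthetic dataset, the decision tree, and the tiny grid are assumptions chosen only to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# A deliberately small, hypothetical grid: 3 x 2 = 6 combinations
param_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5]}

results = []
for params in ParameterGrid(param_grid):  # step 1: enumerate every combination
    model = DecisionTreeClassifier(random_state=0, **params)
    # step 2: estimate performance with 5-fold cross-validation
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results.append((scores.mean(), params))

# step 3: keep the combination with the highest mean validation score
best_score, best_params = max(results, key=lambda r: r[0])
print(best_params, round(best_score, 3))
```

GridSearchCV wraps this loop, can run the fits in parallel, and by default refits the winning configuration on all the data passed to fit.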

2. Basic Grid Search Structure

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# 1. Define the model
rf = RandomForestClassifier(random_state=42)

# 2. Define hyperparameter grid
param_grid = {
    'n_estimators': [100, 200, 300],       # number of trees
    'max_depth': [None, 5, 10],            # maximum depth of tree
    'min_samples_split': [2, 5, 10],       # minimum samples to split a node
    'min_samples_leaf': [1, 2, 4],         # minimum samples per leaf
    'max_features': ['sqrt', 'log2', None] # features considered per split
}

# 3. Initialize GridSearchCV
grid_search = GridSearchCV(
    estimator=rf,
    param_grid=param_grid,
    scoring='accuracy',  # metric to optimize
    cv=5,                # 5-fold cross-validation
    n_jobs=-1,           # use all CPU cores
    verbose=2,           # print progress messages
    return_train_score=True
)

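# 4. Fit to training data
#    (X_train and y_train are assumed to come from an earlier train/test split)
#    This grid has 3^5 = 243 combinations, so cv=5 means 1,215 fits plus a final refit.
grid_search.fit(X_train, y_train)

# 5. Best hyperparameters and their mean cross-validated accuracy
print(grid_search.best_params_)
print(grid_search.best_score_)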
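
The snippet above relies on X_train and y_train already existing. A minimal end-to-end sketch, assuming a synthetic dataset from make_classification and a much smaller, hypothetical grid so it runs in seconds, could look like this. It also scores the tuned model on a held-out test set, which the cross-validated best_score_ does not replace.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data stands in for a real dataset (assumption for illustration)
X, y = make_classification(n_samples=400, n_features=12, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Reduced grid (2 x 2 = 4 combinations) to keep the example quick
small_grid = {"n_estimators": [25, 50], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=small_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ is retrained on all of X_train
test_accuracy = search.best_estimator_.score(X_test, y_test)
print(search.best_params_, round(search.best_score_, 3), round(test_accuracy, 3))
```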

3. Explanation of Each Part

estimator=rf

The ML model we want to train (here, RandomForestClassifier).

param_grid=param_grid

A dictionary specifying all hyperparameters to explore. Each key is a parameter, and the value is a list of options.

scoring='accuracy'

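The metric used to rank candidates. Other built-in options include 'precision', 'recall', 'f1', and 'roc_auc'. The first three assume a binary target by default; averaged variants such as 'f1_macro' exist for multiclass problems. You can also pass a custom scorer, for example one built with make_scorer.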
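
As one possible illustration of a custom scoring function, the sketch below builds an F2 scorer (recall weighted more heavily than precision) with make_scorer. The imbalanced synthetic dataset, the logistic regression model, and the values of C are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV

# Imbalanced synthetic data: roughly 80% negatives, 20% positives
X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)

# Extra keyword arguments to make_scorer are forwarded to the metric function
f2_scorer = make_scorer(fbeta_score, beta=2)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    scoring=f2_scorer,  # a scorer object instead of a string name
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```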

cv=5

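Number of folds in cross-validation. The training data is split into 5 parts; each of the 5 rounds trains on 4 parts (80%) and validates on the remaining part (20%), so every sample is used for validation exactly once. For classifiers, an integer cv uses stratified folds that preserve the class proportions.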
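
Instead of an integer, cv also accepts a splitter object, which gives control over shuffling. The sketch below, on assumed synthetic data with a hypothetical two-value grid, prints the train and validation sizes of each fold and then passes the same splitter to GridSearchCV.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Explicit splitter: 5 stratified folds, shuffled with a fixed seed
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each round holds out one fifth of the samples for validation
fold_sizes = [(len(train_idx), len(val_idx)) for train_idx, val_idx in cv.split(X, y)]
print(fold_sizes)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4]},
    cv=cv,  # splitter object instead of cv=5
)
search.fit(X, y)
print(search.n_splits_)
```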

n_jobs=-1

Parallelization option.
-1 → use all available CPU cores
1 → single core
2, 3, ... → use that many cores

verbose=2

Controls logging details during search.
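0 = silent, 1 = a one-line summary of the total number of fits, 2 = one line per fit with its timing; values of 3 and above also print scores.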

return_train_score=True

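Stores training-set scores in cv_results_ alongside the validation scores. A large gap between the two is a sign of overfitting. It is off by default because computing the extra scores adds runtime.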
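
A rough sketch of how the stored scores might be compared is given here. The dataset and the three depth values are assumptions; note that 'mean_test_score' in cv_results_ refers to the validation folds, not to a separate test set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 3, None]},
    cv=5,
    return_train_score=True,  # adds mean_train_score to cv_results_
)
search.fit(X, y)

res = search.cv_results_
gaps = []
for params, train, val in zip(
    res["params"], res["mean_train_score"], res["mean_test_score"]
):
    # A large train-minus-validation gap suggests the candidate overfits
    gaps.append((params["max_depth"], train - val))
    print(params, f"train={train:.3f}", f"validation={val:.3f}", f"gap={train - val:.3f}")
```

An unrestricted tree (max_depth=None) typically fits the training folds almost perfectly, so it tends to show the widest gap.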

4. Algorithm-Specific Hyperparameters

Different algorithms use different sets of hyperparameters:

| Algorithm | Common Hyperparameters |
| --- | --- |
| Decision Tree | `max_depth`, `min_samples_split`, `min_samples_leaf`, `max_features` |
| Random Forest | All Decision Tree params + `n_estimators`, `bootstrap` |
| XGBoost / Gradient Boosting | `learning_rate`, `n_estimators`, `max_depth`, `subsample`, `colsample_bytree` (XGBoost only; scikit-learn's gradient boosting uses `max_features`) |
| Logistic Regression | `C` (inverse regularization strength), `penalty` (`l1`, `l2`, `elasticnet`), `solver` |
| SVM / SVR | `C` (inverse regularization strength), `kernel` (`linear`, `rbf`, `poly`), `gamma`, `degree` (for poly), `coef0` |
| KNN | `n_neighbors`, `weights` (`uniform`, `distance`), `metric` (`euclidean`, `manhattan`) |
| Neural Networks (MLP) | `hidden_layer_sizes`, `activation`, `solver`, `alpha` (L2 penalty), `learning_rate` |
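
Some hyperparameters only matter for certain settings, such as `degree` for the `poly` kernel. One way to avoid wasted fits is to pass param_grid as a list of dictionaries, each searched separately. The sketch below uses an SVC on assumed synthetic data; the specific values are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# One sub-grid per kernel, so kernel-specific parameters are only
# combined with the kernel that actually uses them
svm_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", 0.01]},
    {"kernel": ["poly"], "C": [1], "degree": [2, 3]},
]

# 3 + (3 x 2) + (1 x 2) candidates in total
n_candidates = len(ParameterGrid(svm_grid))
print(n_candidates)

search = GridSearchCV(SVC(), param_grid=svm_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```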

5. Tips & Common Mistakes

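Large grids → long runtime: Use RandomizedSearchCV to sample a fixed number of combinations instead of trying them all.
Scaling features: SVM, Logistic Regression, and KNN are sensitive to feature scale, so standardize features for them; tree-based models do not need it. Put the scaler inside a Pipeline so it is fitted only on the training portion of each fold.
Class imbalance: Use class_weight='balanced' for models that support it.
Use proper scoring: On imbalanced datasets, 'accuracy' can look high for a model that ignores the minority class; prefer 'recall', 'f1', or 'roc_auc'.
Seed for reproducibility: Set random_state on the model (and on RandomizedSearchCV) for consistent results.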
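
Several of these tips can be combined in one search. The following is a minimal sketch under assumed conditions (imbalanced synthetic data, logistic regression, hypothetical step names "scaler" and "clf"): scaling lives inside a Pipeline, class weights are balanced, and the search optimizes F1 rather than accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced synthetic data: roughly 85% negatives, 15% positives
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.85, 0.15], random_state=42
)

# The scaler is refit on the training portion of every fold,
# so no statistics leak from the validation portion
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

search = GridSearchCV(
    pipe,
    param_grid={"clf__C": [0.01, 0.1, 1, 10]},  # <step name>__<parameter>
    scoring="f1",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The double-underscore prefix routes each grid entry to the named pipeline step.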

6. Randomized Search (Optional Shortcut)

from sklearn.model_selection import RandomizedSearchCV

random_search = RandomizedSearchCV(
    estimator=rf,
    param_distributions=param_grid,
    n_iter=20,          # only 20 random combinations
    scoring='accuracy',
    cv=5,
    n_jobs=-1,
    verbose=2,
    random_state=42
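)

# Fit and inspect results exactly as with GridSearchCV
random_search.fit(X_train, y_train)
print(random_search.best_params_)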

Faster than grid search for very large grids.
n_iter controls how many random parameter sets are tested.
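
param_distributions does not have to reuse the lists from a grid. It also accepts distribution objects that provide an rvs method, such as those in scipy.stats, so values are drawn from a range rather than a fixed list. The ranges below are hypothetical, and the data is synthetic.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

param_distributions = {
    "n_estimators": randint(20, 60),    # integers from 20 up to 59
    "max_depth": [None, 5, 10],         # lists are sampled uniformly
    "min_samples_leaf": randint(1, 5),  # integers from 1 up to 4
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,          # number of sampled candidates
    cv=3,
    random_state=42,   # makes the sampled candidates reproducible
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```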

See the scikit-learn material on hyperparameter tuning.