📢 Reposted content
Original article: https://www.kdnuggets.com/3-hyperparameter-tuning-techniques-that-go-beyond-grid-search
Original author: Iván Palomares Carrascosa
Image by Author
# Introduction
When building machine learning models of moderate to high complexity, there are many model parameters that are not learned from the data but must be set by us beforehand: these are known as hyperparameters. Models such as random forest ensembles and neural networks have a variety of hyperparameters to tune, and each of them can take many different values. As a result, even a handful of hyperparameters can yield a nearly endless number of possible configurations. This raises a problem: identifying the best configuration of these hyperparameters, that is, the one that yields the best model performance, can be like finding a needle in a haystack, or even worse: like finding a needle in the ocean.
This article builds on Machine Learning Mastery's earlier guide on the art of hyperparameter tuning and takes a hands-on approach to illustrate how to apply intermediate to advanced hyperparameter tuning techniques.
Specifically, you will learn how to apply the following three hyperparameter tuning techniques:
- Randomized search
- Bayesian optimization
- Successive halving
# Performing the Initial Setup
Before getting started, we will import the necessary libraries and dependencies. If you run into a "Module not found" error for any of them, make sure to install the corresponding library with pip install first. We will use NumPy, scikit-learn, and Optuna:
```python
import numpy as np
import time
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
import optuna
import warnings

warnings.filterwarnings('ignore')
```
We will also load the dataset used in all three examples: MNIST (Modified National Institute of Standards and Technology), a dataset for classifying low-resolution images of handwritten digits.
print("=" * 70)
print("LOADING MNIST DATASET FOR IMAGE CLASSIFICATION")
print("=" * 70)
# Load digits dataset (lightweight version of MNIST: 8x8 images, 1797 samples)
digits = load_digits()
X, y = digits.data, digits.target
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
X, y,
test_size=0.2,
random_state=42
)
print(f"Training instances: {X_train.shape[0]}")
print(f"Test instances: {X_test.shape[0]}")
print(f"Features: {X_train.shape[1]}")
print(f"Classes: {len(np.unique(y))}")
print()
Next, we define a hyperparameter search space; that is, we determine which parameters we want to try combining and which subset of values each of them can take.
print("=" * 70)
print("HYPERPARAMETER SEARCH SPACE")
print("=" * 70)
# Typical hyperparameters to explore in a random forest ensemble
param_space = {
'n_estimators': (10, 200), # Number of trees
'max_depth': (5, 50), # Maximum tree depth
'min_samples_split': (2, 20), # Min samples to split node
'min_samples_leaf': (1, 10), # Min samples in leaf node
'max_features': (0.1, 1.0) # Fraction of features to consider
}
print("Search space:")
for param, bounds in param_space.items():
print(f" {param}: {bounds}")
print()
As a final preparatory step, we define a function that will be reused throughout. It encapsulates the process of training and evaluating a random forest ensemble under a specific hyperparameter configuration, using cross-validation (CV) and classification accuracy to determine model quality. Note that this function may be called a large number of times by each technique we implement, up to as many times as there are hyperparameter value combinations to try.
```python
def evaluate_model(params, X_train, y_train, cv=3):
    # Instantiate a random forest model with the given hyperparameters
    model = RandomForestClassifier(
        n_estimators=int(params['n_estimators']),
        max_depth=int(params['max_depth']),
        min_samples_split=int(params['min_samples_split']),
        min_samples_leaf=int(params['min_samples_leaf']),
        max_features=float(params['max_features']),
        random_state=42,
        n_jobs=-1  # Use all CPU cores for speed
    )

    # Use CV to measure performance
    # This gives us a more robust estimate than a single train/val split
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring='accuracy', n_jobs=-1)

    # Return the average cross-validation accuracy
    return np.mean(scores)
```
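As a quick sanity check, you can call the helper once with a single configuration before running any of the search techniques. This is an optional sketch, and the specific values below are arbitrary, not ones recommended by the article:

```python
# Illustrative one-off evaluation of a single configuration (values are arbitrary)
sample_params = {
    'n_estimators': 100,
    'max_depth': 20,
    'min_samples_split': 2,
    'min_samples_leaf': 1,
    'max_features': 0.5
}
print(f"CV accuracy for sample config: {evaluate_model(sample_params, X_train, y_train):.4f}")
```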
Now we are ready to try the three techniques!
# Implementing Randomized Search
As its name suggests, randomized search samples hyperparameter combinations from the search space at random, instead of exhaustively trying every possible combination in a predefined search space the way grid search does. Each trial is independent, and no knowledge is carried over from previous trials. Even so, it is a highly effective approach in many scenarios, often finding high-quality solutions faster than grid search.
Here is how randomized search can be implemented and applied to a random forest ensemble for classifying the MNIST data:
```python
def randomized_search(n_trials=30):
    start_time = time.time()  # Optional: used to measure execution time
    results = []

    print(f"\nRunning {n_trials} random trials...")

    for i in range(n_trials):
        # RANDOM SAMPLING: hyperparameters are sampled independently using NumPy's random number generation
        params = {
            'n_estimators': np.random.randint(param_space['n_estimators'][0], param_space['n_estimators'][1]),
            'max_depth': np.random.randint(param_space['max_depth'][0], param_space['max_depth'][1]),
            'min_samples_split': np.random.randint(param_space['min_samples_split'][0], param_space['min_samples_split'][1]),
            'min_samples_leaf': np.random.randint(param_space['min_samples_leaf'][0], param_space['min_samples_leaf'][1]),
            'max_features': np.random.uniform(param_space['max_features'][0], param_space['max_features'][1])
        }

        # Evaluate a randomly defined configuration
        score = evaluate_model(params, X_train, y_train)
        results.append({'params': params, 'score': score})

        # Provide a progress update every 10 trials, for informative purposes
        if (i + 1) % 10 == 0:
            best_so_far = max(results, key=lambda x: x['score'])
            print(f"  Trial {i+1}/{n_trials}: Best score so far = {best_so_far['score']:.4f}")

    # Measure total time taken
    elapsed_time = time.time() - start_time

    # Identify best configuration found
    best_result = max(results, key=lambda x: x['score'])

    print(f"\n✓ Completed in {elapsed_time:.2f} seconds")
    print(f"Best validation accuracy: {best_result['score']:.4f}")
    print(f"Best parameters: {best_result['params']}")

    return best_result, results

# Call the method to perform randomized search over 30 trials
random_best, random_results = randomized_search(n_trials=30)
```
Comments are provided alongside the code to aid understanding. The results of a run may look similar to the following:
```
Running 30 random trials...
  Trial 10/30: Best score so far = 0.9617
  Trial 20/30: Best score so far = 0.9617
  Trial 30/30: Best score so far = 0.9617

✓ Completed in 64.59 seconds
Best validation accuracy: 0.9617
Best parameters: {'n_estimators': 195, 'max_depth': 16, 'min_samples_split': 8, 'min_samples_leaf': 2, 'max_features': 0.28306570555707966}
```
Note the time it took to run the hyperparameter search process, as well as the best validation accuracy achieved. In this case, 10 trials appear to have been enough to find the best configuration.
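For comparison, scikit-learn also ships a built-in implementation of this technique, RandomizedSearchCV, which handles the sampling and cross-validation loop for you. A minimal sketch, mapping the same bounds from param_space to scipy distributions (the choice of distributions here is ours, not part of the original article):

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV

# Map the (low, high) bounds defined above to scipy distributions
param_distributions = {
    'n_estimators': randint(10, 200),
    'max_depth': randint(5, 50),
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 10),
    'max_features': uniform(0.1, 0.9)  # uniform(loc, scale) covers [0.1, 1.0]
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    param_distributions,
    n_iter=30,          # number of sampled configurations
    cv=3,
    scoring='accuracy',
    random_state=42,
    n_jobs=-1
)
search.fit(X_train, y_train)
print(search.best_score_, search.best_params_)
```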
# Applying Bayesian Optimization
This method uses an auxiliary or surrogate model, specifically a probabilistic model based on Gaussian processes or tree structures, to predict which hyperparameter settings will perform best. Trials are not independent; each one "learns" from the previous ones. In addition, the method tries to balance exploration (trying new regions of the solution space) and exploitation (refining promising regions). In short, we have a smarter approach than both grid search and randomized search.
The Optuna library provides a specific implementation of Bayesian optimization for hyperparameter tuning based on the Tree-structured Parzen Estimator (TPE). It classifies trials into "good" and "bad" groups, models a probability distribution for each group, and samples from the promising regions.
The whole process can be implemented as follows:
```python
def bayesian_optimization(n_trials=30):
    """Implementation of Bayesian optimization using the Optuna library."""
    start_time = time.time()

    def objective(trial):
        """Optuna objective function: given a trial, returns a score."""
        # Optuna can suggest values based on past performance
        params = {
            'n_estimators': trial.suggest_int('n_estimators', param_space['n_estimators'][0], param_space['n_estimators'][1]),
            'max_depth': trial.suggest_int('max_depth', param_space['max_depth'][0], param_space['max_depth'][1]),
            'min_samples_split': trial.suggest_int('min_samples_split', param_space['min_samples_split'][0], param_space['min_samples_split'][1]),
            'min_samples_leaf': trial.suggest_int('min_samples_leaf', param_space['min_samples_leaf'][0], param_space['min_samples_leaf'][1]),
            'max_features': trial.suggest_float('max_features', param_space['max_features'][0], param_space['max_features'][1])
        }

        # Evaluate and return the score (maximized, per the study's direction)
        return evaluate_model(params, X_train, y_train)

    # The create_study() function is used in Optuna to manage and run
    # the overall optimization process
    print(f"\nRunning {n_trials} Bayesian optimization trials...")
    study = optuna.create_study(
        direction='maximize',                        # We want to maximize accuracy
        sampler=optuna.samplers.TPESampler(seed=42)  # Bayesian (TPE) algorithm
    )

    # Perform the optimization process with a progress callback
    def callback(study, trial):
        if trial.number % 10 == 9:
            print(f"  Trial {trial.number + 1}/{n_trials}: Best score = {study.best_value:.4f}")

    study.optimize(objective, n_trials=n_trials, callbacks=[callback], show_progress_bar=False)

    elapsed_time = time.time() - start_time

    print(f"\n✓ Completed in {elapsed_time:.2f} seconds")
    print(f"Best validation accuracy: {study.best_value:.4f}")
    print(f"Best parameters: {study.best_params}")

    return study.best_params, study.best_value, study

bayesian_best_params, bayesian_best_score, bayesian_study = bayesian_optimization(n_trials=30)
```
Output (summary):
```
✓ Completed in 62.66 seconds
Best validation accuracy: 0.9673
Best parameters: {'n_estimators': 150, 'max_depth': 33, 'min_samples_split': 2, 'min_samples_leaf': 1, 'max_features': 0.19145126698170384}
```
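Once the study has finished, Optuna also lets you inspect the search beyond the single best trial. An optional follow-up sketch using Optuna's built-in utilities (the exact numbers will depend on your run, and trials_dataframe() requires pandas to be installed):

```python
# Inspect all trials as a pandas DataFrame
trials_df = bayesian_study.trials_dataframe()
print(trials_df[['number', 'value', 'params_n_estimators', 'params_max_depth']].head())

# Estimate which hyperparameters mattered most for the objective
importances = optuna.importance.get_param_importances(bayesian_study)
print(importances)
```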
# Leveraging Successive Halving
The last of the three methods, successive halving, balances the size of the search space against the computational resources allocated to each candidate configuration. It starts with a large number of configurations, each given limited resources (such as training data), and progressively eliminates the underperformers while allocating more resources to the promising ones, much like a real-world knockout tournament that eliminates the weaker players.
The following implementation steers the successive halving process by progressively increasing the size of the training set.
```python
def successive_halving(n_initial=32, min_resource=0.25, max_resource=1.0):
    start_time = time.time()

    # Step 1: define initial hyperparameter configurations at random
    print(f"\nGenerating {n_initial} initial random configurations...")
    configs = []
    for _ in range(n_initial):
        config = {
            'n_estimators': np.random.randint(param_space['n_estimators'][0], param_space['n_estimators'][1]),
            'max_depth': np.random.randint(param_space['max_depth'][0], param_space['max_depth'][1]),
            'min_samples_split': np.random.randint(param_space['min_samples_split'][0], param_space['min_samples_split'][1]),
            'min_samples_leaf': np.random.randint(param_space['min_samples_leaf'][0], param_space['min_samples_leaf'][1]),
            'max_features': np.random.uniform(param_space['max_features'][0], param_space['max_features'][1])
        }
        configs.append(config)

    # Step 2: apply tournament-like successive rounds of elimination
    current_configs = configs
    current_resource = min_resource
    round_num = 1

    while len(current_configs) > 1 and current_resource <= max_resource:
        # Determine the number of training instances to use in the current round
        n_samples = int(len(X_train) * current_resource)

        print(f"\n--- Round {round_num}: Evaluating {len(current_configs)} configs ---")
        print(f"  Using {current_resource*100:.0f}% of training data ({n_samples} samples)")

        # Subsample training instances
        indices = np.random.choice(len(X_train), size=n_samples, replace=False)
        X_subset = X_train[indices]
        y_subset = y_train[indices]

        # Evaluate all current configs with the current resources
        scores = []
        for i, config in enumerate(current_configs):
            score = evaluate_model(config, X_subset, y_subset, cv=2)  # Use cv=2 (minimum)
            scores.append(score)
            if (i + 1) % 10 == 0 or (i + 1) == len(current_configs):
                print(f"  Evaluated {i+1}/{len(current_configs)} configs...")

        # Elimination policy: keep the top-performing half only
        n_keep = max(1, len(current_configs) // 2)
        sorted_indices = np.argsort(scores)[::-1]  # Descending order
        current_configs = [current_configs[i] for i in sorted_indices[:n_keep]]
        best_score = scores[sorted_indices[0]]

        print(f"  → Keeping top {n_keep} configs. Best score: {best_score:.4f}")

        # Update resources, doubling them for the next round
        current_resource = min(current_resource * 2, max_resource)
        round_num += 1

    # Final evaluation of the best config found, given the full training set
    best_config = current_configs[0]
    final_score = evaluate_model(best_config, X_train, y_train, cv=3)

    elapsed_time = time.time() - start_time

    print(f"\n✓ Completed in {elapsed_time:.2f} seconds")
    print(f"Best validation accuracy: {final_score:.4f}")
    print(f"Best parameters: {best_config}")

    return best_config, final_score

halving_best, halving_score = successive_halving(n_initial=32, min_resource=0.25, max_resource=1.0)
```
The results obtained at the end may look like this:
```
✓ Completed in 56.18 seconds
Best validation accuracy: 0.9645
Best parameters: {'n_estimators': 158, 'max_depth': 39, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': 0.2269785516325355}
```
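scikit-learn also offers a built-in (still experimental) version of this idea, HalvingRandomSearchCV, which grows the number of training samples per round instead of the manual subsampling shown above. A minimal sketch under that assumption, with our own choice of scipy distributions for the same bounds:

```python
from scipy.stats import randint, uniform
from sklearn.experimental import enable_halving_search_cv  # noqa: enables the experimental estimator
from sklearn.model_selection import HalvingRandomSearchCV

halving_search = HalvingRandomSearchCV(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    param_distributions={
        'n_estimators': randint(10, 200),
        'max_depth': randint(5, 50),
        'min_samples_split': randint(2, 20),
        'min_samples_leaf': randint(1, 10),
        'max_features': uniform(0.1, 0.9)
    },
    factor=2,              # keep roughly half of the candidates at each round
    resource='n_samples',  # grow the number of training samples per round
    cv=3,
    random_state=42,
    n_jobs=-1
)
halving_search.fit(X_train, y_train)
print(halving_search.best_score_, halving_search.best_params_)
```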
# Comparing the Final Results
To sum up, all three methods found best configurations with validation accuracies between 96% and 97%, with Bayesian optimization obtaining the best result by a narrow margin. The differences are more pronounced in terms of efficiency: successive halving took the least time, at just over 56 seconds, while the other two techniques took 62-64 seconds.
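If you want to print the three results side by side, here is a short summary sketch using the variables returned above (note that random_best stores its score inside a dict, while the other two functions return the score directly):

```python
# Gather the best CV accuracy found by each technique into a small summary
summary = {
    'Randomized search': random_best['score'],
    'Bayesian optimization': bayesian_best_score,
    'Successive halving': halving_score
}
for method, score in sorted(summary.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method:<25} best CV accuracy: {score:.4f}")
```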
Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning, and large language models. He trains and guides others in harnessing AI in the real world.