Bayesian Optimization vs. Grid Search vs. Random Search: A Visual Guide

Bayesian optimization, grid search, and random search are three common ways to tune hyperparameters. They all answer the same question: which parameter values produce the best result? The difference is how they choose what to try next.

Short answer: use grid search when the search space is tiny and you need a simple baseline, random search when only a few parameters matter and you want a stronger baseline, and Bayesian optimization when each trial is expensive and you want the search to learn from previous results.

Quick comparison

| Method | How it searches | Best for | Main weakness |
| --- | --- | --- | --- |
| Grid search | Tests every point in a fixed grid | Tiny spaces, reproducible baselines, simple demos | Wastes trials on unimportant dimensions and grows exponentially |
| Random search | Samples points randomly from the space | Broad early exploration and cheap-to-medium trials | Does not intentionally focus on promising regions |
| Bayesian optimization | Builds a model of the objective and chooses informative next trials | Expensive training, backtests, simulations, and black-box optimization | More moving parts than a simple baseline |

The visual idea

Imagine you are tuning two parameters: learning_rate and max_depth for a model, or lookback_window and risk_multiplier for a trading strategy. Each dot is one trial.

Grid search

Grid search places trials on a fixed lattice:

max_depth
  ^
  |  x     x     x     x
  |
  |  x     x     x     x
  |
  |  x     x     x     x
  |
  |  x     x     x     x
  +------------------------> learning_rate

This is easy to understand and easy to reproduce. If you define four values for parameter A and four values for parameter B, grid search runs 16 trials.

The problem is scale. Add more parameters and the number of combinations grows quickly:

| Parameters | Values per parameter | Grid trials |
| --- | --- | --- |
| 2 | 10 | 100 |
| 4 | 10 | 10,000 |
| 6 | 10 | 1,000,000 |

Grid search also spends the same number of trials on every dimension, even when some parameters barely affect the metric.
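The lattice and the growth table can be reproduced in a few lines. This is a minimal sketch: the candidate values and the helper names grid_trials and grid_size are made up for illustration.

```python
from itertools import product

# Hypothetical candidate values; a real grid would use values that suit the model.
grid = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "max_depth": [2, 4, 6, 8],
}

def grid_trials(grid):
    """Return every combination in the grid as a list of parameter dicts."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]

def grid_size(values_per_param, n_params):
    """Number of trials in a full grid: values ** parameters."""
    return values_per_param ** n_params

trials = grid_trials(grid)
print(len(trials))  # 16, one trial per lattice point

for n_params in (2, 4, 6):
    print(n_params, grid_size(10, n_params))
```

Each extra parameter multiplies the trial count by the number of values it takes, which is why the last row of the table reaches one million.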

Random search

Random search samples from the space without following a fixed grid:

max_depth
  ^
  |      x        x
  |  x
  |            x       x
  |      x
  | x              x
  |          x
  +------------------------> learning_rate

Random search often beats grid search when only a subset of parameters really matters. Instead of testing every combination of every dimension, it keeps drawing new combinations. That gives it more chances to find useful values for the important parameters.

For example, if learning_rate matters much more than max_depth, a grid can waste many trials repeating the same few learning_rate values. Random search can cover more distinct learning_rate values with the same budget.
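A quick way to see the coverage difference is to count distinct learning_rate values under the same 16-trial budget. The ranges and grid values below are assumptions chosen only for the illustration.

```python
import random
from itertools import product

budget = 16
lr_low, lr_high = 0.001, 0.3        # hypothetical learning_rate range
depth_low, depth_high = 2, 10       # hypothetical max_depth range

# Grid: 4 learning rates x 4 depths = 16 trials.
grid_lrs = [0.001, 0.01, 0.1, 0.3]
grid_depths = [2, 4, 6, 8]
grid_trials = list(product(grid_lrs, grid_depths))

# Random: 16 independent draws from the same ranges, seeded for repeatability.
rng = random.Random(0)
random_trials = [
    (rng.uniform(lr_low, lr_high), rng.randint(depth_low, depth_high))
    for _ in range(budget)
]

distinct_grid_lrs = {lr for lr, _ in grid_trials}
distinct_random_lrs = {lr for lr, _ in random_trials}

print(len(grid_trials), len(distinct_grid_lrs))
print(len(random_trials), len(distinct_random_lrs))
```

Both methods run 16 trials, but the grid only ever tests four learning rates, while each random draw lands on a new one.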

Bayesian optimization

Bayesian optimization starts with exploration, then concentrates trials where the results look promising:

max_depth
  ^
  |  x       x
  |
  |             x  x
  |           x  *  x
  |             x  x
  |  x
  +------------------------> learning_rate

The * represents a promising region. Bayesian optimization does not know the best point in advance. It builds a probabilistic model from completed trials, estimates where good results may be, and chooses the next trial by balancing:

  • Exploration: try uncertain regions because they might contain better results.
  • Exploitation: try near known strong regions because they are likely to perform well.

That learning loop is why Bayesian optimization is useful for expensive black-box optimization: model training runs, trading backtests, simulations, batch jobs, and other workloads where every trial costs time or money.
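The loop can be sketched in pure Python on a one-dimensional toy problem. This is one possible setup, not the method of any particular product: it assumes a Gaussian-process surrogate with a squared-exponential kernel and an upper-confidence-bound rule for picking the next trial. The objective, kernel length, and kappa value are all invented for the example.

```python
import math

def objective(x):
    """Toy stand-in for an expensive trial; the optimum at x = 0.3 is made up."""
    return -(x - 0.3) ** 2

def rbf(a, b, length=0.2):
    """Squared-exponential kernel: nearby inputs get similar predictions."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def posterior(xs, ys, x_new, jitter=1e-6):
    """Gaussian-process posterior mean and standard deviation at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    offset = sum(ys) / len(ys)                     # center the observations
    alpha = solve(K, [y - offset for y in ys])
    mean = offset + sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve(K, k)
    var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, math.sqrt(var)

def bayes_opt(n_iter=10, kappa=2.0):
    """Run the loop: fit the model, score candidates, evaluate the best one."""
    xs = [0.0, 0.5, 1.0]                           # small initial design
    ys = [objective(x) for x in xs]
    candidates = [i / 100 for i in range(101)]
    for _ in range(n_iter):
        untried = [c for c in candidates if c not in xs]

        def ucb(c):
            # Predicted value (exploitation) plus an uncertainty bonus (exploration).
            mean, std = posterior(xs, ys, c)
            return mean + kappa * std

        x_next = max(untried, key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs

best_x, best_y, history = bayes_opt()
print(best_x, best_y, len(history))
```

A larger kappa favors uncertain regions and a smaller one favors regions the model already rates highly. Production optimizers typically also fit the kernel settings to the data and handle noise, which this sketch leaves out.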

How each method chooses the next trial

| Question | Grid search | Random search | Bayesian optimization |
| --- | --- | --- | --- |
| Does it use previous results? | No | No | Yes |
| Can it stop early with useful information? | Sometimes, but inefficiently | Yes, as a baseline | Yes, often with better direction |
| Does it handle continuous ranges naturally? | Only after discretizing | Yes | Yes |
| Is it easy to parallelize? | Yes | Yes | Yes, with batch selection |
| Is it sample-efficient? | Low | Medium | High |

When to use grid search

Use grid search when the search space is small enough that exhaustive testing is practical.

Good fit:

  • You have two or three parameters with a handful of values each.
  • You need a deterministic baseline.
  • You want to validate that your objective metric and trial runner work.
  • You are tuning categorical choices with few combinations.

Avoid grid search when the number of parameters grows, when ranges are continuous, or when each trial is expensive. In those cases, the grid becomes a trial budget problem instead of an optimization strategy.

When to use random search

Use random search when you want a simple, strong baseline across a large space.

Good fit:

  • You do not yet know which parameters matter.
  • Trials are cheap enough to run many samples.
  • You want broad coverage before using a smarter optimizer.
  • Your parameter space includes continuous ranges.

Random search is especially useful as a sanity check. If a more complex optimizer cannot beat a well-run random search baseline, the issue may be the objective, the search space, the trial budget, or noise in the measurements.
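A random-search baseline over continuous ranges fits in a few lines. The sketch below assumes a hypothetical two-parameter space and a made-up objective. The log-scale option is a common choice for parameters such as learning rates that span several orders of magnitude.

```python
import math
import random

# Hypothetical search space: (low, high, scale) per parameter.
space = {
    "learning_rate": (1e-4, 1e-1, "log"),
    "dropout": (0.0, 0.5, "linear"),
}

def sample(space, rng):
    """Draw one parameter set; 'log' samples uniformly in log space."""
    params = {}
    for name, (low, high, scale) in space.items():
        if scale == "log":
            params[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            params[name] = rng.uniform(low, high)
    return params

def toy_objective(params):
    """Made-up metric that peaks at learning_rate = 0.01 and dropout = 0.2."""
    lr_term = (math.log10(params["learning_rate"]) + 2) ** 2
    return -lr_term - (params["dropout"] - 0.2) ** 2

def random_search(objective, space, n_trials, seed=0):
    """Evaluate n_trials random parameter sets; return the best and all trials."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = sample(space, rng)
        trials.append((objective(params), params))
    return max(trials, key=lambda t: t[0]), trials

best, trials = random_search(toy_objective, space, n_trials=50)
print(best)
```

Fixing the seed makes the baseline repeatable, so a more complex optimizer can be compared against the same set of trials.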

When to use Bayesian optimization

Use Bayesian optimization when you want better results from a limited trial budget.

Good fit:

  • Each trial is slow, expensive, or operationally limited.
  • You are optimizing a black-box function: you can run a trial and observe a metric, but you do not have gradients.
  • You care about finding strong configurations with fewer wasted runs.
  • You want the optimizer to adapt as results come in.

This is the typical case for hyperparameter optimization as a service. With HyperOptimizer, your workload runs in a Docker container for each trial. You define the search space and objective metric; the platform chooses parameter sets with optimization algorithms such as Bayesian optimization, collects metrics from stdout, and shows ranked results in the dashboard.

Practical example: tuning a trading strategy

Suppose a trading strategy has these parameters:

| Parameter | Range |
| --- | --- |
| lookback_window | 10 to 200 |
| atr_multiplier | 0.5 to 5.0 |
| stop_loss_bps | 5 to 50 |
| max_position_size | 0.01 to 0.10 |

A grid with 20 values per parameter would require 160,000 backtests. If one backtest takes two minutes, that is more than 222 days of serial compute.

Random search can sample 200 or 1,000 combinations without enumerating everything. Bayesian optimization can go further by learning from early backtests and spending later trials on more promising regions, such as lower drawdown with acceptable return or a better Sharpe ratio under realistic cost assumptions.
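The budget arithmetic is easy to check. The figures below come straight from the example; only the variable names are new.

```python
values_per_parameter = 20
n_parameters = 4
minutes_per_backtest = 2

# Full grid: every combination of 20 values across 4 parameters.
grid_trials = values_per_parameter ** n_parameters
serial_days = grid_trials * minutes_per_backtest / 60 / 24

print(grid_trials)            # 160000
print(round(serial_days, 1))  # 222.2

# Serial cost of the sampled budgets mentioned for random search.
for budget in (200, 1000):
    hours = budget * minutes_per_backtest / 60
    print(budget, round(hours, 1), "hours")
```

At two minutes per backtest, 200 samples take under seven hours and 1,000 take about 33 hours, compared with more than 222 days for the full grid.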

Choosing the right method

Use this rule of thumb:

| Your situation | Recommended method |
| --- | --- |
| Tiny, discrete search space | Grid search |
| Cheap trials and unknown parameter importance | Random search |
| Expensive trials and limited budget | Bayesian optimization |
| Need a baseline before a smarter optimizer | Random search |
| Need every listed combination tested exactly | Grid search |
| Need adaptive, sample-efficient search | Bayesian optimization |

Common mistakes

Mistake 1: making the grid too fine. A fine grid feels precise, but it can spend thousands of trials on values that do not matter.

Mistake 2: comparing methods with different budgets. If grid search gets 10,000 trials and Bayesian optimization gets 100, the comparison is not measuring the algorithm fairly.
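One way to keep a comparison fair is to pass the same budget to every method. The sketch below does that for grid and random search on a made-up objective in which only one parameter matters. It assumes the budget is a perfect square so the grid fills it exactly.

```python
import random
from itertools import product

def objective(lr, depth):
    """Toy metric in which only lr matters; the peak at lr = 0.37 is made up."""
    return -(lr - 0.37) ** 2

def run_grid(budget):
    """Spend the budget on a square lattice over [0, 1] x [0, 1]."""
    side = int(budget ** 0.5)
    values = [i / (side - 1) for i in range(side)]
    return [objective(lr, depth) for lr, depth in product(values, values)]

def run_random(budget, seed=0):
    """Spend the same budget on uniform random draws from [0, 1] x [0, 1]."""
    rng = random.Random(seed)
    return [objective(rng.random(), rng.random()) for _ in range(budget)]

budget = 16
grid_scores = run_grid(budget)
random_scores = run_random(budget)

# Both methods must have used exactly the same number of trials.
assert len(grid_scores) == len(random_scores) == budget
print(max(grid_scores), max(random_scores))
```

A single seed is only one sample of random search, so repeating the comparison over several seeds gives a more reliable picture.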

Mistake 3: optimizing the wrong metric. Better search cannot fix an objective that rewards overfitting, ignores costs, or fails to penalize risk.

Mistake 4: skipping holdout validation. The best parameter set on one dataset may not generalize. Always validate strong configurations on data or scenarios not used for selection.

FAQ

Is Bayesian optimization always better than random search?

No. Bayesian optimization is usually more sample-efficient when trials are expensive, but random search is simpler and can be very competitive when trials are cheap, noisy, or highly parallel.

Is grid search obsolete?

No. Grid search is still useful for small search spaces, deterministic baselines, and testing that an optimization workflow is wired correctly.

Why does random search often beat grid search?

Random search can test more distinct values of important parameters. Grid search repeats fixed values across every dimension, which can waste trials when only a few parameters strongly affect the objective.

What is the best method for hyperparameter optimization?

For most expensive black-box workloads, Bayesian optimization is the best starting point after a random-search baseline. For tiny spaces, grid search is often enough.

Can these methods run in parallel?

Yes. Grid search and random search are naturally parallel. Bayesian optimization can also run parallel trials by selecting batches of promising and informative parameter sets.
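For grid and random search, parallelism can be as simple as handing pre-generated parameter sets to a worker pool. The sketch below uses a thread pool and a placeholder run_trial function. Both are assumptions for illustration, and CPU-bound trials would more likely run in separate processes or containers.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_trial(params):
    """Stand-in for an expensive trial; a real one might launch a training job."""
    return -(params["x"] - 0.5) ** 2

# All parameter sets are known up front, so trials are independent.
rng = random.Random(0)
param_sets = [{"x": rng.random()} for _ in range(20)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in the same order as param_sets.
    scores = list(pool.map(run_trial, param_sets))

best_score, best_params = max(zip(scores, param_sets), key=lambda pair: pair[0])
print(best_score, best_params)
```

Bayesian optimization cannot generate every parameter set up front in this way, because each batch depends on the results of earlier ones.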

Next steps

If your workload can accept parameters from the command line and print objective metrics, it can usually be optimized. Start with a small random-search or grid-search baseline, confirm your metric behaves correctly, then move to Bayesian optimization when trial budget matters.
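A trial entry point of that shape might look like the sketch below. The flag names, the placeholder backtest function, and the objective=... output line are all hypothetical. The exact format an optimizer expects on stdout is not specified here and should be checked against its documentation.

```python
import argparse

def backtest(lookback_window, atr_multiplier):
    """Placeholder objective; a real workload would run the model or backtest."""
    return 1.0 - abs(lookback_window - 60) / 100 - (atr_multiplier - 2.0) ** 2

def main(argv=None):
    """Parse parameters, run one trial, and print the objective metric."""
    parser = argparse.ArgumentParser(description="Run one trial.")
    parser.add_argument("--lookback-window", type=int, required=True)
    parser.add_argument("--atr-multiplier", type=float, required=True)
    args = parser.parse_args(argv)

    score = backtest(args.lookback_window, args.atr_multiplier)
    # Hypothetical output format; check what the optimizer expects to parse.
    print(f"objective={score:.6f}")
    return score
```

Calling main() from an `if __name__ == "__main__":` guard turns this into a script that an optimizer can invoke once per trial with different flag values.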

HyperOptimizer is built for that workflow: package your model, backtest, simulation, or data pipeline as a Docker container, define the search space, and let managed optimization run the trials. Read the getting started guide or join the beta when you are ready to optimize without managing the infrastructure yourself.