Bayesian Optimization vs. Grid Search vs. Random Search: A Visual Guide
Bayesian optimization, grid search, and random search are three common ways to tune hyperparameters. They all answer the same question: which parameter values produce the best result? The difference is how they choose what to try next.
Short answer: use grid search when the search space is tiny and you need a simple baseline, random search when only a few parameters matter and you want a stronger baseline, and Bayesian optimization when each trial is expensive and you want the search to learn from previous results.
Quick comparison
| Method | How it searches | Best for | Main weakness |
|---|---|---|---|
| Grid search | Tests every point in a fixed grid | Tiny spaces, reproducible baselines, simple demos | Wastes trials on unimportant dimensions; trial count grows exponentially with the number of parameters |
| Random search | Samples points randomly from the space | Broad early exploration and cheap-to-medium trials | Does not intentionally focus on promising regions |
| Bayesian optimization | Builds a model of the objective and chooses informative next trials | Expensive training, backtests, simulations, and black-box optimization | More moving parts than a simple baseline |
The visual idea
Imagine you are tuning two parameters: learning_rate and max_depth for a model, or lookback_window and risk_multiplier for a trading strategy. Each dot is one trial.
Grid search
Grid search places trials on a fixed lattice:
max_depth
^
| x x x x
|
| x x x x
|
| x x x x
|
| x x x x
+------------------------> learning_rate
This is easy to understand and easy to reproduce. If you define four values for parameter A and four values for parameter B, grid search runs 16 trials.
The problem is scale. Add more parameters and the number of combinations grows quickly:
| Parameters | Values per parameter | Grid trials |
|---|---|---|
| 2 | 10 | 100 |
| 4 | 10 | 10,000 |
| 6 | 10 | 1,000,000 |
Grid search also spends the same number of trials on every dimension, even when some parameters barely affect the metric.
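The growth in the table is just the product of the number of values per parameter. A minimal sketch, assuming illustrative value lists for the 4 x 4 grid in the diagram (the specific learning_rate and max_depth values are made up for the example):

```python
import itertools
import math

def grid_size(values_per_param):
    """Trials a full grid needs: the product of the per-parameter value counts."""
    return math.prod(values_per_param)

# The 4 x 4 grid from the diagram. The values themselves are illustrative.
learning_rates = [0.001, 0.01, 0.1, 1.0]
max_depths = [2, 4, 6, 8]
grid = list(itertools.product(learning_rates, max_depths))
print(len(grid))  # 16

# Reproduce the scaling table: 10 values per parameter.
for n_params in (2, 4, 6):
    print(n_params, grid_size([10] * n_params))
```

The loop prints 100, 10000, and 1000000 trials for 2, 4, and 6 parameters, matching the table.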
Random search
Random search samples from the space without following a fixed grid:
max_depth
^
| x x
| x
| x x
| x
| x x
| x
+------------------------> learning_rate
Random search often beats grid search when only a subset of parameters really matters. Instead of testing every combination of every dimension, it keeps drawing new combinations. That gives it more chances to find useful values for the important parameters.
For example, if learning_rate matters much more than max_depth, a grid can waste many trials repeating the same few learning_rate values. Random search can cover more distinct learning_rate values with the same budget.
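To make the coverage argument concrete, here is a small sketch that counts distinct learning_rate values under the same 16-trial budget. The value lists, the log-uniform range for learning_rate, and the integer range for max_depth are assumptions chosen for illustration:

```python
import itertools
import random

BUDGET = 16

# Grid: 4 learning_rate values x 4 max_depth values = 16 trials.
grid_lrs = [0.001, 0.01, 0.1, 1.0]
grid_depths = [2, 4, 6, 8]
grid_trials = list(itertools.product(grid_lrs, grid_depths))

# Random: 16 independent draws. learning_rate is sampled log-uniformly
# between 1e-3 and 1 (an assumed range), max_depth uniformly from 2 to 8.
rng = random.Random(0)
random_trials = [
    (10 ** rng.uniform(-3, 0), rng.randint(2, 8))
    for _ in range(BUDGET)
]

distinct_grid = len({lr for lr, _ in grid_trials})
distinct_random = len({lr for lr, _ in random_trials})
print(distinct_grid, distinct_random)  # 4 16
```

Both methods run 16 trials, but the grid only ever sees 4 learning_rate values, while every random trial tries a new one.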
Bayesian optimization
Bayesian optimization starts with exploration, then concentrates trials where the results look promising:
max_depth
^
| x x
|
| x x
| x * x
| x x
| x
+------------------------> learning_rate
The * represents a promising region. Bayesian optimization does not know the best point in advance. It builds a probabilistic model from completed trials, estimates where good results may be, and chooses the next trial by balancing:
- Exploration: try uncertain regions because they might contain better results.
- Exploitation: try near known strong regions because they are likely to perform well.
That learning loop is why Bayesian optimization is useful for expensive black-box optimization: model training runs, trading backtests, simulations, batch jobs, and other workloads where every trial costs time or money.
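The loop can be sketched in plain Python. This is a toy illustration under stated assumptions, not how any particular library or platform implements it: the objective is a hypothetical one-dimensional function, the probabilistic model is a Gaussian process with a squared-exponential kernel, and the next trial is chosen with an upper-confidence-bound rule (mean plus kappa times standard deviation). The length_scale, noise, and kappa values are arbitrary choices for the example.

```python
import math

def objective(x):
    """Hypothetical stand-in for an expensive trial; the peak is at x = 0.3."""
    return -(x - 0.3) ** 2

def rbf(a, b, length_scale=0.15):
    """Squared-exponential kernel: nearby inputs are assumed to score similarly."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def posterior(xs, ys, x_new, noise=1e-4):
    """Gaussian-process posterior mean and standard deviation at x_new."""
    y_mean = sum(ys) / len(ys)
    centered = [y - y_mean for y in ys]
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    alpha = solve(K, centered)
    v = solve(K, k)
    mean = y_mean + sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, math.sqrt(var)

def bayes_opt(n_trials=15, kappa=2.0):
    """Run a few fixed exploratory trials, then pick each next trial by UCB."""
    candidates = [i / 100 for i in range(101)]
    xs = [0.1, 0.5, 0.9]                 # initial exploration
    ys = [objective(x) for x in xs]
    while len(xs) < n_trials:
        def ucb(x):
            mean, std = posterior(xs, ys, x)
            return mean + kappa * std    # exploitation term + exploration term
        untried = [c for c in candidates if c not in xs]
        x_next = max(untried, key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))
    return xs, ys

xs, ys = bayes_opt()
best_y, best_x = max(zip(ys, xs))
print(f"best x = {best_x:.2f}, objective = {best_y:.4f}")
```

A larger kappa pushes the search toward uncertain regions (exploration); a smaller one keeps it near the best results seen so far (exploitation). Production optimizers handle many dimensions, noise, and categorical parameters, which this sketch does not.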
How each method chooses the next trial
| Question | Grid search | Random search | Bayesian optimization |
|---|---|---|---|
| Does it use previous results? | No | No | Yes |
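| Can it stop early with useful information? | Only partly: an unfinished grid leaves whole regions untested | Yes: the trials run so far are still a uniform sample of the space | Yes: the trials run so far already lean toward promising regions |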
| Does it handle continuous ranges naturally? | Only after discretizing | Yes | Yes |
| Is it easy to parallelize? | Yes | Yes | Yes, with batch selection |
| Is it sample-efficient? | Low | Medium | High |
When to use grid search
Use grid search when the search space is small enough that exhaustive testing is practical.
Good fit:
- You have two or three parameters with a handful of values each.
- You need a deterministic baseline.
- You want to validate that your objective metric and trial runner work.
- You are tuning categorical choices with few combinations.
Avoid grid search when the number of parameters grows, when ranges are continuous, or when each trial is expensive. In those cases, the grid becomes a trial budget problem instead of an optimization strategy.
When to use random search
Use random search when you want a simple, strong baseline across a large space.
Good fit:
- You do not yet know which parameters matter.
- Trials are cheap enough to run many samples.
- You want broad coverage before using a smarter optimizer.
- Your parameter space includes continuous ranges.
Random search is especially useful as a sanity check. If a more complex optimizer cannot beat a well-run random search baseline, the issue may be the objective, the search space, the trial budget, or noise in the measurements.
When to use Bayesian optimization
Use Bayesian optimization when you want better results from a limited trial budget.
Good fit:
- Each trial is slow, expensive, or operationally limited.
- You are optimizing a black-box function: you can run a trial and observe a metric, but you do not have gradients.
- You care about finding strong configurations with fewer wasted runs.
- You want the optimizer to adapt as results come in.
This is the typical case for hyperparameter optimization as a service. With HyperOptimizer, your workload runs in a Docker container for each trial. You define the search space and objective metric, the platform chooses parameter sets with optimization algorithms such as Bayesian optimization, collects metrics from stdout, and shows ranked results in the dashboard.
Practical example: tuning a trading strategy
Suppose a trading strategy has these parameters:
| Parameter | Range |
|---|---|
| lookback_window | 10 to 200 |
| atr_multiplier | 0.5 to 5.0 |
| stop_loss_bps | 5 to 50 |
| max_position_size | 0.01 to 0.10 |
A grid with 20 values per parameter would require 160,000 backtests. If one backtest takes two minutes, that is more than 222 days of serial compute.
Random search can sample 200 or 1,000 combinations without enumerating everything. Bayesian optimization can go further by learning from early backtests and spending later trials in regions of the parameter space that score better on the objective, for example lower drawdown at an acceptable return, or a higher Sharpe ratio under realistic cost assumptions.
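The budget arithmetic and the random-search alternative can be sketched in a few lines. The ranges come from the parameter table; treating lookback_window and stop_loss_bps as integers is an assumption made for the example, as are the helper names:

```python
import random

# Search space from the parameter table: (low, high, is_integer).
# Which parameters are integers is an assumption for this sketch.
SPACE = {
    "lookback_window": (10, 200, True),
    "atr_multiplier": (0.5, 5.0, False),
    "stop_loss_bps": (5, 50, True),
    "max_position_size": (0.01, 0.10, False),
}
MINUTES_PER_BACKTEST = 2

def serial_days(n_trials):
    """Wall-clock days if the backtests run one after another."""
    return n_trials * MINUTES_PER_BACKTEST / (60 * 24)

def sample(rng):
    """Draw one random configuration from SPACE."""
    return {
        name: rng.randint(lo, hi) if is_int else rng.uniform(lo, hi)
        for name, (lo, hi, is_int) in SPACE.items()
    }

grid_trials = 20 ** len(SPACE)
print(grid_trials, round(serial_days(grid_trials), 1))  # 160000 222.2
print(round(serial_days(200), 2))                        # 0.28

rng = random.Random(42)
configs = [sample(rng) for _ in range(200)]
print(configs[0])
```

At two minutes per backtest, 200 random trials finish in under seven hours of serial compute, compared with more than 222 days for the full grid.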
Choosing the right method
Use this rule of thumb:
| Your situation | Recommended method |
|---|---|
| Tiny, discrete search space | Grid search |
| Cheap trials and unknown parameter importance | Random search |
| Expensive trials and limited budget | Bayesian optimization |
| Need a baseline before a smarter optimizer | Random search |
| Need every listed combination tested exactly | Grid search |
| Need adaptive, sample-efficient search | Bayesian optimization |
Common mistakes
Mistake 1: making the grid too fine. A fine grid feels precise, but it can spend thousands of trials on values that do not matter.
Mistake 2: comparing methods with different budgets. If grid search gets 10,000 trials and Bayesian optimization gets 100, the comparison measures the budgets, not the algorithms.
Mistake 3: optimizing the wrong metric. Better search cannot fix an objective that rewards overfitting, ignores costs, or fails to penalize risk.
Mistake 4: skipping holdout validation. The best parameter set on one dataset may not generalize. Always validate strong configurations on data or scenarios not used for selection.
FAQ
Is Bayesian optimization always better than random search?
No. Bayesian optimization is usually more sample-efficient when trials are expensive, but random search is simpler and can be very competitive when trials are cheap, noisy, or highly parallel.
Is grid search obsolete?
No. Grid search is still useful for small search spaces, deterministic baselines, and testing that an optimization workflow is wired correctly.
Why does random search often beat grid search?
Random search can test more distinct values of important parameters. Grid search repeats fixed values across every dimension, which can waste trials when only a few parameters strongly affect the objective.
What is the best method for hyperparameter optimization?
For most expensive black-box workloads, Bayesian optimization is usually the best choice once a random-search baseline is in place. For tiny spaces, grid search is often enough.
Can these methods run in parallel?
Yes. Grid search and random search are naturally parallel. Bayesian optimization can also run parallel trials by selecting batches of promising and informative parameter sets.
Next steps
If your workload can accept parameters from the command line and print objective metrics, it can usually be optimized. Start with a small random-search or grid-search baseline, confirm your metric behaves correctly, then move to Bayesian optimization when trial budget matters.
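A trial entry point of that shape might look like the sketch below. Everything here is hypothetical: the flag names, the placeholder scoring function, and in particular the `objective=<value>` output line, which is an assumed format rather than a documented one. Check your optimizer's documentation for the metric format it actually expects.

```python
import argparse
import sys

def run_backtest(lookback_window, atr_multiplier):
    """Hypothetical placeholder: replace with the real model, backtest, or simulation."""
    return -((lookback_window - 60) / 100) ** 2 - (atr_multiplier - 2.0) ** 2

def main(argv=None):
    parser = argparse.ArgumentParser(description="Run one optimization trial.")
    parser.add_argument("--lookback-window", type=int, default=50)
    parser.add_argument("--atr-multiplier", type=float, default=2.5)
    args = parser.parse_args(argv)

    score = run_backtest(args.lookback_window, args.atr_multiplier)
    # Assumed output format: one "name=value" line on stdout per metric.
    print(f"objective={score:.6f}")
    return score

if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as, for example, `python trial.py --lookback-window 160 --atr-multiplier 2.0` (the file name is arbitrary). Keeping the parameters on the command line and the metric on stdout means the same script works unchanged for a grid, random, or Bayesian run.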
HyperOptimizer is built for that workflow: package your model, backtest, simulation, or data pipeline as a Docker container, define the search space, and let managed optimization run the trials. Read the getting started guide or join the beta when you are ready to optimize without managing the infrastructure yourself.