ContinuumArmedBandits is a Python package for optimizing actions in a continuous domain using Bayesian optimization, tailored for scenarios like optimizing Google Ad spend in a marketing strategy. The approach generalizes the multi-armed bandit problem to continuous actions, aiming to maximize reward signals across various contexts.
- Introduction
- How It Works
- Installation
- Getting Started
- Usage
- Additional Materials
- Contributing
- License
ContinuumArmedBandits leverages Bayesian optimization and Gaussian processes to efficiently explore and exploit a continuous action space. This is particularly useful in applications where sampling the function to be optimized is expensive, such as in marketing strategies to optimize ad spend on critical keywords for Google Search.
Bayesian optimization constructs a posterior distribution of the target function using a Gaussian process. This distribution improves as more observations are collected, guiding the algorithm to explore promising areas while exploiting known good regions. This process is iterative and balances exploration and exploitation using strategies like Upper Confidence Bound (UCB) or Expected Improvement (EI).
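As a rough illustration of the idea (not the package's internal implementation), a UCB-style acquisition score can be computed directly from a Gaussian process posterior; the kernel choice and kappa value below are illustrative assumptions:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# A few observed (action, reward) pairs on a one-dimensional action space
X_obs = np.array([[0.1], [0.4], [0.9]])
y_obs = np.array([0.2, 0.8, 0.3])

# Fit a Gaussian process posterior over the continuous action space
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_obs, y_obs)

# Score candidate actions with UCB = mean + kappa * std: the mean term
# exploits known good regions, the std term rewards unexplored ones
candidates = np.linspace(0, 1, 1000).reshape(-1, 1)
mean, std = gp.predict(candidates, return_std=True)
kappa = 2.0  # larger kappa favours more exploration
next_action = candidates[np.argmax(mean + kappa * std)]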
To install the package, clone the repository and install the required dependencies:
git clone https://github.com/BrutishGuy/ContinuumArmedBandits.git
cd ContinuumArmedBandits
pip install -r requirements.txt
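To confirm that the core dependency installed correctly, you can check that it imports (this only verifies the bayes_opt package listed in requirements.txt):

python -c "from bayes_opt import BayesianOptimization"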
Here’s a quick example to get you started with optimizing a black-box function.
Define the function you wish to optimize. In a real scenario, the function's internals are unknown.
def black_box_function(x, y):
    return -x ** 2 - (y - 1) ** 2 + 1
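In a real application the objective would typically wrap an external, noisy measurement rather than a closed-form expression, for example conversions per dollar for a chosen set of keyword bids. A hypothetical sketch (the function name and noise model are illustrative assumptions, not part of this package):

import random

def ad_spend_objective(bid_x, bid_y):
    # Hypothetical stand-in for running (or simulating) a campaign with the
    # given keyword bids and observing a noisy reward, e.g. conversions per dollar
    true_value = -bid_x ** 2 - (bid_y - 1) ** 2 + 1
    return true_value + random.gauss(0, 0.05)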
Instantiate the BayesianOptimization object with the function and parameter bounds.
from bayes_opt import BayesianOptimization
# Bounded region of parameter space
pbounds = {'x': (2, 4), 'y': (-3, 3)}
optimizer = BayesianOptimization(
    f=black_box_function,
    pbounds=pbounds,
    random_state=1,
)
Run the optimizer to find the optimal parameters.
optimizer.maximize(
    init_points=2,
    n_iter=3,
)
print(optimizer.max)
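Beyond the single best point, the full evaluation history is available, and specific candidate points can be queued for evaluation (the res attribute and probe method below follow the bayes_opt API; exact behaviour may differ between versions):

# Inspect every evaluated point and its target value
for i, res in enumerate(optimizer.res):
    print(f"Iteration {i}: {res}")

# Queue a point you already suspect is good, then evaluate it
optimizer.probe(params={'x': 2.5, 'y': 0.5}, lazy=True)
optimizer.maximize(init_points=0, n_iter=0)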
Detailed examples and advanced usage can be found in the notebooks and scripts provided in the repository.
These materials were last updated in August 2021, so supplement them with your own research; new and interesting work in this area is published regularly.
Below are links to reading materials on bandits: contextual variants, evaluation of bandit algorithms, bandits for continuous action domains, bandits for ranking problems, and more.
- Multi-Variate Web Optimization Using Linear Contextual Bandits
- AutoML for Contextual Bandits by Google - Using oracles as environment simulators
Below are some interesting implementations of the theory presented here, as well as some useful bandit libraries on GitHub.
- Gaussian Process Contextual Bandits (Assumes continuous contexts, unfortunately)
- BanditLib - A Bandit library with many implementations of bandit algorithms
- Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertisement
- Contextual Bandits using Decision Trees to Reduce the Search Space
- More Contextual Bandit algorithm implementations by David Cortes
- Real-Time Bidding by Reinforcement Learning in Display Advertising
Please open issues for bugs or feature requests, and submit pull requests for review.
This project is licensed under the MIT License.