Source code for the paper "Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning" by Otmane Sakhi, Imad Aouali, Pierre Alquier, and Nicolas Chopin, published at NeurIPS 2024 (Spotlight).
A virtual Python environment holding all the packages needed to reproduce the results of the paper can be created from the requirements file:
virtualenv -p python3.9 pess_ls
source pess_ls/bin/activate
pip install -r requirements.txt
The OPE/OPS experiments build on those conducted in "Kuzborskij, I., Vernade, C., Gyorgy, A., & Szepesvári, C. (2021). Confident off-policy evaluation and selection through self-normalized importance weighting. In International Conference on Artificial Intelligence and Statistics (pp. 640-648). PMLR." The code for these experiments is heavily inspired by their associated GitHub package.
To run OPE experiments, please execute:
python ope_ops/policy_evaluation.py
To run OPS experiments, please execute:
python ope_ops/policy_selection.py
To run OPL experiments, please execute:
python opl/policy_learning.py
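For orientation, the estimator studied in the paper replaces the linear importance-weighted reward with its logarithmically smoothed counterpart, averaging log(1 + λ·w·r)/λ over the logged samples. The sketch below is ours, not taken from this repository: variable names, the toy data, and the choice of λ are illustrative, and only the estimator's functional form follows the paper.

```python
import numpy as np

def ls_estimate(rewards, weights, lam):
    """Logarithmic-smoothing estimate: mean of log(1 + lam * w * r) / lam.

    As lam -> 0 this recovers the standard IPS estimate; a larger lam
    damps the contribution of heavy importance weights (pessimism).
    """
    x = weights * rewards
    return np.mean(np.log1p(lam * x)) / lam

# Toy logged data (illustrative, not from the repo's experiments).
rng = np.random.default_rng(0)
r = rng.uniform(size=1000)            # rewards in [0, 1]
w = rng.lognormal(0.0, 1.5, size=1000)  # heavy-tailed importance weights

ips = np.mean(w * r)                  # plain IPS estimate
ls = ls_estimate(r, w, lam=0.1)       # smoothed, lower (pessimistic) estimate
```

Since log(1 + λx)/λ ≤ x for nonnegative x, the smoothed estimate never exceeds plain IPS on nonnegative rewards, which is what makes it useful as a pessimistic surrogate.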
If you use this code, please cite our work (the entry will be updated once the proceedings are out):
@misc{sakhi2024logarithmicsmoothingpessimisticoffpolicy,
  title={Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning},
  author={Otmane Sakhi and Imad Aouali and Pierre Alquier and Nicolas Chopin},
  year={2024},
  eprint={2405.14335},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2405.14335}
}