Using k=5, n=100, MAML fails to learn: average training and validation returns consistently hover around 50 throughout all 500 outer loop steps. Any possible discrepancies between this repo's code/config and the paper's experiments?
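For context on why a return of about 50 suggests no learning, here is a minimal sketch of a chance-level baseline. It rests on assumptions the thread does not confirm: that `k` is the number of arms, that `n` is the number of pulls per episode, that rewards are Bernoulli, and that each task's arm means are drawn uniformly from [0, 1]. The function name `random_policy_return` is hypothetical and is not part of the repo.

```python
import numpy as np

def random_policy_return(k=5, n=100, num_tasks=2000, seed=0):
    """Average episode return of a uniform-random policy on Bernoulli bandits.

    Assumptions (not confirmed by the thread): k arms, n pulls per episode,
    arm means sampled uniformly from [0, 1] for each task.
    """
    rng = np.random.default_rng(seed)
    # One row of k arm means per sampled task.
    means = rng.uniform(0.0, 1.0, size=(num_tasks, k))
    # The policy picks an arm uniformly at random at every step.
    arms = rng.integers(0, k, size=(num_tasks, n))
    # Success probability of the arm chosen at each step.
    probs = np.take_along_axis(means, arms, axis=1)
    # Bernoulli rewards: 1 with probability probs, else 0.
    rewards = rng.random((num_tasks, n)) < probs
    # Sum rewards over each episode, then average over tasks.
    return rewards.sum(axis=1).mean()

if __name__ == "__main__":
    print(random_policy_return())
```

Under these assumptions the expected return of a random policy is n × 0.5 = 50, so returns that stay near 50 would be consistent with a policy that never improves over random arm selection.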
The code did change significantly between the version we used in the paper and the current version. The paper was written on a very early version of the code, which probably got lost in the many refactorings we did even prior to open-sourcing it. I haven't run bandit experiments on the new code since then. I added the config files a few months ago after a request, but I haven't tried them myself. Unfortunately, I don't know whether the results should still hold with this version (I thought they would).
For reference, the following command
produces the following average training/validation returns for the first and last 5 iterations, respectively: