diff --git a/docs/API/Models/OffPolicyMonteCarloControl.md b/docs/API/Models/OffPolicyMonteCarloControl.md
new file mode 100644
index 0000000..287ca00
--- /dev/null
+++ b/docs/API/Models/OffPolicyMonteCarloControl.md
@@ -0,0 +1,61 @@
+# [API Reference](../../API.md) - [Models](../Models.md) - OffPolicyMonteCarloControl
+
+OffPolicyMonteCarloControl is a neural network with reinforcement learning capabilities. It can predict any positive number of discrete values.
+
+## Constructors
+
+### new()
+
+Creates a new model object. If any of the arguments is nil, the default value for that argument will be used.
+
+```
+OffPolicyMonteCarloControl.new({targetPolicyFunction: string, discountFactor: number}): ModelObject
+```
+
+#### Parameters:
+
+* targetPolicyFunction: The function that defines the target policy, which is computed from the current Q-values (state-action values) and determines how the agent chooses actions from its current knowledge. Available options include:
+
+	* Greedy: Selects the action with the highest Q-value for a given state. This is the optimal choice when the Q-values are accurate.
+
+	* Softmax: Selects actions probabilistically, where actions with higher Q-values are more likely to be chosen. The selection probabilities are shaped by a temperature parameter that controls the exploration-exploitation trade-off.
+
+	* StableSoftmax: A numerically more stable variant of Softmax. (Default)
+
+* discountFactor: Determines how strongly future rewards are weighted; the higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1.
+
+#### Returns:
+
+* ModelObject: The generated model object.
+
+## Functions
+
+### setParameters()
+
+Sets the model's parameters. When any of the arguments is nil, the previous value for that argument will be used.
+
+```
+OffPolicyMonteCarloControl:setParameters({targetPolicyFunction: string, discountFactor: number})
+```
+
+#### Parameters:
+
+* targetPolicyFunction: The function that defines the target policy, which is computed from the current Q-values (state-action values) and determines how the agent chooses actions from its current knowledge. Available options include:
+
+	* Greedy: Selects the action with the highest Q-value for a given state. This is the optimal choice when the Q-values are accurate.
+
+	* Softmax: Selects actions probabilistically, where actions with higher Q-values are more likely to be chosen. The selection probabilities are shaped by a temperature parameter that controls the exploration-exploitation trade-off.
+
+	* StableSoftmax: A numerically more stable variant of Softmax. (Default)
+
+* discountFactor: Determines how strongly future rewards are weighted; the higher the value, the more the model focuses on long-term outcomes. The value must be set between 0 and 1.
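+
+#### Example Usage:
+
+The snippet below is an illustrative sketch rather than an official example: the `DataPredict.Models` access path and the variable names are assumptions, while the `new()` and `setParameters()` calls follow the signatures shown above.
+
+```
+-- Assumes the DataPredict library has already been required into the "DataPredict" variable.
+local OffPolicyMonteCarloControl = DataPredict.Models.OffPolicyMonteCarloControl
+
+-- Build the model with the default StableSoftmax target policy and a discount factor
+-- that favours long-term outcomes.
+local MonteCarloControl = OffPolicyMonteCarloControl.new({targetPolicyFunction = "StableSoftmax", discountFactor = 0.95})
+
+-- Parameters may be changed later; any argument left out keeps its previous value.
+MonteCarloControl:setParameters({discountFactor = 0.9})
+```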
+
+## Inherited From
+
+* [ReinforcementLearningBaseModel](ReinforcementLearningBaseModel.md)
+
+## References
+
+* [Off-Policy Monte Carlo Control, Page 90](http://incompleteideas.net/book/bookdraft2017nov5.pdf)
+
+* [Forgetting Early Estimates in Monte Carlo Control Methods](https://ev.fe.uni-lj.si/3-2015/Vodopivec.pdf)
diff --git a/docs/VersionHistory/BetaVersionHistory.md b/docs/VersionHistory/BetaVersionHistory.md
index bba4744..e796d99 100644
--- a/docs/VersionHistory/BetaVersionHistory.md
+++ b/docs/VersionHistory/BetaVersionHistory.md
@@ -1,8 +1,24 @@
 # Beta Version
 
+## Version 1.7
+
+[1.7.0](Beta/1-7-0.md) -- 8/1/2025
+
+## Version 1.6
+
+[1.6.0](Beta/1-6-0.md)
+
+## Version 1.5
+
+[1.5.0](Beta/1-5-0.md)
+
+## Version 1.4
+
+[1.4.0](Beta/1-4-0.md)
+
 ## Version 1.3
 
-[1.3.0](Beta/1-3-0.md) -- 24/12/2024
+[1.3.0](Beta/1-3-0.md)
 
 ## Version 1.2
 
diff --git a/docs/VersionHistory/ReleaseVersionHistory.md b/docs/VersionHistory/ReleaseVersionHistory.md
index c549b70..695e83d 100644
--- a/docs/VersionHistory/ReleaseVersionHistory.md
+++ b/docs/VersionHistory/ReleaseVersionHistory.md
@@ -2,6 +2,7 @@
 
 | Version | Number Of Blocks | Number Of Models | Number Of Optimizers | Number Of Cost Functions | Number Of Containers | Number of Utilities | Number Of Regularizers | Backward Incompatible Changes |
 |-----------------------|------------------|------------------|----------------------|--------------------------|----------------------|---------------------|------------------------|-------------------------------|
+| [1.8](Release/1-8.md) | 70 | 22 | 8 | 5 | 2 | 3 | 3 | No |
 | [1.7](Release/1-7.md) | 70 | 20 | 8 | 5 | 2 | 3 | 3 | No |
 | [1.6](Release/1-6.md) | 65 | 20 | 8 | 5 | 2 | 3 | 3 | No |
 | [1.5](Release/1-5.md) | 61 | 20 | 8 | 5 | 2 | 3 | 3 | No |