Commit 71e91c3 (parent 34085e4): 3 changed files with 79 additions and 1 deletion.
# [API Reference](../../API.md) - [Models](../Models.md) - OffPolicyMonteCarloControl

OffPolicyMonteCarloControl is a neural network model with reinforcement learning capabilities. It can predict any positive number of discrete values.

## Constructors

### new()

Create a new model object. If any of the arguments are nil, the default value for that argument will be used.

```
OffPolicyMonteCarloControl.new({targetPolicyFunction: string, discountFactor: number}): ModelObject
```

#### Parameters:

* targetPolicyFunction: The target policy used to select actions. The policy is applied to the current Q-values (state-action values) and determines how the agent chooses actions based on its current knowledge. Available options include:

	* Greedy: Selects the action with the highest Q-value for a given state. This is typically the optimal policy, assuming the Q-values are accurate.

	* Softmax: Selects actions probabilistically, where actions with higher Q-values are more likely to be chosen. A temperature parameter controls the exploration-exploitation trade-off.

	* StableSoftmax: A numerically more stable variant of Softmax. (Default)

* discountFactor: Determines how strongly the model weights long-term rewards over immediate ones; the higher the value, the more it focuses on long-term outcomes. The value must be between 0 and 1.

#### Returns:

* ModelObject: The generated model object.
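
As a minimal sketch of construction (the require path is an assumption; adjust it to wherever the library is installed in your project):

```
-- Hypothetical require path for the DataPredict library.
local DataPredict = require(game.ServerScriptService.DataPredict)

-- Any omitted argument falls back to its default
-- (e.g. targetPolicyFunction defaults to "StableSoftmax").
local OffPolicyMonteCarloControl = DataPredict.Models.OffPolicyMonteCarloControl.new({

	targetPolicyFunction = "StableSoftmax",

	discountFactor = 0.95,

})
```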

## Functions

### setParameters()

Set the model's parameters. When any of the arguments are nil, the previous value for that argument will be retained.

```
OffPolicyMonteCarloControl:setParameters({targetPolicyFunction: string, discountFactor: number})
```

#### Parameters:

* targetPolicyFunction: The target policy used to select actions. The policy is applied to the current Q-values (state-action values) and determines how the agent chooses actions based on its current knowledge. Available options include:

	* Greedy: Selects the action with the highest Q-value for a given state. This is typically the optimal policy, assuming the Q-values are accurate.

	* Softmax: Selects actions probabilistically, where actions with higher Q-values are more likely to be chosen. A temperature parameter controls the exploration-exploitation trade-off.

	* StableSoftmax: A numerically more stable variant of Softmax. (Default)

* discountFactor: Determines how strongly the model weights long-term rewards over immediate ones; the higher the value, the more it focuses on long-term outcomes. The value must be between 0 and 1.
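
A short sketch of updating a single parameter on an existing model object (assuming the model was constructed as in the section above):

```
-- Switch the target policy to Greedy. discountFactor is nil here,
-- so the previously set value is retained.
OffPolicyMonteCarloControl:setParameters({

	targetPolicyFunction = "Greedy",

})
```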

## Inherited From

* [ReinforcementLearningBaseModel](ReinforcementLearningBaseModel.md)

## References

* [Off-Policy Monte Carlo Control, Page 90](http://incompleteideas.net/book/bookdraft2017nov5.pdf)

* [Forgetting Early Estimates in Monte Carlo Control Methods](https://ev.fe.uni-lj.si/3-2015/Vodopivec.pdf)