Updates
AqwamCreates committed Jan 8, 2025
1 parent 34085e4 commit 71e91c3
Showing 3 changed files with 79 additions and 1 deletion.
61 changes: 61 additions & 0 deletions docs/API/Models/OffPolicyMonteCarloControl.md
@@ -0,0 +1,61 @@
# [API Reference](../../API.md) - [Models](../Models.md) - OffPolicyMonteCarloControl

OffPolicyMonteCarloControl is a neural network with reinforcement learning capabilities. It can predict any positive number of discrete values.

## Constructors

### new()

Creates a new model object. If any of the arguments are nil, the default value for that argument will be used.

```
OffPolicyMonteCarloControl.new({targetPolicyFunction: string, discountFactor: number}): ModelObject
```

#### Parameters:

* targetPolicyFunction: The name of the target policy function used to select actions. The policy is based on the current Q-values (or state-action values) and determines how the agent chooses actions from its current knowledge. Available options include:

* Greedy: Selects the action with the highest Q-value for a given state. This is typically the optimal policy, assuming the Q-values are accurate.

* Softmax: Selects actions probabilistically, where actions with higher Q-values are more likely to be chosen. The probability of selecting an action is determined by a temperature parameter that controls the exploration-exploitation trade-off.

* StableSoftmax: A numerically stable variant of Softmax (default); see the sketch after this parameter list.

* discountFactor: Determines how much the agent values long-term rewards over immediate ones; the higher the value, the more weight future rewards receive. The value must be between 0 and 1.
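
As an illustration only (this is not the library's source code), the usual difference between Softmax and a "stable" Softmax is that the latter subtracts the maximum Q-value before exponentiating, which avoids numerical overflow for large Q-values. The function name, the `qValueVector` table, and the `temperature` value below are assumptions made for this sketch:

```
-- Illustrative sketch of temperature-scaled softmax action probabilities.
-- Subtracting the maximum Q-value before exp() is what typically makes the
-- "stable" variant stable; this is not the library's actual implementation.
local function stableSoftmaxProbabilities(qValueVector, temperature)

	local maximumQValue = math.max(table.unpack(qValueVector))

	local exponentVector = {}

	local exponentSum = 0

	for i, qValue in ipairs(qValueVector) do

		exponentVector[i] = math.exp((qValue - maximumQValue) / temperature)

		exponentSum = exponentSum + exponentVector[i]

	end

	for i = 1, #exponentVector do

		exponentVector[i] = exponentVector[i] / exponentSum

	end

	return exponentVector

end

-- Example: three actions with Q-values {1, 2, 4} at temperature 1.
local actionProbabilityVector = stableSoftmaxProbabilities({1, 2, 4}, 1)
```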

#### Returns:

* ModelObject: The generated model object.
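
A minimal usage sketch. Only the constructor signature and parameter names above are taken from this page; the require path, the `DataPredict` library name, and the variable names are assumptions:

```
-- Hypothetical require path; adjust to wherever the library lives in your project.
local DataPredict = require(game.ServerScriptService.DataPredict)

local OffPolicyMonteCarloControl = DataPredict.Models.OffPolicyMonteCarloControl

-- Both fields are optional; a nil field falls back to its default value.
local OffPolicyMonteCarloControlModel = OffPolicyMonteCarloControl.new({

	targetPolicyFunction = "StableSoftmax",

	discountFactor = 0.95,

})
```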

## Functions

### setParameters()

Sets the model's parameters. When any of the arguments are nil, the previously set value for that argument will be used.

```
OffPolicyMonteCarloControl:setParameters({targetPolicyFunction: string, discountFactor: number})
```

#### Parameters:

* targetPolicyFunction: The name of the target policy function used to select actions. The policy is based on the current Q-values (or state-action values) and determines how the agent chooses actions from its current knowledge. Available options include:

* Greedy: Selects the action with the highest Q-value for a given state. This is typically the optimal policy, assuming the Q-values are accurate.

* Softmax: Selects actions probabilistically, where actions with higher Q-values are more likely to be chosen. The probability of selecting an action is determined by a temperature parameter that controls the exploration-exploitation trade-off.

* StableSoftmax: A numerically stable variant of Softmax (default).

* discountFactor: Determines how much the agent values long-term rewards over immediate ones; the higher the value, the more weight future rewards receive. The value must be between 0 and 1.
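
A minimal sketch of updating a single parameter on an existing model object; `OffPolicyMonteCarloControlModel` is the hypothetical variable from the constructor example above:

```
-- Only discountFactor is supplied; targetPolicyFunction is nil here, so the
-- previously set value is kept, as described above.
OffPolicyMonteCarloControlModel:setParameters({

	discountFactor = 0.99,

})
```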

## Inherited From

* [ReinforcementLearningBaseModel](ReinforcementLearningBaseModel.md)

## References

* [Off-Policy Monte Carlo Control, Page 90](http://incompleteideas.net/book/bookdraft2017nov5.pdf)

* [Forgetting Early Estimates in Monte Carlo Control Methods](https://ev.fe.uni-lj.si/3-2015/Vodopivec.pdf)
18 changes: 17 additions & 1 deletion docs/VersionHistory/BetaVersionHistory.md
@@ -1,8 +1,24 @@
# Beta Version

## Version 1.7

[1.7.0](Beta/1-7-0.md) -- 8/1/2025

## Version 1.6

[1.6.0](Beta/1-6-0.md)

## Version 1.5

[1.5.0](Beta/1-5-0.md)

## Version 1.4

[1.4.0](Beta/1-4-0.md)

## Version 1.3

[1.3.0](Beta/1-3-0.md) -- 24/12/2024
[1.3.0](Beta/1-3-0.md)

## Version 1.2

1 change: 1 addition & 0 deletions docs/VersionHistory/ReleaseVersionHistory.md
@@ -2,6 +2,7 @@

| Version | Number Of Blocks | Number Of Models | Number Of Optimizers | Number Of Cost Functions | Number Of Containers | Number of Utilities | Number Of Regularizers | Backward Incompatible Changes |
|-----------------------|------------------|------------------|----------------------|--------------------------|----------------------|---------------------|------------------------|-------------------------------|
| [1.8](Release/1-8.md) | 70 | 22 | 8 | 5 | 2 | 3 | 3 | No |
| [1.7](Release/1-7.md) | 70 | 20 | 8 | 5 | 2 | 3 | 3 | No |
| [1.6](Release/1-6.md) | 65 | 20 | 8 | 5 | 2 | 3 | 3 | No |
| [1.5](Release/1-5.md) | 61 | 20 | 8 | 5 | 2 | 3 | 3 | No |
