✏️ typo
parent 3d28e60ba0
commit 582e41687b
1 changed file with 1 addition and 1 deletion
@@ -60,7 +60,7 @@ determined. We backtract over all the states and moves to update the Q-table,
 given the appropriate reward for each player.
 Since the learning is episodic it can only be done at the end.
 
-The learning rate α is set to `1` because the game if fully
+The learning rate α is set to `1` because the game is fully
 deterministic.
 
 We use an ε-greedy (expentionnally decreasing) strategy for
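For readers unfamiliar with the README passage touched by this hunk, the following is a minimal Python sketch of what it describes: an end-of-episode backward pass over one player's states and moves with α = 1, plus an exponentially decreasing ε-greedy move choice. The project's actual code is not visible in this diff, so every name here (`epsilon`, `choose_move`, `backtrack_update`), the discount factor `GAMMA`, and the decay constants are assumptions made for illustration only. The update rule is a simplified backward propagation of the final reward, not necessarily the rule the repository uses.

```python
import math
import random
from collections import defaultdict

ALPHA = 1.0  # learning rate stated in the README text (deterministic game)
GAMMA = 0.9  # discount factor: an assumption, not stated in the diff


def epsilon(episode, eps_start=1.0, eps_min=0.05, decay=0.001):
    """Exponentially decreasing exploration rate (all constants hypothetical)."""
    return eps_min + (eps_start - eps_min) * math.exp(-decay * episode)


def choose_move(q, state, moves, eps, rng=random):
    """Epsilon-greedy: explore with probability eps, else take the best known move."""
    if rng.random() < eps:
        return rng.choice(moves)
    return max(moves, key=lambda m: q[(state, m)])


def backtrack_update(q, history, reward, alpha=ALPHA, gamma=GAMMA):
    """Walk one player's (state, move) history backwards once the episode ends."""
    target = reward
    for state, move in reversed(history):
        # With alpha == 1 the old estimate is simply overwritten by the target.
        q[(state, move)] += alpha * (target - q[(state, move)])
        # Earlier moves receive the discounted value of the move that followed.
        target = gamma * q[(state, move)]


q_table = defaultdict(float)  # unseen (state, move) pairs start at 0.0
backtrack_update(q_table, [("s0", "a"), ("s1", "b")], reward=1.0)
```

Because the update needs the final reward, it runs once per player after the game is decided, which matches the "episodic" remark in the hunk. Calling `backtrack_update` a second time with the loser's history and a negative reward would cover the "appropriate reward for each player" part.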