✏️ typo
parent 3d28e60ba0
commit 582e41687b
1 changed file with 1 addition and 1 deletion
@@ -60,7 +60,7 @@ determined. We backtract over all the states and moves to update the Q-table,
 given the appropriate reward for each player.
 Since the learning is episodic it can only be done at the end.
 
-The learning rate α is set to `1` because the game if fully
+The learning rate α is set to `1` because the game is fully
 deterministic.
 
 We use an ε-greedy (expentionnally decreasing) strategy for