✏️ typo

otthorn 2021-05-17 00:53:53 +02:00
parent 3d28e60ba0
commit 582e41687b

@@ -60,7 +60,7 @@ determined. We backtract over all the states and moves to update the Q-table,
 given the appropriate reward for each player.
 Since the learning is episodic it can only be done at the end.
-The learning rate α is set to `1` because the game if fully
+The learning rate α is set to `1` because the game is fully
 deterministic.
 We use an ε-greedy (expentionnally decreasing) strategy for