✏️ typo

master
otthorn 3 years ago
parent 3d28e60ba0
commit 582e41687b

@@ -60,7 +60,7 @@ determined. We backtract over all the states and moves to update the Q-table,
 given the appropriate reward for each player.
 Since the learning is episodic it can only be done at the end.
-The learning rate α is set to `1` because the game if fully
+The learning rate α is set to `1` because the game is fully
 deterministic.
 We use an ε-greedy (expentionnally decreasing) strategy for
