✏️ typo

master
otthorn 3 years ago
parent 3d28e60ba0
commit 582e41687b

@ -60,7 +60,7 @@ determined. We backtract over all the states and moves to update the Q-table,
given the appropriate reward for each player.
Since the learning is episodic it can only be done at the end.
The learning rate α is set to `1` because the game if fully
The learning rate α is set to `1` because the game is fully
deterministic.
We use an ε-greedy (expentionnally decreasing) strategy for

Loading…
Cancel
Save