✏️ typo

otthorn 2021-05-17 00:53:53 +02:00
parent 3d28e60ba0
commit 582e41687b

@@ -60,7 +60,7 @@ determined. We backtract over all the states and moves to update the Q-table,
 given the appropriate reward for each player.
 Since the learning is episodic it can only be done at the end.
-The learning rate α is set to `1` because the game if fully
+The learning rate α is set to `1` because the game is fully
 deterministic.
 We use an ε-greedy (expentionnally decreasing) strategy for