diff --git a/README.md b/README.md index 2ada376..f8766d3 100644 --- a/README.md +++ b/README.md @@ -60,7 +60,7 @@ determined. We backtract over all the states and moves to update the Q-table, given the appropriate reward for each player. Since the learning is episodic it can only be done at the end. -The learning rate α is set to `1` because the game if fully +The learning rate α is set to `1` because the game is fully deterministic. We use an ε-greedy (expentionnally decreasing) strategy for