@@ -60,7 +60,7 @@ determined. We backtract over all the states and moves to update the Q-table,
 given the appropriate reward for each player.
 Since the learning is episodic it can only be done at the end.
 
-The learning rate α is set to `1` because the game if fully
+The learning rate α is set to `1` because the game is fully
 deterministic.
 
 We use an ε-greedy (expentionnally decreasing) strategy for