diff --git a/README.md b/README.md
index 227aa3a..2ada376 100644
--- a/README.md
+++ b/README.md
@@ -35,19 +35,19 @@ considered draw if no one won.
 ## Combinatorics
 
 Without taking anything into account, we can estimate an upper bound on the
-number of possible boards. There is $ 3^9 = 19683 $ possibilites.
+number of possible boards. There are `3**9 = 19683` possibilities.
 
 There are 8 different possible symmetries (dihedral group of order 8, aka the
 symmetry group of the square). This drastically reduces the number of possible
 boards.
 
 Taking into account the symmetries and the impossible boards (more O than X for
-example), we get $765$ boards.
+example), we get `765` boards.
 
 Since we do not need to store the last board in the DAG, this number drops to
-$627$ non-ending boards.
+`627` non-ending boards.
 
-This make our state space size to be $627$ and our action space size to be $9$.
+This makes our state space size `627` and our action space size `9`.
 
 ## Reward
 
@@ -60,10 +60,10 @@ determined.
 We backtrack over all the states and moves to update the Q-table, given the
 appropriate reward for each player. Since the learning is episodic, it can only
 be done at the end.
-The learning rate $\alpha$ is set to $1$ because the game if fully
+The learning rate α is set to `1` because the game is fully
 deterministic.
 
-We use an $\varepsilon$-greedy (expentionnally decreasing) strategy for
+We use an ε-greedy (exponentially decreasing) strategy for
 exploration/exploitation.
 
 The Bellman equation is simplified to the bare minimum for the special case of
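
Not part of the patch above: for readers who want to check the `765` and `627` figures, here is a minimal, self-contained sketch that enumerates the boards reachable in play and counts them up to the 8 symmetries of the square. Python, the function names, and the tuple-of-characters board encoding are assumptions made for illustration; the repository's own code is not shown in this excerpt.

```python
# Hypothetical verification script; not taken from the repository.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

# Index permutations for the 8 symmetries of the square (rotations + reflections).
SYMMETRIES = [
    (0, 1, 2, 3, 4, 5, 6, 7, 8),  # identity
    (6, 3, 0, 7, 4, 1, 8, 5, 2),  # rotate 90
    (8, 7, 6, 5, 4, 3, 2, 1, 0),  # rotate 180
    (2, 5, 8, 1, 4, 7, 0, 3, 6),  # rotate 270
    (2, 1, 0, 5, 4, 3, 8, 7, 6),  # mirror left-right
    (6, 7, 8, 3, 4, 5, 0, 1, 2),  # mirror up-down
    (0, 3, 6, 1, 4, 7, 2, 5, 8),  # main diagonal
    (8, 5, 2, 7, 4, 1, 6, 3, 0),  # anti-diagonal
]

def canonical(board):
    """Smallest of the 8 symmetric variants, used as the class representative."""
    return min(tuple(board[i] for i in perm) for perm in SYMMETRIES)

def finished(board):
    """True if someone has three in a row or the board is full."""
    won = any(board[a] != ' ' and board[a] == board[b] == board[c]
              for a, b, c in LINES)
    return won or ' ' not in board

def count_boards():
    start = tuple(' ' * 9)
    seen = {canonical(start)}
    stack = [start]
    non_ending = 0
    while stack:
        board = stack.pop()
        if finished(board):
            continue            # ended games are counted but never expanded
        non_ending += 1
        player = 'X' if board.count('X') == board.count('O') else 'O'
        for i in range(9):
            if board[i] == ' ':
                child = board[:i] + (player,) + board[i + 1:]
                key = canonical(child)
                if key not in seen:
                    seen.add(key)
                    stack.append(child)
    return len(seen), non_ending

print(count_boards())  # should print (765, 627): all boards, then non-ending ones
```

The search stops at ended games, which matches the `765` reachable boards quoted in the patch; excluding those terminal boards leaves the `627` states that the Q-table has to hold.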
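Also not from the repository: a sketch, under the same Python assumption, of an exponentially decreasing ε-greedy policy and of what a learning rate of `1` means for the update. The decay constants, function names, and the dict-based Q-table are illustrative, and the exact reward backpropagation is cut off in the excerpt, so it is not reproduced here.

```python
import math
import random

def epsilon(episode, eps_start=1.0, eps_min=0.01, decay=1e-3):
    # Exponentially decreasing exploration rate; the constants are illustrative.
    return eps_min + (eps_start - eps_min) * math.exp(-decay * episode)

def pick_move(q_table, state, legal_moves, eps):
    # ε-greedy: with probability eps play a random legal move (explore),
    # otherwise play the move with the highest known Q-value (exploit).
    if random.random() < eps:
        return random.choice(legal_moves)
    return max(legal_moves, key=lambda move: q_table.get((state, move), 0.0))

def q_update(q_table, state, move, target):
    # With a learning rate of 1, the generic rule
    #   Q(s, a) <- Q(s, a) + alpha * (target - Q(s, a))
    # collapses to a plain assignment of the target value.
    q_table[(state, move)] = target
```

A training loop along the lines described above would call `pick_move(q_table, state, legal_moves, epsilon(episode))` during play, and call `q_update` for each `(state, move)` of the episode once the outcome is known.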