Gym: MountainCar using Genetic Algorithm
Solving OpenAI Gym MountainCar using Genetic Algorithm
This time I have partially solved Gym's MountainCar-v0 problem using a Deep Genetic Algorithm (DGA). Why partially?
The model trained within 48 seconds, which astonished me as well. It seems like DGA rocks, although my neural network wasn't really that deep: I used just 8 nodes in the first hidden layer and 4 in the second.
The observation space:
The action space:
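For reference, Gym documents MountainCar-v0's observation as a pair (position, velocity) and its action space as three discrete choices. A minimal sketch that writes those documented bounds down as plain Python constants follows; the constant and function names are illustrative assumptions, not part of Gym's API.

```python
# Bounds as documented for Gym's MountainCar-v0 (names here are illustrative).
OBSERVATION_LOW = (-1.2, -0.07)   # (min position, min velocity)
OBSERVATION_HIGH = (0.6, 0.07)    # (max position, max velocity)

# Discrete(3) action space: the integer the agent returns and its meaning.
ACTIONS = {0: "accelerate left", 1: "don't accelerate", 2: "accelerate right"}


def in_observation_space(observation):
    """Return True if (position, velocity) lies inside the documented bounds."""
    return all(
        low <= value <= high
        for value, low, high in zip(observation, OBSERVATION_LOW, OBSERVATION_HIGH)
    )
```

So the network needs 2 inputs (one per observation component) and 3 outputs (one per action).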
Here's the GIF of the Mountain Car:
In my previous post, I solved the LunarLander environment using a genetic algorithm. LunarLander, however, is a far more complex environment, with a larger state space.
The result of our algorithm?
The green plot is the median score of our population. The blue plot is the maximum score of each generation.
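The two curves can be tracked with a couple of lines per generation. This is a small sketch, assuming `scores` is the list of total episode rewards for the current population; the function name is hypothetical.

```python
from statistics import median


def generation_stats(scores):
    """Summarize one generation by its median and best score."""
    return {"median": median(scores), "max": max(scores)}
```

Appending the result to a list once per generation gives the two series that get plotted.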
The tweaks to the GA rely on the basics.
For crossover, I used uniform crossover. It gave better results than one-point crossover, and two-point crossover was the worst of the three. The initial population size was 80, which worked well. Earlier, I was selecting only the best 2 parents and mating them, and that killed the evolution: within 10 generations, progress had stalled.
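A minimal sketch of uniform crossover, assuming each individual is a flat list of network weights. The function name, the `pick_a_prob` parameter and its 50/50 default are assumptions for illustration, not the code from this experiment.

```python
import random


def uniform_crossover(parent_a, parent_b, pick_a_prob=0.5, rng=random):
    """Build one child by choosing every gene independently from either parent."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same number of genes")
    return [
        gene_a if rng.random() < pick_a_prob else gene_b
        for gene_a, gene_b in zip(parent_a, parent_b)
    ]
```

Unlike one-point or two-point crossover, no contiguous block of weights is inherited as a unit; each weight is an independent coin flip.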
So the parents themselves stayed in the population as well. After all, parents don't die after giving birth. So how many parents? Random! I picked a random number between 10 and 70 and kept that many top-scoring parents from the population pool. Suppose the random number is 40. The new population then contains the 40 highest-scoring individuals from the current population, and the remaining 40 are formed by mating the selected ones. In total, we have 80 again.
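The selection step can be sketched roughly as below. It assumes that "between 10 and 70" is inclusive on both ends and that `make_child` is some function that produces one child from the surviving pool; both are assumptions, and all names are hypothetical.

```python
import random

POPULATION_SIZE = 80


def select_survivors(population, scores, low=10, high=70, rng=random):
    """Keep a random number (low..high, inclusive) of the top-scoring individuals."""
    n_keep = rng.randint(low, high)
    ranked = sorted(zip(scores, population), key=lambda pair: pair[0], reverse=True)
    return [individual for _, individual in ranked[:n_keep]]


def next_generation(population, scores, make_child, size=POPULATION_SIZE, rng=random):
    """Survivors carry over unchanged; children fill the remaining slots."""
    survivors = select_survivors(population, scores, rng=rng)
    children = [make_child(survivors) for _ in range(size - len(survivors))]
    return survivors + children
```

With 40 survivors, `next_generation` calls `make_child` 40 times, so the population is back at 80.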
For mating, I picked two random parents from the selected pool. Note that the two parents must be different individuals. The same code was used as for the lunar lander.
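One simple way to guarantee two different parents is to sample without replacement. This is only a sketch with a hypothetical function name; the lunar lander code itself is not reproduced here.

```python
import random


def pick_two_parents(pool, rng=random):
    """Return two parents drawn from different positions in the pool."""
    if len(pool) < 2:
        raise ValueError("need at least two individuals to mate")
    # sample() draws without replacement, so the same slot is never picked twice.
    return rng.sample(pool, 2)
```

Because the smallest possible pool here is 10 individuals, the length check never triggers in practice; it is there to make the requirement explicit.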
Yes, this is tough. Since my model was constantly running out of diversity, I had increased the mutation rate to keep diversity within the population. A low mutation rate kept the median score flat for about 10-15 generations, which was completely unacceptable. The mutation rate was kept the same as for the lunar lander.
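A minimal sketch of per-gene mutation, assuming Gaussian noise is added to each weight with some probability. The post does not state the actual rate or noise scale, so `rate=0.1` and `sigma=0.5` are placeholder values, and the Gaussian noise itself is an assumption.

```python
import random


def mutate(genes, rate=0.1, sigma=0.5, rng=random):
    """Return a copy of genes where each one is perturbed with probability `rate`."""
    return [
        gene + rng.gauss(0.0, sigma) if rng.random() < rate else gene
        for gene in genes
    ]
```

Raising `rate` perturbs more weights per child, which is the knob that trades stability for diversity.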
Deep Neural Network:
The actions were chosen by the neural network model. The input layer had 2 nodes. I went for 2 hidden layers with 8 and 4 neurons respectively. Increasing the number of neurons made the overall score climb faster, but it also increased the computation. The output is a 3-node layer with softmax activation to give the probability of each action. The hidden layers use ReLU as the activation function. The weights were not normalized and were set by the GA. Initially, the weights were drawn from a uniform distribution.
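The forward pass of such a 2-8-4-3 network can be sketched in plain Python. Several details here are assumptions the post does not confirm: each neuron has a bias, the genome is a flat list laid out neuron by neuron, the initial uniform range is [-1, 1], and the action taken is the one with the highest softmax probability.

```python
import math
import random

LAYER_SIZES = [2, 8, 4, 3]  # inputs, hidden 1, hidden 2, outputs


def n_weights(sizes=LAYER_SIZES):
    """Genome length: one weight per input plus one bias for every neuron."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes, sizes[1:]))


def random_genome(rng=random):
    """Uniformly distributed initial weights (the range is an assumption)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n_weights())]


def forward(genes, observation, sizes=LAYER_SIZES):
    """Return the softmax action probabilities for one observation."""
    activations = list(observation)
    offset = 0
    last_layer = len(sizes) - 2
    for layer, (n_in, n_out) in enumerate(zip(sizes, sizes[1:])):
        outputs = []
        for _ in range(n_out):
            weights = genes[offset:offset + n_in]
            bias = genes[offset + n_in]
            offset += n_in + 1
            outputs.append(sum(w * x for w, x in zip(weights, activations)) + bias)
        if layer < last_layer:
            activations = [max(0.0, z) for z in outputs]  # ReLU on hidden layers
        else:
            peak = max(outputs)  # subtract the max for numerical stability
            exps = [math.exp(z - peak) for z in outputs]
            total = sum(exps)
            activations = [e / total for e in exps]  # softmax on the output layer
    return activations


def choose_action(genes, observation):
    """Pick the most probable action (argmax is an assumption)."""
    probs = forward(genes, observation)
    return max(range(len(probs)), key=probs.__getitem__)
```

Under these assumptions the genome has 75 numbers in total, which is all the GA has to evolve.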