Gym: LunarLander using Genetic Algorithm

Solving OpenAI Gym Lunar Lander v2 using Genetic Algorithm 


Hey fellas,
This time I have partially solved Gym's LunarLander v2 problem using the Deep Genetic Algorithm. Why partially?

Well, the average score hovers between 180 and 200. The best score crosses 200 a good number of times, but the median score 'sometimes' dips into the negative. Still, the landing of the lander is great and might even get claps from NASA. I am kidding.

UPDATE: The model has crossed a median score of 200!! Celebrations
UPDATE: I had accidentally chosen the Softmax activation function for every layer. I corrected that, and now the hidden layers have ReLU and the output layer has the Softmax activation function.

Here's the GIF of multiple landings of our Lander.






In my previous post, I solved the CartPole environment using a genetic algorithm. However, the CartPole environment has low complexity, a small state space, and few states. Also, CartPole begins in a state of maximum reward, i.e. the rendering always starts from the perpendicular position, which is exactly what we want the environment to do. So our algorithm already begins at the best reward and hence requires fewer iterations to find it. In Lunar Lander, the best-reward state has to be discovered by our algorithm. Most of the time the Lander starts from the same position, but rarely does it reach the landing pad, which is what eventually returns the best rewards. With this project, I came to realize that for the genetic methodology, a large space requires more exploration and a larger population, and elitism won't work.

Why elitism won't work?

Bill Gates' son is an elite person. Bill Gates isn't. How?
Bill Gates or Steve Jobs have seen the worst as well as the best of times. They have seen the failure of a product as well as the success of a company; they have relatively more experience. On the other hand, Bill Gates' son has seen money from the very beginning. He/she might not know the real value of hardship, having never had to toil like Jobs or Gates. So overall, we don't just need 'what to do', which is the policy of the elites. We also need 'what not to do', which is the policy of the non-elites. The experience of the non-elites helps us filter the elites' policies as well.

PS: Thomas Alva Edison was a damn non-elite person for sure, as he discovered 10,000 ways that don't work.

The result of our algorithm?





New output with the correct activation functions (ReLU in the hidden layers and Softmax for the output layer):



The green plot is the median of our population. The blue plot is the maximum of each generation.

The tweaks in the GA rely on the basics.

Changes: Earlier I had mistakenly set Softmax for every layer. This time the two hidden layers have the ReLU activation function and the output layer has the Softmax activation function. With this new model, I reached a median score of 200 in a mere 860 generations, compared to 1054 generations with the earlier version. Below is a comparison of the two runs overlaid.


The black plot shows the old 'best' scores of each generation, and the red plot shows the new ones. It can be seen that the ReLU model initially had more trouble maintaining consistency in the best scores, but it reached the higher scores faster. The dark blue plot shows the new model's median scores and the cyan plot shows the old model's. The new model converges faster than the old version. Up to about 100 generations, as seen in the plot, the scores were pretty much the same, but after that the slope bumps up.


Have a look at the best scores, a.k.a. History, in the above screenshot. We have consistent scores above 300. The average score is also well above 200, i.e. 287.79.

Here is the new model code:

Crossover:

For crossover, I used uniform crossover. It gave better results than one-point crossover, and the most awful was two-point crossover. The initial population chosen was 80, which worked well. Earlier, I was selecting only the best 2 parents and mating them, and that axed the evolution: even after 3000 generations, the score rarely changed, remaining constantly between -1500 and -1504.
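As a minimal sketch of uniform crossover (assuming each individual is stored as a flat NumPy array of network weights; the names here are mine, not from the post's code):

```python
import numpy as np

def uniform_crossover(parent1, parent2):
    """Each weight comes from either parent with equal probability."""
    mask = np.random.rand(parent1.size) < 0.5   # per-weight coin flip
    return np.where(mask, parent1, parent2)
```

One-point or two-point crossover would instead copy contiguous slices from each parent; the per-weight coin flip mixes the genomes much more thoroughly.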

Natural Selection:

So, among the next population, many parents survive as well. After all, parents don't die after giving birth. So how many parents? Random! I picked a random number between 10 and 70 and kept that many top-scoring parents from the population pool. Suppose we get 40 as the random number: the population will then keep the 40 highest-scoring individuals from the current population, and the remaining 40 will be formed from the mating of the selected ones. In total, we have 80 again for our population.
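A sketch of that selection step, under the same flat-weight-vector assumption (population is a list of weight arrays, fitness a list of their scores):

```python
import numpy as np

POP_SIZE = 80   # population size used in the post

def select_survivors(population, fitness):
    """Keep a random number (10-70) of top scorers; the remaining
    slots out of 80 are refilled later by mating the survivors."""
    n_survivors = np.random.randint(10, 71)        # random count between 10 and 70
    order = np.argsort(fitness)[::-1]              # indices sorted best-to-worst
    survivors = [population[i] for i in order[:n_survivors]]
    n_children_needed = POP_SIZE - n_survivors
    return survivors, n_children_needed
```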

Mating:

For mating, I selected two random parents from the selected pool. Mind you, both parents must be different and cannot be the same individual. How can an individual mate with itself? It's just immoral. It took only 1054 generations to reach a score of 200.
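A sketch of the mating step, combining the distinct-parent draw with the uniform crossover shown above (again, the function names are illustrative, not the post's code):

```python
import numpy as np

def mate(survivors, n_children):
    """Fill the rest of the population with children of two distinct random parents."""
    children = []
    for _ in range(n_children):
        i, j = np.random.choice(len(survivors), size=2, replace=False)  # two different parents
        mask = np.random.rand(survivors[i].size) < 0.5                  # uniform crossover
        children.append(np.where(mask, survivors[i], survivors[j]))
    return children
```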

Mutation:

Yes, this is tough. Since my model was constantly running out of diversity, I increased the mutation rate to keep diversity within the population. A low mutation rate kept the median score unchanged for about 10-15 generations, which was completely unacceptable. I increased the rate to 60% and mutated with a random value between -300 and 300.
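The post doesn't say whether the 60% rate applies per weight or per individual, or whether the random value replaces the weight or is added to it; the sketch below assumes a per-weight rate with replacement:

```python
import numpy as np

MUTATION_RATE = 0.60            # high rate to preserve diversity, per the post

def mutate(weights):
    """Replace a fraction of the weights with random values in [-300, 300]."""
    mask = np.random.rand(weights.size) < MUTATION_RATE
    random_values = np.random.uniform(-300, 300, size=weights.size)
    return np.where(mask, random_values, weights)
```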

Deep Neural Network:

The actions were chosen by the neural network model. The input has 8 nodes. I went for 2 hidden layers with 36 neurons each; increasing the number of neurons shows a steeper rise in the overall score, but it also increases the computation. The output is a 4-node layer with Softmax activation to determine the probability of each action. The hidden layers have ReLU as the activation function. The weights were not normalized and were evolved by the GA; initially, the weights were uniformly distributed.
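A minimal sketch of such a network driven by a flat weight vector from the GA (the absence of biases and the exact weight layout are my assumptions; the post doesn't show them):

```python
import numpy as np

SIZES = [8, 36, 36, 4]          # 8 observations -> two hidden layers of 36 -> 4 actions

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))   # shift for numerical stability
    return e / e.sum()

def choose_action(observation, flat_weights):
    """Forward pass: ReLU in the hidden layers, Softmax on the output layer."""
    a, idx = np.asarray(observation), 0
    for i in range(len(SIZES) - 1):
        n_in, n_out = SIZES[i], SIZES[i + 1]
        W = flat_weights[idx: idx + n_in * n_out].reshape(n_in, n_out)
        idx += n_in * n_out
        z = a @ W               # no bias terms in this sketch
        a = softmax(z) if i == len(SIZES) - 2 else relu(z)
    return int(np.argmax(a))    # pick the most probable action
```

Under this layout the flat vector has 8·36 + 36·36 + 36·4 = 1728 weights, and that vector is what the GA crosses over and mutates.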


