Self Balancing Robot using Machine Learning

Hola Amigos,

I have always loved inverted pendulums. They fascinate me and I play with them a lot. In real life, I have built several using an Arduino, motors, and an MPU6050. And yes, I avoid encoders: they involve too much wiring, and I want a clean bot, not an android like Dr. Gero from Dragon Ball Z.

In this project, I modified OpenAI Gym's CartPole-v1 environment to act more like a self-balancing robot than an inverted pendulum.

So here is my bot.


The output plot of the error angle.


Looks nice, eh?

In order to balance the pendulum, we need some parameters. The first is the angle of tilt. With the tilt angle alone we can balance the pendulum, but we are left with a runaway problem. For example, if the pendulum tilts by 5 degrees, the bot accelerates in that direction. At some speed, the gravitational force and the pseudo force acting on the pendulum balance each other, so the error angle stays constant. The motors keep acting to minimize this residual error, but they cannot, because the two forces are equal: the bot settles into a constant velocity and keeps running away instead of standing still. In practice, the velocity does not even stay constant. Noise keeps nudging the error angle, and the motors do not react to such small errors, so the error grows while gravity keeps pulling on the pendulum. Since the motors have a limited maximum acceleration, they eventually cannot counter the gravitational force, and the pendulum falls to one side.


Inverted Pendulum Parameters:

The simulation of the inverted pendulum returns 4 variables.

1. Observation: a list of another 4 variables:
  • Pendulum wheel position
  • Pendulum velocity
  • Pendulum angle
  • Pendulum velocity at tip
2. Reward: we get a reward of +1 for every step as long as:
  • the pendulum angle remains within -20 to +20 degrees, and
  • the lateral displacement remains within -2.4 to +2.4.

3. Done: a Boolean that becomes True when either of the above limits is exceeded; otherwise it remains False.

4. Info: other output, which we won't use right now.

So, we would need the following parameters to train our model:
  • Wheel velocity
  • Wheel angle
  • Displacement
  • Tip velocity
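To make the reward rule concrete, here is a minimal sketch in plain Python (no Gym needed). The `within_limits` helper and the sample observation values are my own illustration, and it assumes the angle is reported in radians, as in stock CartPole.

```python
import math

# The four observation values, in the order the simulation returns them
# (names follow the post). The sample numbers here are made up.
observation = [0.02, -0.15, 0.05, 0.30]
position, velocity, angle, tip_velocity = observation

def within_limits(obs, angle_limit_deg=20.0, track_limit=2.4):
    """Reward condition from the post: +1 per step while the angle stays
    within +/-20 degrees and the displacement within +/-2.4 units."""
    position, _, angle, _ = obs
    return abs(math.degrees(angle)) <= angle_limit_deg and abs(position) <= track_limit

reward = 1 if within_limits(observation) else 0
```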

Neural Model:

Our model is a deep neural network that takes the 4 parameters mentioned above as its 4 input neurons, followed by a first hidden layer with 8 neurons, a second hidden layer with 4 neurons, and a final layer with a single node.



Activation Function:

The activation for hidden layers 1 and 2 is ReLU. For the output node, we went ahead with the Softmax function.
ReLU for the hidden layers and Softmax for the output layer
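A sketch of the forward pass under one assumption of mine: a softmax over a single node is constant, so this uses the equivalent two-class form, a sigmoid thresholded at 0.5, to produce the 0/1 action. The weight-matrix names (`i_w`, `h_1`, `h_2`) follow the post.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(obs, i_w, h_1, h_2):
    """4 -> 8 -> 4 -> 1 network from the post.
    i_w: (4, 8), h_1: (8, 4), h_2: (4, 1) weight matrices.
    Returns the action 0 or 1 (accelerate left or right)."""
    a1 = relu(obs @ i_w)       # hidden layer 1, ReLU
    a2 = relu(a1 @ h_1)        # hidden layer 2, ReLU
    out = sigmoid(a2 @ h_2)    # single output squashed to (0, 1)
    return int(out[0] > 0.5)
```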

Weights:


The initial weights were drawn from a uniform distribution using the He initialization method, which keeps the activations from vanishing. The weights for the input layer and hidden layers are then updated using the genetic algorithm, which we will discuss.
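For reference, a sketch of how the three weight matrices could be initialized. Note an assumption here: the post says "uniformly distributed between 0 and 1", but the standard He uniform draw is symmetric around zero with limit sqrt(6 / n_in), so that standard form is what this sketch shows.

```python
import numpy as np

def he_uniform(n_in, n_out, rng):
    # He (Kaiming) uniform initialization: U(-limit, +limit),
    # limit = sqrt(6 / n_in). Symmetric around 0.
    limit = np.sqrt(6.0 / n_in)
    return rng.uniform(-limit, limit, size=(n_in, n_out))

rng = np.random.default_rng(42)
i_w = he_uniform(4, 8, rng)   # input layer weights
h_1 = he_uniform(8, 4, rng)   # hidden layer 1 weights
h_2 = he_uniform(4, 1, rng)   # hidden layer 2 weights
```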

Genetic Model:

The genetic algorithm is based on the evolutionary mechanism of nature. In simple words, a random population tries to survive a given environment. All members of the population act without knowing the consequences, and each member gets a fitness score from the environment. The famous phrase "survival of the fittest" applies here: the fittest individual is the one with the maximum score. As survival continues, individuals mate with each other, and their offspring carry their genes. When two of the fittest individuals mate, the child inherits genes from both parents in a random ratio. Many children are produced by mating the fittest individuals, since only they survived.

Now, in the natural environment there are disasters, environmental influences, diseases, etc. that can alter the children's genes. This is called mutation. These children then try to survive the environment, get a score, and the same process goes on.

Crossover: when two parents mate, the children can inherit genes from either one or both.

Types of crossover:

1. One-Point Cross-Over: suppose the genes of the two parents are as below.

Parent 1 Gene: 1 2 3 4 5 6 7 8 9 0
Parent 2 Gene: 0 2 1 3 5 4 6 8 7 9

We choose a random index, e.g. 5. The crossover (mating) results in two children, each keeping one parent's genes up to the index and taking the other parent's genes after it:

Child 1: 1 2 3 4 5 4 6 8 7 9
Child 2: 0 2 1 3 5 6 7 8 9 0
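The one-point crossover can be sketched in a couple of lines (the function name is mine):

```python
def one_point_crossover(p1, p2, point):
    # Children swap their tails after the crossover point.
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

parent_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
parent_2 = [0, 2, 1, 3, 5, 4, 6, 8, 7, 9]
child_1, child_2 = one_point_crossover(parent_1, parent_2, 5)
# child_1 -> [1, 2, 3, 4, 5, 4, 6, 8, 7, 9]
# child_2 -> [0, 2, 1, 3, 5, 6, 7, 8, 9, 0]
```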

2. Two-Point Cross-Over: here we choose two index points; let us take indices 3 and 5.

Parent 1 Gene: 1 0 0 0 1 0 1 0 1 1
Parent 2 Gene: 0 1 0 0 0 1 1 1 1 0

Each child keeps its own genes outside the two points and takes the segment between them from the other parent:

Child 1: 1 0 0 0 0 1 1 0 1 1
Child 2: 0 1 0 0 1 0 1 1 1 0
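Two-point crossover swaps the middle segment between the parents. A sketch (the indexing convention in the example above is ambiguous; with 0-based slice bounds 4 and 6, this reproduces the children shown):

```python
def two_point_crossover(p1, p2, a, b):
    # Children keep their own head and tail and swap the segment p[a:b].
    return (p1[:a] + p2[a:b] + p1[b:],
            p2[:a] + p1[a:b] + p2[b:])

parent_1 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 1]
parent_2 = [0, 1, 0, 0, 0, 1, 1, 1, 1, 0]
child_1, child_2 = two_point_crossover(parent_1, parent_2, 4, 6)
# child_1 -> [1, 0, 0, 0, 0, 1, 1, 0, 1, 1]
# child_2 -> [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]
```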


3. Whole Arithmetic Crossover: here we blend the parents gene by gene with a weight k to determine the child:

child[i] = k × parent1[i] + (1 − k) × parent2[i]

Parent 1 Gene: 1 2 3 4 5 6 7 8 9 0
Parent 2 Gene: 0 2 1 3 5 4 6 8 7 9

For k = 0.75, the child is:

Child Gene: 0.75 2 2.5 3.75 5 5.5 6.75 8 8.5 2.25
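The blend formula as code (function name mine), reproducing the k = 0.75 example:

```python
def whole_arithmetic_crossover(p1, p2, k):
    # Weighted gene-by-gene blend: child_i = k * p1_i + (1 - k) * p2_i
    return [k * a + (1 - k) * b for a, b in zip(p1, p2)]

child = whole_arithmetic_crossover(
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
    [0, 2, 1, 3, 5, 4, 6, 8, 7, 9],
    k=0.75,
)
# child -> [0.75, 2.0, 2.5, 3.75, 5.0, 5.5, 6.75, 8.0, 8.5, 2.25]
```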

4. Uniform Crossover: in the uniform crossover, each gene of the child is taken from one parent or the other with equal probability.

Parent 1 Gene: 1 0 0 0 1 0 1 0 1 1
Parent 2 Gene: 0 1 0 0 0 1 1 1 1 0

Child: 0 0 0 0 0 0 1 0 1 1
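A sketch of uniform crossover (function name mine); each gene is a coin flip between the parents:

```python
import random

def uniform_crossover(p1, p2, rng):
    # Each gene is taken from either parent with probability 1/2.
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]

rng = random.Random(0)
child = uniform_crossover([1, 0, 0, 0, 1, 0, 1, 0, 1, 1],
                          [0, 1, 0, 0, 0, 1, 1, 1, 1, 0],
                          rng)
```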


Mutation: a random change within the DNA caused by surrounding influences, by errors while the DNA is copied, or sometimes during mating.

If a child has the following DNA:

0 0 0 0 0 0 1 0 1 1

after mutation it might become:

0 0 0 0 1 0 1 0 0 1




Implementation:

Initially, we generate random weights for the input layer and the hidden layers. 

i_w is the input-layer weight matrix, h_1 the weights of hidden layer 1, and h_2 the weights of hidden layer 2; each is generated randomly at the start.

A chromosome is the concatenation of all the weights into a single row. These weights are passed to the neural network, the observation from the pendulum environment is fed to its inputs, and we get either 0 or 1 as output, which goes back to the pendulum as the command to accelerate right or left.
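The flatten/unflatten step can be sketched like this (the helper names and `SHAPES` constant are mine, using the 4 → 8 → 4 → 1 shapes from the post):

```python
import numpy as np

# Layer shapes from the post: i_w (4x8), h_1 (8x4), h_2 (4x1).
SHAPES = [(4, 8), (8, 4), (4, 1)]

def to_chromosome(weights):
    # Flatten and concatenate all weight matrices into one row vector.
    return np.concatenate([w.ravel() for w in weights])

def from_chromosome(chrom):
    # Slice the flat row back into the three weight matrices.
    mats, start = [], 0
    for rows, cols in SHAPES:
        n = rows * cols
        mats.append(chrom[start:start + n].reshape(rows, cols))
        start += n
    return mats
```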

The initial population size was set to 70, and the number of generations to 100.

The simulation is run for 400 iterations, and at each iteration we earn a reward while the pendulum stays within limits; the accumulated reward becomes the fitness score. After running 400 iterations for every individual in the population, each chromosome has a score, and from these scores the new parents are chosen by a selection function.
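The per-individual scoring loop can be sketched as follows. This assumes the classic 4-tuple Gym step API `(obs, reward, done, info)`; the `evaluate` name is mine.

```python
def evaluate(env, policy, max_steps=400):
    """Run one episode and return the fitness: the total reward collected
    while the pendulum stays within its limits."""
    obs = env.reset()
    fitness = 0
    for _ in range(max_steps):
        action = policy(obs)                     # 0 or 1 from the network
        obs, reward, done, info = env.step(action)
        fitness += reward
        if done:                                 # pendulum left its limits
            break
    return fitness
```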

loc_1 is the index of the maximum score in the score list, and parent_1 is the chromosome at that index. Now, we could have chosen the second parent by maximum fitness as well, but that creates an issue: knowing only the best episodes is unrealistic, and never happens in nature either. Our aim is to let the genes know the bad episodes too and learn from them. Hence, we went with the median score instead of the second-highest score. The function returns the two parents.
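That selection can be sketched as (the `select_parents` name is mine; `loc_1` follows the post):

```python
def select_parents(population, scores):
    # Parent 1: the best-scoring chromosome (maximum fitness).
    # Parent 2: the median-scoring chromosome, so the next generation
    # also inherits genes from mediocre episodes.
    order = sorted(range(len(scores)), key=scores.__getitem__)
    loc_1 = order[-1]                  # index of the maximum score
    loc_2 = order[len(order) // 2]     # index of the median score
    return population[loc_1], population[loc_2]
```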

Here we are following the “Survival of the fittest” scenario. 

Now that we have our parents, we generate new offspring from them using crossover. After mating produces a new child, we apply some mutation. Each child may be mutated before being added to the new population, depending on the mutation rate: it is not guaranteed that any given child gets a mutation. The following is the code for mutation.
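A minimal sketch of that mutation step. The 4% rate is the one the discussion below reports as working best; the perturbation scale and function name are my assumptions.

```python
import random

def mutate(chromosome, mutation_rate=0.04, scale=0.1, rng=random):
    # Each gene mutates independently with probability `mutation_rate`;
    # a mutated gene is nudged by a small uniform perturbation.
    return [g + rng.uniform(-scale, scale) if rng.random() < mutation_rate else g
            for g in chromosome]
```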

Here are the output results:


Also, there was some time lag between iterations, which posed another problem and produced some erratic results, but it worked out well.

Here is the time-lag output. The denser areas have high lag and the sparser ones have smooth outcomes.

Implementation Environment


The neural and genetic models were implemented and tested on Python 3.7 IDLE and on Colaboratory, using a modified OpenAI Gym environment for rendering. OpenAI Gym gives us various environments for practicing and modeling different problems with different algorithms. Colaboratory ran a lot faster, producing an output plot in 27 seconds with 70 generations, a population of 90, and 400 iterations. On the PC, by contrast, the same configuration of generations, population, and iterations took an average of 1 minute 12 seconds. Colaboratory used a GPU with Python 3; our PC has a Core i5 with 8 GB RAM and 4 GB Intel HD Graphics.

Critical Discussion

The aim is to reduce the pendulum's oscillation. The mean fitness flattens slowly but keeps increasing, because we collect observations from the start of each generation, which includes poor fitness scores. The median fitness increases, as the figures above show, and reaches the best fitness score. The best score belongs to a single individual in the population; our aim was to bring most of the population up to that score, and the median captures this approach.

In the oscillation plot, we plotted the pendulum's angle error over successive iterations, and the differences between the crossover implementations are clearly visible. A mutation rate of 4% gave the best results. The best results came from uniform crossover, where each gene of the chromosome is exchanged between parents, so each child carries roughly half its genes from each parent. With uniform crossover, we saw the least oscillation in most runs. Two-point crossover did not go as well: getting the right parent was tough, and a child tended to get more genetic information from one parent rather than an even distribution.

For natural selection, we initially went with the two best-scoring parents. Later, we observed that the bot only balanced when almost perfectly upright and could not recover from large tilts, as shown below. Hence, we switched to the best parent plus the median parent, which carries the characteristics not only of small angles and low velocities but also of large tilts and high velocities.



The white circle on the left is the least dense, with the least lag; the circle on the right is the densest, with the highest lag.

Here is the code for the model described above. Feel free to ask if you have any doubts.

The file 'cartpole.py' has to be present in your Python installation. Place it in C:\Users\User Name\AppData\Local\Programs\Python\Python37\Lib\site-packages\gym\envs to get the self-balancing bot instead of the stock pendulum.



*All credit goes to OpenAI Gym for the original CartPole. The modification was done for educational purposes.*
