Neural Network: Part 2

Neural Network: Logistic Regression

In this post, I'll deal with logic regression and its implementation of a single neuron network. I hope you have read my previous post on mattresses and its operation using NumPy package. You can click here to go to my previous post 

Logistic regression basically computes the probability of the output to be one. For example, if we train our Logistic model to recognize the image of a dog then for any new image the model basically try to calculate the probability of whether the new image is a dog or not. the higher the probability will imply that the given image is of a dog. 

A neuron is the primary and fundamental unit of computation for any neural network. A neuron will receive a vector that will include the input features. Each of the features will get multiplied with their corresponding weights and then a bias will be added to each of the features after which the weighted sum will be calculated. This will produce a linear output from the previous step. After processing the linear output, data will have to pass through the activation function. 

Activation Function

An activation function is a non-linear transformation which decides the probability of the linear input. Its output resides from 0 to 1 or from -1 to +1. This function helps in the mapping of values between a certain range which for us is between 0 and 1. For this post, I will use the Sigmoid function as the activation function. The Sigmoid function can be represented by the following equation
Sigmoid Function

The sigmoid function produces an S-shaped spanning across left and right plane. As per Wikipedia Sigmoid function is a bounded differentiable and a real function having a non-negative derivative at each. 
Sigmoid Graph
Source: Wikipedia

Weights and Biases

Weights and biases are responsible to adjust the input function which consists of features in order to bring the final output close to the actual output. Weights are multiplied with the features and then the bias component is added. The output has been passed to the activation function. Recently added from StackOverflow an interesting thing about the bias. As the name suggests bias means to favour something irrespective of whether that thing is right or wrong. Similarly, the bias favors the actual output by adding itself to the weighted sum.

The output z will now act as an input to our sigmoid function. One thing that users must always remember is that weight is a column vector whose size is equal to the number of features in the input and bias is a scalar number. The sigmoid function will now map the value of Z between 0 and 1. The output from this sigmoid function will indicate the probability of how close the input is to the actual output. 

Clearly, the above example was for a single input sample. What if we have multiple input samples? Well, the Logistic regression also works on multiple training samples. Suppose there are m training samples and each of the samples has n features then the input can be stated as M x N matrix. Let us call this matrix as X. Now we can easily perform dot product between X and the weight vector to get a linear output. The vector output will be then passed to the activation function which will result in an output vector A containing the probabilities of each training sample.

In the next post, I'll deal with the Loss function, Cost function and the implmentation of SNN (Single Neuron Network).

Neural Network: Part 1

Basics of Neurons

Hey everyone, long time no see. I have been working on some stuff so was busy and away from my blog. Well, as for my interest in AI, I have been working on neural networks and its implementation which I'll discuss in this post.

In this post, you will learn:
  • The basic concepts used in building a neural network
  • Implementing logistic regression
  • Single Neural Network
  • Forward propagation and Backward propagation
  • Building a small neural network using TensorFlow
  • Building a deep neural network using TensorFlow.
As we know, we humans learn from our experiences and through training. Our brain runs a very high speed and processes any activity quickly. Well, it actually depends on the type of activity, the person is working on. But as we all know, humans have a high grasping power and easily learn, understand, classify, remember and do various other things which machines take a hell lot of time to compute. All these computations are processed by our mighty brain. So to understand the technical neural network, we must understand the biological neural network.

Source: Wikipedia

The brain consists of a huge number of neurons. These neurons are the fundamental unit of the brain. They receive electric signals as input from the external world (Hello World!!) which are processed and sent to other neurons. A basic neuron consists of axion, dendrites, action potential, synapse etc.

Axons are the thin structure and are called as the transmitting part of the neurons. Dendrite is the receiving part of the neuron which receives its input from synapse summing total inputs. The action potential helps the neurons to communicate with each other. So basically, neurons receive electrical impulses from dendrites, which informs the neuron about the outcome of an incident. If the outcome is not in favor, these impulses are altered by chemical and electrical reactions and sent to other neurons by neurotransmission. This keeps on happening until the brain achieves perfection. 

In a similar way, we will work on an Artificial neuron.
Matrices and its manipulation is a basic requirement for this post. To simplify, we will work on the dot product, vectors, and broadcasting in python.

Dot Product
  If A is a C × D matrix and B is a D × E matrix, then the dot product of A and B is a C × E matrix. The dot product will reduce our computation time whenever we have quite a lot of equations. Note that the number of columns in the first matrix should always be equal to the number of rows in the second matrix. In python, we will use dot() function from the NumPy package.

A typical example describing the dot() function in the NumPy package

   import numpy as np 
   a = np.array([[7,6],[2,5]])
   b = np.array([[10,6,70],[18,9, 11]]) 
   c =,b)
   print("Shape of A ", a.shape)
   print("Shape of B ", b.shape)
   print("Shape of C ", c.shape)
   print("The dot product of A and B is : ", c)

And the result is 

   Broadcasting in Python helps to perform element-wise calculation between a matrix and vector or scalar. Scalar is the just a single number whereas a vector is rank 1 array. The advantage of NumPy package is that it provides all these operations on matrices, vectors, reshaping, transform etc.A vector can be represented as numpy.ndarray(n-Dimensional Array) object using NumPy's array() function. A column vector is of shape (A × 1) and a row vector is of shape (1 × B)

To shape an array we use the syntax Array_name.reshape(row, column)
To find the rank of the matrix we will use the function numpy.linalg.matrix_rank()

These were the basics required to start a simple program or project on neural networks. In the next post, I'll deal with Logic Regression with a single neuron which requires the basics as I have described above. 

Update: Click here to jump to my next post

SARSA Learning with Python

I worked on SARSA algorithm as well as on Q Learning algorithm and both of them had different Q matrix (Duh!) The methodology of both of the algorithms depicts how well one algorithm responds to future awards (which we can say OFF Policy for Q learning) while the other works of the current policy and takes an action before updating Q matrix (ON Policy).

The previous post example of the grid game showed different results when I implemented SARSA. It also involved some repetitive paths whereas Q didn't show any. A single step showed that SARSA followed the agent path and Q followed an optimal agent path.

To implement both ways I remember the way of pseudo code.


initiate Q matrix.
Loop (Episodes):
   Choose an initial state (s)
   while (goal):
   Choose an action (a) with the maximum Q value
   Determine the next State (s')
   Find total reward -> Immediate Reward + Discounted Reward (Max(Q[s'][a]))
   Update Q matrix
   s <- s'
new episode


initiate Q matrix
Loop (Episodes):
   choose an initial state (s)
   while (goal):
   Take an action (a) and get next state (s')
   Get a' from s'
   Total Reward -> Immediate reward + Gamma * next Q value - current Q value
   Update Q
   s <- s' a <- a'

Here are the outputs from Q-L and SARSA-L

The above is Q-L

This one is SARSA 

There is a difference between both Q Matrix. I worked on another example by using both Q learning and SARSA. It might appear similar to mouse cliff problem for some readers so bear with me.

The code for Naruto-Q-Learning is below

Here is Hinata trying to find her way to her goal by using SARSA

The code for Hinata SARSA Learning

I used epsilon-greedy method for action prediction. I generated a random floating number between 0 to 1 and set epsilon as 0.2. If the generated number is greater than 0.2 then I select maximum Q valued action (argmax). If the generated number is less than 0.2 then I select the action (permitted)  randomly. With each episode passing by, I decreased the value of epsilon (Epsilon Decay) This will ensure that as the agent learns its way it follows the path rather than continuing exploration. Exploration is maximum at the start of the simulation and gradually decreases as each episode are passed.

This is the decay of the epsilon.

The path followed in the above simulation is 0 - 4 - 8 - 9 - 10 - 11 - 7. Sometimes the agent also follows the same path as followed during Q learning. Well, I am continuing my exploration for the same and will post more details as I learn more about RL.

Till then, bye

Q-Learning with Python

Currently, I am working on learning algorithms in Data Science for robotics. Reading many examples online and trying them on my own gives me a feeling of reward. I got deeply fascinated by Q learning algorithm based on the Bellmans equation. I also made a Pong game using Q learning. You can view that project on my instructable.

It didn' take much time to understand the working of Q learning. It appeared similar to the State Space matrix that I studied in my Control Systems class in college which I have forgotten now. However, seeing a practical application makes it easier to learn.

Q-Learning is based on State-Action-Reward strategy. For example, every state has various actions that can be implemented in that state and we have to choose the action which returns maximum rewards for us.

The agent will roam around like a maniac at the start and learn about its actions and rewards. The next time when the agent faces the similar state, it will know what to do in order to minimize the loss and maximize the reward.

The basic equation of Q learning algorithm is

Q(s,a) = Q(s,a) + X*(R(s,a) +  Y * (Max(s',a)) - Q(s,a))

This algorithm follows Off Policy algorithm. The Q value is revised using the next state S' and next action A' based on next state. It basically increases its probability or Q value by adding a discounted reward from the next state that is yet to happen. I guess this is the reason why many other fellows also call it sometimes "greedy".

Here Q is the Q Matrix or better say the brain of our agent. R is the reward matrix which stores the reward for every step taken i.e. reward returned from a taken action in the particular state.

s' is the next state after the action is taken.
X is the learning rate. Closer to 1 means doing no mistakes at all which is superficial. Closer to 0 means no learning at all which we don't want at all.
Y is the discount factor. It tells the agent how much far it has to look. It can be understood with an example. The more importance you give to future rewards, the more will be the discount factor. The more you value near rewards, the less is the discount factor.

0 <= X,Y <= 1

I began with a 4 X 4 matrix like this

The green square is the goal with reward 100. The red square is danger and the agent has to avoid or else -10 reward rest all squares can be used for locomotion with a reward of -1.

I assigned each square with a state number. The actions of the agent will be 0, 1, 2, 3 where 0 is for UP, 1 is for DOWN, 2 is for LEFT, 3 is for RIGHT

This is my reward matrix
reward = np.array([[0, -10, 0, -1],
                   [0, -1, -10, -1],
                   [0, -1, -1, 10],
                   [0, -10, -1, -1],

                   [-10, -1, 0, -1],
                   [-1, -1, -10, -1],
                   [1, -10, -1, -10],
                   [100, -1, -1, 0],

                   [-10, -1, 0, -1],
                   [-1, -1, -1, -10],
                   [-1, -1, -1, -1],
                   [-10, -1, -10, 0],

                   [-1, 0, 0, -1],
                   [-1, 0, -1, -1],
                   [-10, 0, -1, -1],
                   [-1, 0, -1, 0]])

Each row is the state and each column is the action taken in that state. The value indicated is the reward for a particular action in a particular state. Here 0 means invalid reward i.e. that action si not valid. We cannot go UP or LEFT from state 0. Thus for state 0 reward matrix will be [0, -10, 0, -1]. The blueprint is [UP, DOWN, LEFT, RIGHT] so if we jump UP from state 0 it is not possible thus reward is 0. If we jump DOWN we land to red square thus reward is -10. If we move RIGHT we get -1 as our reward. Thus the reward matrix has a total of 16 states where each state has 4 actions.

The next state matrix is this

n_s = np.array([[-1,4,-1,1],




We have 16 states with 4 actions and each value represents the next state when an action is taken. Here -1 represents invalid state i.e. not possible.

The Action matrix is like this
action = np.array([[0,3],
                  [1, 2, 3],
                  [1, 2, 3],  "State 2 can have DOWN, LEFT, RIGHT"
                  [1, 2],



                   [0,2,3],  "State 13 can have DOWN, LEFT, RIGHT"

Q matrix is initialized as 16 X 4 matrix (16 states and 4 actions)
Q matrix stores the experience rewards. For example, Q[12][3] will represent the experience of when the agent was in state 12 and his action is 3 (LEFT). I understand this concept with probability. Greater is the value, greater is the probability of higher reward.

Here i_state represents the initial state of the agent which is 11.
i_s = np.array([[1,2,5,6,8,9,11,12,13,14,15]])
This is an array that stores the states from where agent can begin his training.

I began with 10 episodes as it was enough for my agent to learn. The loop begins with choosing a random state from i_s matrix at line 77.  Our goal is state 3 thus we run the agent until he reaches his goal. Line 81 describes the for loop which finds the maximum Q value (probability) among all possible actions in that state. Line 85: We find the next state based on the action taken. Line 89: We find maximum Q value of all possible actions of the next state. Line 93: We use the Q-Algorithm equation to find the value of that particular Q matrix location. Line94: All movements of the agent is recorded inside this matrix. Line 95: The next state becomes the current state. When the goal is met a new episode is started.

Let the starting position of the agent be 11. Thus now we apply the policy in this state. The possible actions in state 11 are TOP, LEFT, DOWN.
 Let Q[11] be [1, 3, 2, 5]
for i in action[i_state]:
if Q[i_state][i]>Qx:
act = i
Qx = Q[i_state][i]
n_state = n_s[i_state][act]
action[11] is [0, 1, 3]
Thus if Q[11][0] > Qx(-999) then act will be 0
Qx will be updated. When this loops runs act will be 3 because the maximum value among Q[11][0] = 1, Q[11][1] = 3, Q[11][3] = 5 is 5 which has index 3 thus it means the action that will reward maximum Q value in state 11 is 3.
 With this action value we can get out next state and that will be 10 i.e. the agent moves LEFT.

for i in action[n_state]:
Max = max(nxt_values)

Now we have the next state as 10. Now we will calculate the maximum Q value is that state for all possible actions and store it in Max.
 Multiplying this value (Max) with discount factor gives the future reward which we add with the immediate reward. The immediate reward is calculated with R matrix. The sum of immediate reward and the discounted reward is multiplied by the learning rate and added with Q value at that state: action location. 

Each episode has a total reward as calculated at Line 92.
The xd matrix is [11, 15, 14, 13, 9, 5, 6, 2, 3]. This is the optimal pathway of the agent to reach the goal.

So Long

EDIT: There was an error in the action matrix. It has been corrected