In my previous project, I dealt with a self-balancing robot without a PID controller. In this version, I have embedded PID control on the same bot, and the results are obviously better than in the previous version. I am using the control-systems approach here, where our main objective is to achieve a critically damped robot: it should reach steady state with few oscillations, and it should also recover from a light-to-medium push.

In my previous bot without PID, I faced a problem where the robot would run away at a certain angle. Let's call it x: the motors move forward to balance the bot; however, at some point the pseudo force acting on the bot becomes equal to the force pulling the robot downwards. When these two forces balance, the robot neither falls down nor regains its vertical position. In this mode, the robot keeps moving forward and forward until, finally, it falls down.
Obviously, controlling this robot without PID has drawbacks; therefore, I worked on PID, and the results are pretty good now.

The construction is the same as in the previous bot, so if you have built the bot from my previous post then, of course, this one will also work. All you have to do is change the code, and it will be as good as new.

One thing you must remember while tuning the PID: first of all, set every parameter to zero, i.e., P = 0, I = 0, and similarly D = 0. The proportional part deals with the force: it describes how vigorously your robot reacts to any tilt.

So, first of all, keep increasing your P value until the bot oscillates. It may oscillate wildly, or it may oscillate like a drowsy lump; all you have to do is make the bot oscillate. After that, increase the derivative (D) value to reduce these oscillations. Your robot will then stand more or less still (not exactly still) at the center point, without oscillating much. The larger the value of D, the quicker the bot attains steady state. Now increase the I value. The integral term deals with the response to the tilt: it tells the bot how quickly you want it to recover from a tilt.
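To make the roles of the three terms concrete, here is a minimal PID-loop sketch in Python-style pseudocode (the actual bot runs Arduino C++; names like Kp, Ki, Kd, and setpoint are illustrative, not from my sketch):

# Minimal PID sketch: start with Ki = Kd = 0, raise Kp until the bot oscillates,
# then raise Kd to damp the oscillation, and finally raise Ki.
def pid_update(angle, setpoint, state, dt, Kp, Ki, Kd):
    error = setpoint - angle                          # how far from vertical
    state["integral"] += error * dt                   # I: accumulated past error
    derivative = (error - state["last_error"]) / dt   # D: damps oscillations
    state["last_error"] = error
    return Kp * error + Ki * state["integral"] + Kd * derivative

state = {"integral": 0.0, "last_error": 0.0}
# motor_output = pid_update(current_angle, 0.0, state, 0.01, Kp, Ki, Kd)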

The circuit for the bot is the same as the previous one.


Working Updated

*Latest Updated* Source_Code
Ping me if you face any problem. PID tuning takes luck, not time, I guess. In my next post, I'll deal with automatic PID tuning for this bot. So long.

A self-balancing robot is one awesome application of the subject of control systems. It relies on the topic of feedback controllers. It always feels awesome to work on these projects and to calibrate your bot according to your needs. Calibrating my bot took less time than working with PIDs would have, which can take ages and ages. Auto-tune PID is also available to manage your bot, but still, hard work deserves more reward and peace.

In this project, I haven't used PID control to control the motion of the bot. That doesn't mean my bot is unstable or that it will wobble around here and there. It is indeed stable and can recover from sudden jerks and movements too. I am not using the enable pins of the L293D IC, which are normally used to control the motors with a PWM signal; instead, I am using a soft-coded PWM signal and the decoding part on the Arduino to determine the speed of the motors with respect to the angle. I also coded a virtual encoder that helps my bot return to its original place from where it started drifting.

This project is a complete do-it-yourself build. I've only used foam tape, a hot glue gun, some rulers, etc. I converted the micro servos into continuous-rotation servos, since they have high torque; note, though, that they have low speed, so better use large wheels in order to cover more distance.

I am using:
Two continuous micro servo 9g
GY-521 Accelerometer + Gyroscope
Arduino Pro Mini
0.1 uF Ceramic Capacitors
10 uF  Electrolytic Capacitors

Do remember that electrolytic capacitors have polarity and ceramics do not. The ceramic capacitors are used here to nullify motor noise; the electrolytic capacitors provide appropriate filtering of the power fed into the L293D IC.

Some concepts of the GY-521: a gyroscope never measures the actual tilt angle; it measures the rate of tilting. Similarly, the accelerometer measures acceleration in a particular direction. The GY-521 has a 3-axis accelerometer, measuring along the X, Y, and Z axes, and its gyroscope measures the X, Y, and Z rotation rates. We can use either a Kalman filter or a complementary filter to smooth out the peaks and valleys in the GY-521 readings. The complementary filter is easy, as it is just one line of code, compared to the Kalman filter; however, I got more or less equal results from both filters.
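For reference, the one-line complementary filter looks roughly like this (alpha and dt are assumptions here; something like alpha = 0.98 is a common choice):

def complementary_filter(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    # Trust the integrated gyro rate in the short term and the accelerometer
    # angle in the long term; the weighted sum cancels each sensor's weakness.
    return alpha * (angle + gyro_rate * dt) + (1 - alpha) * accel_angle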

I named my bot 'Po'. Yes, I love Kung Fu Panda.

Po works within 5 regions on each signed axis, i.e., 5 on the positive angle tilt and 5 on the negative angle tilt; a code sketch of this mapping follows the list.
Within the first region, (0, 0.8) and (-0.8, 0) degrees, no power is supplied to the motors, since the bot is perfectly stable.
Within the second region, (0.8, 2) and (-2, -0.8) degrees, the servos run @ 30% duty cycle.
Within the third region, (2, 7) and (-7, -2) degrees, the servos run @ 60% duty cycle.
Within the fourth region, (7, 15) and (-15, -7) degrees, the servos run @ 80% duty cycle.
Within the fifth region, (15, 25) and (-25, -15) degrees, the servos run @ 100% duty cycle.
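A sketch of that mapping, with the region bounds taken straight from the list above (Python-style pseudocode; the real bot runs Arduino code):

def duty_cycle(tilt_deg):
    a = abs(tilt_deg)          # the regions are symmetric about zero
    if a < 0.8:
        return 0               # region 1: perfectly stable, no power
    elif a < 2:
        return 30              # region 2
    elif a < 7:
        return 60              # region 3
    elif a < 15:
        return 80              # region 4
    else:
        return 100             # region 5, up to about 25 degrees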

Both the servos were attached using a ruler and foam tape like this.
A vertical section is added to the horizontal servo support. This vertical support will carry the circuit board.

Here is the circuit for the connections.

I am uploading my sketches over an HC-05 Bluetooth module. Better prefer a Bluetooth module with a STATE pin present on it. The Serial Monitor also works over Bluetooth.

Currently, I am working on another way of controlling the bot, using the equation of a damped parabola. I will upload its results soon. To increase smoothness, we will have to increase the number of regions and change the motor duty cycle accordingly; then we can map the duty cycle to Po's angle to get an equation. Sounds like a tough job, but it is far better than the trial and error of PID tuning, if you ask me.

Another approach I am thinking about is using state space. I have used servos here because plastic-geared DC motors have a backlash error of about 0.5 degrees, which will keep your bot vibrating at a single point.

I have used an approach that changes the balance angle of this bot as it moves. Whenever forward() is called, a counter named "f" is incremented; similarly, a counter "b" is incremented whenever backward() is called. A separate counter runs inside the main loop and checks "f" and "b" every 1200 cycles. If "f" is greater than "b", the setpoint is decreased by 0.05 degrees; if "b" is greater than "f", the setpoint is increased by 0.05 degrees. Then the main loop counter is reset, along with "f" and "b". The basic problem of self-balancing bots without encoders is that they don't return to where they started. Although Po won't return to its exact location, it will return to a nearby point, as long as both motors rotate at the same rate, which rarely happens with the L293D IC.
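In pseudocode, the drift correction described above looks roughly like this (the counter names and thresholds follow the text; the real implementation is in the Arduino sketch):

f = b = cycles = 0             # f counts forward() calls, b counts backward() calls
setpoint = 0.0                 # balance angle in degrees

def correct_drift():
    global f, b, cycles, setpoint
    cycles += 1
    if cycles >= 1200:         # check the counters every 1200 loop cycles
        if f > b:
            setpoint -= 0.05   # drifting forward: lean the bot back
        elif b > f:
            setpoint += 0.05   # drifting backward: lean the bot forward
        f = b = cycles = 0     # reset all three counters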

A point of note - better use the L298 or L293 IC, as both can supply more current than the L293D IC. Also, use a 0.1 µF ceramic capacitor across the motor terminals in order to suppress reverse current spikes. Whenever motor power stops, the motor still rotates due to inertia and acts as a generator, or better said, produces back EMF. This sudden spike can keep resetting your Arduino, Bluetooth module, or even the gyro; in my case, it always reset the Bluetooth module.

Here is the code for Po.

Detecting Objects using YOLO

This post deals with my small project on YOLO. It is a great project which, if linked with an Arduino, will certainly make you win the Google Science Fair. Pardon 😁

It also lets you localize objects. If you've lost your specs, then maybe this will certainly help.

So YOLO stands for "You Only Look Once". Yes, YOLO looks at the image only once. It works by dividing the image into K x K cells.

A bit like this
Fig 1 - Image divided into cells

Before working on YOLO, have a look at its output, like this one I got when I ran an edited version of YOLO over the above image.
Fig 2 - YOLO Output using classification.

Each of these yellow boxes is called a bounding box in YOLO language. Each cell in Fig 1 can generate bounding boxes. Treat each image cell as an individual image: a CNN is run over it to extract its features. If a feature is significant, a bounding box is drawn over that portion of the cell, along with bounding information, or a confidence score.

The higher the significance, the higher the bounding information, or confidence score.

Bounding information is reflected in the thickness of the bounding box: more significant items get a thicker box, and less significant items get a thinner one. When the cells are merged, all bounding boxes with approximately the same bounding information are merged into a bigger box, called a bounding box group. However, this bounding box does not classify any object; it just provides the significance score. This process continues, and the result is Fig 2, or it can even be Fig 3 below.

Fig 3 

Back to work now.
If you are wondering about the boundary score, refer to the image below.

Fig 4 Confidence Score a.k.a Boundary Information

The bounding boxes with a higher score are used for classification. So first we find out whether a bounding box is present; second, we predict the class of the content inside the bounding box. YOLO can detect up to 20 different object classes; some of them are dogs, persons, cars, traffic lights, etc.

Now YOLO combines the results of image classification and marks the boundary group that contains the complete object. After this, only those boxes are kept whose box information is highest, i.e., boxes that represent a full object inside, or at least more than 80% of one. The rest of the insignificant boxes are removed.

The result is in Fig 2. 
Every bounding box has 5 parameters, namely x, y, w, h, and its confidence score. (x, y) is the center of the bounding box within the cell; w and h are the width and height of the box within the cell, respectively. So if we feed an image into TensorFlow, we get K * K * (B * 5 + C) output values, where B is the number of bounding boxes per cell, C is the total number of classes, and K is the number of cells along each side.
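As a quick sanity check, with the original YOLO v1 defaults (K = 7 cells per side, B = 2 boxes per cell, C = 20 classes; these particular numbers are the common configuration, not taken from the text above):

K, B, C = 7, 2, 20           # cells per side, boxes per cell, classes (YOLO v1 defaults)
print(K * K * (B * 5 + C))   # 7 * 7 * 30 = 1470 output values per image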

To begin our own we need
  1. A long nap
  2. Microsoft Visual Studio 2k15 Click here to download or here.
  3. OpenCV 3.0 
  4. CUDA 8.0 Click here to download

If you already have the 2nd item, then I guess you can skip the 1st; otherwise, put the 2nd item on download and follow the 1st.

Install Visual Studio in "custom" mode. Then select Visual C++ in programming languages and also Common Tools.

Then clone the following repository.

Extract the folder into the default Python folder of your OS. My extracted folder's name was "Darkflow-masters"; do check yours, as it may differ.

Now open Command Prompt as admin and cd to your Scripts folder, which lies inside the Python folder. My Python folder's location is E:\PyPy, so I opened CMD as admin and cd'd there. The screenshot below shows how to reach your Scripts folder.

Now type 
pip install cython
pip install tensorflow

After the above process, cd to the darkflow folder you extracted a few moments ago.

Then type
python setup.py build_ext --inplace

Since my python.exe was not in the same location as the darkflow folder, I typed the full path instead. (Yours will differ.)

E:\PyPy\python setup.py build_ext --inplace

Somewhat like this 

Now to run YOLO we have the syntax
A. To run on CPU (Yeah it sucks!)
python flow --imgdir (Sample Image directory) --model (cfg directory) --load (weights directory)
B. To run on GPU (Life is sweet. Even sweeter if the GPU is Nvidia)
python flow --imgdir (Sample Image directory) --model (cfg directory) --load (weights directory) --gpu 1.0

Here is the command I used on my computer. Of course, yours will differ:
E:\PyPy\python flow --imgdir sample_img/ --model xx/tiny-yolo-voc.cfg --load xx/tiny-yolo-voc.weights --gpu 1.0

Here are my outputs

Error - cl.exe not found
Solution - Re-install Visual Studio

Error - AssertionError: Over-read bin/yolo.weights
Solution - You have used the wrong cfg with wrong weights.

Error - ImportError: No module named 'darkflow.cython_utils.cy_yolo_findboxes'
Solution - Run python setup.py build_ext --inplace from the darkflow folder, as instructed above.

Error - No cv2 module found
Solution - pip install opencv-python

Error - AssertionError: expect xyz bytes, found abc bytes
Solution - You used the wrong cfg with the wrong weights again!

Hola Amigos,

I actually faced a lot of issues while building my own neural network classifier. Step-by-step explanations are hardly available, because everyone assumes we already know roughly 30% of the subject; but what about a beginner? Let's face it: the available examples are very difficult to understand, especially for new fledglings. Therefore, I decided to follow a new way of learning, and that is reverse engineering: take an example, crack it down, and then use that example as a reference to crack every other one. I did a similar thing a long time ago to learn to program, and did it again to make my own classifier that can identify a man and a woman.

To learn about CNN Click Here

The blueprint of a neural network classifier is as follows

  1. Specify a directory of your images for training
  2. Specify a directory of your images for validation
  3. Make a Convolutional Neural Network with input dimensions according to image dimensions.
  4. Add two hidden layers. Actually even 1 will work to some extent.
  5. Convert the image input to a format readable by the neural network
  6. Convert the validation input to a format readable by the neural network
  7. Set a learning rate, epochs, steps per epoch
  8. Save your model and retrain on different sets of examples and datasets 
Ok enough said. 

I am using Keras to build this classifier. Keras is damn easy, believe it. I am pretty sure you have heard about the above steps, but when the coding part comes, our minds start cracking: what is this? "Google it". What is that? "Google it".

I am using Spyder 3.x (Anaconda) as my coding platform. Feel free to choose yours.

Do open Anaconda Prompt first. Then type

pip install keras

The first step is to import the libraries which shouldn't be hard.

from keras.models import Sequential

We need Sequential to build the neural network.

from keras.preprocessing.image import ImageDataGenerator

ImageDataGenerator converts our directory data into a format the Keras neural network can read. It is like when you eat Doritos: your stomach breaks them down with hydrochloric acid in order to process them. The intestines can only extract energy from that broken-down food, not directly from the Doritos. I hope it is clear now :)

from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

Next, we need Conv2D to convert the image into arrays of data, and MaxPooling2D to downsample the converted data. So the question is: why do we need MaxPooling? How fast can you solve a 10-variable equation? It will certainly take a long time. Similarly, the greater the number of parameters, the longer the time taken to process them. I hope this makes it clear that parameters are directly proportional to complexity.

Then we need a Dense layer to connect the layers. Dense and fully connected are different names for the same thing; this is what I realized. Dense layers take care of connecting every input with every output, joining them with a set of weights.

Click here to know about dense & fully connected layers.

Now specify the directory where you have stored the images for training and also the location for validation. Remember you just have to specify the directory containing the folders of your classes.

I had two folders, namely men and women. They were located in the train folder, and the train folder was located in the gathered_data folder. So just provide the location of the parent directory holding your classes.

Similarly, specify your validation data directory. The images for validation remain separate from the training data in order to check the accuracy of our NN.

Now specify your epochs and batch size. One epoch is one forward plus one backward propagation over every training sample. Batch size is the number of samples sent to the network at a time.

Now specify the height and width of the image. One can relax, as Keras helps us resize every image to the required size. Basically, smaller images have fewer parameters, thereby reducing the complexity of the neural network.

Now we will build our neural network. The syntax model = Sequential() creates a Sequential model, which provides the methods for adding convolution and pooling layers.

model.add(Conv2D(32, (3, 3), input_shape = (width, height, 3)))

model.add adds a layer to the network. Conv2D is the convolution layer method, so model.add(Conv2D(...)) adds a convolution layer. In Conv2D(32, (3, 3), ...), 32 is the number of output filters and (3, 3) is the kernel (filter) size.

Click here to know about filters.

input_shape is the shape of the image that we are providing as input. Do remember that only this first convolution layer takes the image as input; every other layer takes its input from the output of the previous layer. In the syntax (width, height, 3), we know about width and height, but what about the '3' here? It is the number of channels. Every image here is in the RGB model; we have three colors, hence the third dimension is 3.

Keras supports two channel orderings. The first is channels_first, which means the input_shape will be (3, width, height).
The second is channels_last, which means the input shape will be (width, height, 3). I guess English is enough to understand this; we are following the channels_last method here.
Next, we will choose our activation function. We all started with sigmoid, but we can choose other functions provided by Keras too; I am choosing relu as my activation function. After the weighted summation of the inputs, the result is passed through the activation function. After that, we reduce the number of parameters by pooling; thus I have used MaxPooling here. Average pooling is also available, but max pooling gave me better accuracy and output than average pooling.

The size of the pooling window is 2 X 2.

Click here to know about Pooling

Similarly, I have added a third layer too. So a quick walkthrough looks like this

Image Credits - Mathworks

So it goes like this
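The actual lines are shown in a screenshot on the blog, so the following is only a minimal sketch of what such a Keras 2.x model might look like; the filter counts, image size, and the Flatten/Dense head are my assumptions, not the exact code in the image:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

width, height = 150, 150   # assumed image size; set your own

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(width, height, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))   # the third layer mentioned above
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))          # one output: man vs woman
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])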

The last line compiles the neural network model into a single package with extra parameters: the loss, the optimizer, and the metrics.
Summary of what we did till now -

  • We imported the libraries of Keras for convolution, sequential and image generator
  • We specified the whereabouts, i.e., the directories, of our images for training as well as validation
  • We made our neural network design.
Coming to the term loss = 'binary_crossentropy' (binary cross-entropy is the loss function; note that this is the exact identifier Keras expects). Let us consider x as the obtained answer and y as the ideal answer. The simplest picture of a loss is
Loss = x - y
while binary cross-entropy actually computes Loss = -(y*log(x) + (1 - y)*log(1 - x)); either way, it measures how far the prediction is from the truth. Clearly, the lower the loss, the better our network gets. So, in order to minimize the loss, the CNN has to adjust its weights as efficiently as possible; smaller values of the cost function point to a better network fit, and vice versa. Binary cross-entropy is modeled on a variable which can have only two values, 0 and 1. So if the probability of 1 is 0.4, then the probability of 0 will certainly be 0.6. The binary entropy function follows a graph like the one below.

Fig - Binary Entropy Function

We are using binary cross-entropy here because we have two classes: the result has to be either a man or a woman, so the probability flows towards one or the other. One must remember that binary cross-entropy is the special case of categorical cross-entropy for exactly two classes.

The choice of optimizer depends on the neural network's depth. For deep networks, Adam or RMSprop (root mean square propagation) is used. Since we have 2 hidden layers with a good number of neurons, I would stay with Adam.

Coming to the final part of the code.

inp_data = ImageDataGenerator(rescale = 1./255, shear_range = 0.2, zoom_range = 0.2)

ImageDataGenerator will rescale the images. shear_range will shear each image by a factor of up to 0.2, and zoom_range will zoom by up to 0.2, in order to provide varied data and prevent overfitting.
Similarly, the validation data is also rescaled and randomly flipped horizontally for better fitting.
You can use various other augmentation features for this too.

Now, to load the images from the directory and pass them through the data generator, we use flow_from_directory. It grabs the images from the directory and applies the inp_data transformations to them.
Look carefully at line 4 of the above image. flow_from_directory requires the input image directory, the target size to resize the images to, the batch size, and the class mode. After conversion, the input images are stored in input_data; similarly, valid_data stores the validation data.
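Since those lines also live in a screenshot, here is a hedged sketch of the generator setup; the directory paths and batch size are assumptions:

from keras.preprocessing.image import ImageDataGenerator

width, height = 150, 150   # must match the model's input_shape

inp_data = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2)
val_gen = ImageDataGenerator(rescale=1./255, horizontal_flip=True)

# Each directory holds one subfolder per class (men/ and women/ here).
input_data = inp_data.flow_from_directory('gathered_data/train',
                                          target_size=(width, height),
                                          batch_size=16,
                                          class_mode='binary')
valid_data = val_gen.flow_from_directory('gathered_data/validation',
                                         target_size=(width, height),
                                         batch_size=16,
                                         class_mode='binary')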

Finally, the CNN model is trained by passing the input data, the number of epochs, valid_data, and the steps per epoch.
steps_per_epoch equals the number of training samples divided by the batch size; similarly, validation_steps equals the number of validation samples divided by the batch size.
Batch size is the number of samples taken into the NN at once.
One forward propagation plus one backward propagation of every example equals one epoch.
Iterations are the number of passes (batches) required to complete 1 epoch.

Example: take 100 images with a batch size of 50; then it takes 2 iterations to cover all 100 images, which equals 1 epoch.
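Continuing the sketches above, the final training call might look roughly like this; the sample counts and the saved filename are made up for illustration:

# 2000 training and 800 validation samples are assumptions for illustration.
train_samples, val_samples, batch_size = 2000, 800, 16

model.fit_generator(input_data,
                    steps_per_epoch=train_samples // batch_size,  # 2000/16 = 125
                    epochs=30,
                    validation_data=valid_data,
                    validation_steps=val_samples // batch_size)   # 800/16 = 50
model.save('gender_classifier.h5')   # hypothetical filename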

In this post, I will deal with backpropagation, gradient descent, etc. Do check out my previous posts regarding max pooling, filters, dropout layers, fully connected layers, and CNNs. To begin, click here.

What is Back Propagation?

Backpropagation is done whenever we find some error. After obtaining the probabilities of the images, the CNN subtracts the obtained answer from the actual answer; clearly, this difference is the error, so our first objective is to reduce it. While I was working with the probabilities of the image of the letter 'R' (click here to know about the image 'R'), I got the probability of R as 0.911 and the probability of some other class as 0.42.

The actual probability of getting an 'R' is 1; thus, the error is |1 - 0.911| = 0.089.
The actual probability of not getting an 'R' is 0; thus, the error is |0 - 0.42| = 0.42.
Thus, the total error is 0.509. Now comes the power of the weights. The weights are adjusted by the gradient descent optimization algorithm, which is used to find the minimum of the error function. The weights keep being adjusted along the descent, and each time the slope is chosen which produces the minimum error (like the 0.509 we have here). This slope is fed into the equation that is evaluated to find the new weights, and each weight is multiplied with the input coming from every feature of the neuron (its values) and then added to the bias to get the final value. This whole operation is performed several times to update the weights with appropriate values throughout the network.
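To make the update concrete, here is a minimal, self-contained sketch of a gradient-descent step on a toy one-weight error function (the function and learning rate are made up for illustration):

# Minimal gradient-descent sketch: nudge a weight against the slope of the error.
def gradient_descent_step(w, grad, lr=0.01):
    return w - lr * grad              # step downhill along the error surface

w = 0.9                               # arbitrary starting weight
for _ in range(500):
    grad = 2 * (w - 0.3)              # toy error E = (w - 0.3)^2, so dE/dw = 2(w - 0.3)
    w = gradient_descent_step(w, grad)
print(round(w, 3))                    # ~0.3, the weight that minimizes the toy error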

Consider the following image below
Fig 1- Gradient Descent
The red dot moves across the graph, trying to find the minimum error along the weight axis. This is performed for every neuron pair, and their weights are noted down. After finding the appropriate weights for every neuron pair, these weights are updated for their corresponding pair, and the NN process is carried out again and again.

The function being descended is also known as the cost function, if you are aware of that. Our main objective is to minimize that cost function. In gradient descent, we jump or take steps towards the negative side, i.e., downwards; the opposite is the case in gradient ascent, where we step upwards to find the maximum. Well, the good thing is that the NN executes these objectives automatically to find the best parameters, but there are also various parameters that cannot be tuned automatically. These include the number of hidden layers, the choice of activation function, the right amount of stride, the matrix size of the pooling layer, the number of epochs, the batch size, and various others.

All these can be found by trial and error, which is the same way I find my best NN. Although this hurts, because running an NN on my Core i3 CPU (3rd Gen 😑) takes around 3 hours to find the appropriate parameters. Well, hard work does pay off, so keep on trying and explore the world of NN.

Cheers 😃

In this post, I'll deal with the dense layer, the fully connected layer, and the dropout layer. If you have missed my previous post, click here.
Before moving further, let us have a look at a filter working on an image. I made a pixelated image of the letter 'R' and applied a 3 X 3 filter one time and three times. One must remember: the greater the number of times a filter is applied to the same image, the fewer the features that remain. The pixelated image of the letter 'R' and the filter are below.

Fig 1- Filter (Left) and the image (Right)

So when this filter is moved across the image, we get features extracted according to the filter, like this below:
Fig 2- Filtering applied on image(once and twice)

One can clearly see how the filter has faded the pixels that aren't in phase with it. The term 'phase' seems to fit here 😊. The darkened cells are the ones in phase with the filter. Even if the reference image is a bit distorted, rotated, flipped, or sheared, the feature will still get picked up, as it will be in phase with the filter.

The process of calculating each filtered value is the same for every position of the filter: it is the sum of the products of the filter's 1s and 0s with the actual cell values, divided by the total number of cells in the filter matrix (here, 9). The value gets faded if it is closer to 0 and darkened if it is closer to 1.
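As a tiny worked example of that rule (the cell values below are made up):

import numpy as np

patch = np.array([[1, 0, 0],
                  [1, 0, 0],
                  [1, 1, 1]])        # 3 x 3 region of the image
kernel = np.array([[1, 0, 0],
                   [1, 0, 0],
                   [1, 1, 1]])       # 3 x 3 filter, fully "in phase" with the patch
score = (patch * kernel).sum() / kernel.size   # sum of products over 9 cells
print(score)   # 5/9 ~ 0.56: closer to 1 means darker, closer to 0 means faded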

Rectified Linear Units (RELU)

Now this filtered image is passed through a rectification layer. In the image below, the maximum cell value is 0.33 and the minimum is 0.11, so the middle value is 0.22. The rule is stated as: change to 0 the value of every cell whose value is less than 0.22. (You can choose different thresholds: a value closer to the maximum makes the process harsher on the image, while a value closer to the minimum lets in some unnecessary features.) So after passing the second image from Fig 2 above through the RELU layer, we get an output like this.
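A sketch of that thresholding rule with made-up values (note that the standard ReLU is simply max(0, x); the midpoint rule here follows the text above):

import numpy as np

filtered = np.array([[0.33, 0.11, 0.22],
                     [0.11, 0.33, 0.11]])            # made-up filtered values
mid = (filtered.max() + filtered.min()) / 2          # (0.33 + 0.11) / 2 = 0.22
rectified = np.where(filtered < mid, 0.0, filtered)  # zero out the weak responses
print(rectified)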

Fig 3- RELU layer applied to the filtered layer
After the application of the RELU layer, the image is passed through pooling, which I have already discussed in this post. So the overall process is as follows:
  1. Convolution (Filtered)
  2. Relu activation
  3. Pooling
The above process is done once, twice, or even more times.

What is a fully connected layer?

A fully connected layer is a single row of all the neurons connected together, where every cell indicates a probability towards the actual answer. In the rectified image of 'R', darker cells have a greater value while faded cells have a lower value; thus one can conclude that the darkest cells have a greater probability of being in phase with that filter, and cells with faded (lower) values have quite a low probability of being in phase with it. The rectified image above is the output of a single filter, and we have various filters, each extracting its own features.

Now every cell value is laid out in the form of an array. This process is carried out for every filter. Then an average is taken among the dark cells; similarly, an average is taken among the faded cells. These averages indicate probabilities:
the average of the darkest cells from each filter shows the probability of how close our image is (to 'R' here), while the average of all the lighter cells from each filter shows the probability of how far our image is (from 'R' here).
Fig 4- Fully Connected Array of all neurons (cells)
When all the arrays obtained from each filter are connected, as in the image above, we get our fully connected layer.

What is a dense layer?

A dense layer is just another name for a fully connected layer, and similar operations take place in it: every neuron is connected with every neuron in the adjacent layer. It is called dense because it represents a dense mesh of connections, with a unique weight associated with every neuron pair. Generally, in Keras you will notice Dense when working with a CNN, while in TensorFlow you may find fully_connected; do not get confused by this. In Keras, we often write Dense(10), Dense(1): here every neuron among the 10 neurons is connected with the last neuron, each with its own weight. Since this is so dense, I don't think it's harmful to call it a dense layer. 😉

What is a dropout layer?

As the name suggests, it drops out, or better said, eliminates, some of the activated cells (cells passed through the activation layer). This has to be done in order to prevent over-fitting. An over-fit network will not be able to recognize features in a different image of the same object. The CNN has to work in a robust environment, hence dropout becomes necessary. The dropout rate is basically chosen between 0.2 and 0.8; dropout removes neurons randomly based on the rate provided by the user (0.4, etc.).

Consider an example where you have 20 cookies, and 8 of them are half-cookies. An overfit network will only recognize the fully circular cookies. A fledgling CNN will pick up almost every cookie among the 8. With dropout, some cookies, whether circular or halved, are dropped (i.e., removed randomly) and the network is retrained. This increases the quality of the network.

Cheers, See ya all soon. 

This is my second post in CNN regarding max pooling, strides and padding.

In the previous post, we extracted the features from the image of '3'. The dimensions of that image were 4 X 24, which is quite small. But what do we do when the image is big, with a very high resolution? The greater the size of the image, the more parameters the CNN has to extract from it, and hence the longer it takes to identify the class of that image. So how do we reduce the size of an image without losing any details from it, while also maintaining the spatial arrangement? The thing that solves this problem is max pooling or average pooling. Pooling is basically a technique by which you reduce the size of the image while maintaining the details and features within it, in order to lessen the parameters. The pooled image might get slightly distorted, but the features still get picked up by the CNN, so that's good news.

Max Pooling

In max pooling, we choose the maximum value within a window. The size of the window could be 2 X 2 or 3 X 3. Here is an image showing the max pooling of the reference image of '3'.

Fig - Max Pooling of reference image of '3' with stride 1

As you can see, with max pooling the size of the resultant image gets reduced while still retaining the image information.
Here is a GIF explaining the process of max pooling. I have used a 2 X 2 pooling window that extracts the maximum cell value; the size also gets reduced.

The initial size of the matrix is 4 X 6; the pooled image is 3 X 5. Notice that the stride here is 1.
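A minimal sketch of that pooling step (a 2 x 2 window with stride 1, so a 4 x 6 input becomes 3 x 5, matching the sizes above):

import numpy as np

def max_pool(img, size=2, stride=1):
    rows = (img.shape[0] - size) // stride + 1
    cols = (img.shape[1] - size) // stride + 1
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            # take the maximum of each size x size window
            out[i, j] = img[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

img = np.random.rand(4, 6)     # stand-in for the reference image of '3'
print(max_pool(img).shape)     # (3, 5)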

 What is Stride?

Stride is the number of steps that the pooling window jumps. Notice in the GIF image above that the window jumps only 1 step horizontally as well as vertically.

Here is an example showing max pooling with stride 2

Fig - Max Pooling with stride 2
Another example with stride 2

Average Pooling

Average pooling does the same job as max pooling, but instead of taking the maximum value, it takes the average value within the window.

Fig - Average Pooling with stride 1

Average pooling retains a bit less information compared with max pooling, and is somewhat less accurate.


We noticed that with pooling, our parameters have been reduced, as well as the image size. But it is important to maintain the spatial arrangement of the image. So, to lessen the parameters while also maintaining the image size, we use padding.
Padding is the technique in which we add rows and columns of zeros around the image matrix. When this padded image is pooled, it retains the information at the same size with fewer parameters.
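A quick sketch of padding in action (a 3 x 3 window with stride 1 and one ring of zeros preserves the input size; the values are random stand-ins):

import numpy as np

img = np.random.rand(4, 6)
padded = np.pad(img, 1, mode='constant')   # zeros all around: (4, 6) -> (6, 8)
pooled = np.array([[padded[i:i+3, j:j+3].max()
                    for j in range(padded.shape[1] - 2)]
                   for i in range(padded.shape[0] - 2)])
print(pooled.shape)   # (4, 6): same size as the input image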

Fig- Max Pooling with padding to retain the image size

Another example showing Average Pooling with padding and stride equal to 1
Fig - Padding with Average Pooling

✌ Image Size Retained!!! 😅

In the next post, I will deal with fully connected layer, dropout, and dense layer.

Every time I imagine CNNs, something spills out of my brain and forces me to restart my learning. I guess it was because I wasn't doing practicals on CNNs. Many folks basically look at CNNs as pure theory, and that is where even I lost my way of learning. However, Coursera, edX, and Udacity helped me amplify my knowledge about CNNs and those big words like pooling, strides, etc. I am not a genius in CNNs, but yes, I know something about them.

So What is CNN?

A Convolutional Neural Network is a branch of AI where features are gathered from images and compared with the input data. It is basically a voting system where every pixel votes for the outcome and, as usual, the one with the maximum votes wins this game, and we get a result like this

So how does this happen is what comes in my mind first.

A CNN takes an image as input and converts it into arrays. Yes, that's what those numpy arrays are for!! Do remember that a CNN never matches the whole image; instead, it matches small features of trained images against the input image.
So let us pass an image of the number '3' to our CNN. The CNN will look at the image like this

The human eye can clearly see the digit 3 being displayed. However, it's not so easy for the machine to see the image. Now, to extract the details, the CNN will multiply a weight matrix with the above matrix that represents the number 3. After multiplying the weight matrix, the result will be like this

Fig - Reference Image of 3 to classify

Comparing both images side by side, you will recognize that the second image represents more features and detail than the first.

Every cell is multiplied by a weight, which enhances the features of the image for better recognition. Now, in my post on image classification, I used the syntax
model.add(Conv2D(32, (3, 3), input_shape = (3, width, height)))

Here we have 32 filters of size 3 X 3. 

*** To clear up the confusion, I will say it again: the CNN has been trained on a variety of images of 3, so it knows the features. What the CNN does is grab a filter with "trained" details and match it against the validation data, i.e., the above image of '3' that we want to check, which the network hasn't seen before. The filter contains the cell data from the trained image. ***

So here I have taken a 3 X 3 filter that knows the features of '3', binarized by a rule which states: mark as 1 all cells whose value is greater than 128, and as 0 all cells whose value is 128 or less. Thus our filter belonging to the top-left corner will look like this

Now, this filter will move across every part of the reference image ('3') matrix to match and find the feature, pixel by pixel.
So how is it done?
Each cell of the filter is multiplied with the corresponding cell in the image that we want to classify.

After multiplication and addition, we get something like this

Moving this filter step by step across the reference image, we get the following information

Voila! The filter matches itself against every part of the reference image and outputs a probability value in each cell. Do notice that the fourth column of the reference '3' image has been neglected by the filter in the fourth column of the filtered image. That is because our filter is 3 X 3, and thus square in shape; every 3 X 3 patch of the reference image has to match this filter, and the fourth column alone cannot form one. Similarly, we pass 32 filters across the reference image and extract the features. Every filter comes from a trained image, moves across the reference image, and compares each pixel. A pixel which passes through the filter gets a higher probability and thus gets darker; on the contrary, a pixel which cannot pass the filter is lightened.
We can clearly see how the red-circled pixels match the reference image and also the filter.

After applying 3 more filters, you will get this as the output
This methodology of stacking various filters, each containing a bunch of features, over an image is called a convolution layer. Thus each image becomes a stack of various filtered images. Moving the filter across the whole image, we get information about the locations of the pixels.

Coming to the input shape of the image: it is represented in Keras as (3, width, height) or as (width, height, 3). To find the width, just count the pixels of the reference '3' image horizontally: you will get 4. Similarly, count the pixels vertically: you will get 24. Now, each pixel is represented in RGB, so it has 3 channels. So our input shape for the CNN will be (3, 4, 24) or (4, 24, 3).

I will deal with Max Pooling, Padding and Strides in the next post.

Cheers. 🙂

In this post, we will implement Q learning to play Pong.
By the end of this post, you will be able to

  1. Design your own game with the Python Pygame library.
  2. Learn the basics of Q learning
  3. Implement an efficient Policy for the agent

To follow this tutorial it is highly recommended to have even a little bit of experience in

  1. Python
  2. Backpropagation 
  3. Linear algebra 
  4. Matrices. 

If you know the basics of these then we can move on.

I am using Python 3.5, and the software I am using for the coding part is Sublime Text 3, but you can even use the default Python IDLE editor.

Before starting, we need to install the pygame library. To do that, just open the Python folder where Python is installed, go to the Scripts folder, and open Command Prompt from that location.

Now type this below

  pip install pygame 

Let it download first then type

  pip install numpy 

Let's get to the problem solving.

The Pong game basically has a rectangular bar with which we have to bounce the ball every time it comes down. If the bar misses, the reward will be -1; else, +1.

from pygame.locals import *
This imports all the packages from the pygame library

import numpy as np
This imports the numpy library and renames it to 'np' for easy coding.

import pygame as pg
This imports the pygame library and renames it to 'pg' for easy coding.

import random
This imports the random library in order to generate some random numbers.

import time
This imports the time library which I will use here to calculate the time taken to learn from experience.

start = time.time() 
The variable 'start' is storing the initial time at which the script was loaded.

FPS = xxx
A high FPS value will make the game faster, and a low value will make it slower, in terms of frames. A high FPS will make your agent learn in less wall-clock time, in case you lack patience ;)

fpsClock = pg.time.Clock()
It creates an object which keeps an eye on the time of the system.

pg.init()
This initializes the pygame module.

window = pg.display.set_mode((800,600))
It will create a window container 800 pixels wide and 600 pixels high. Change according to your desire.

pg.display.set_caption('Q learning Example')
It will display 'Q learning Example' on the title bar

Left = 400
The co-ordinate of the left surface

Top = 570
The co-ordinate of the top surface

Width = 100 
Width of the rectangular bar

Height = 20
Height of the rectangular bar

LR = 0.01
Y = 0.99
Learning Rate and Gamma

Black, White, Green
The RGB values of the black, white, and green colours.

rct = pg.Rect(Left, Top, Width, Height)
It creates a rectangle object from the pygame library and stores the coordinates specified by left, top, width, and height.

storage = {}
It will store the value of each state.

action = 2
It defines the action of the agent: 2 stands for right, 1 stands for left, and 0 stands for rest.

jumpY = 6
jumpX = 8
The number of pixels the ball moves per frame along the horizontal x-axis and the vertical y-axis.

Q = np.zeros([25000, 3])
This creates a numpy array with 25000 rows and 3 columns. Each of the three columns defines an action, and each row defines a state; each cell stores the Q value for taking that action in that state.

cenX = 10
cenY = 50
radius = 10
score = 0
missed = 0
reward = 0
cenX and cenY store the coordinates of the centre of the circle; radius is the circle's radius; the rest are for the score, the reward, and the number of times the rectangular bar has missed the ball ('missed').

The calculate_store function calculates the reward: it returns 1 if the ball lands on the rectangular bar, or -1 if the bar fails to deflect it. Whenever the rectangular bar misses the ball, the game regenerates the ball at a random location, and that random location, specifically its x-coordinate, is determined by the newXforCircle function.

The State class stores the location of the rectangular bar: it holds the general information about the bar's coordinates and also the coordinates of the circle. The Circle class stores the coordinates of the circle's centre.

The convert function converts a state into a number, and this number is used as the row index into the numpy array Q, among the 25000 rows. The max function returns the index of the maximum value present in that row.

The action function returns the index containing the maximum value over the possible actions (0, 1, 2) for the agent; the argmax function returns the indices of the maximum values along a certain axis. The afteraction function takes in the current state and the action taken in that state, and returns the next state. For example, if the rectangle's x-coordinate is 200 and the action is 2 (move right), then in the next state it will be 200 + 100 = 300.

The newRect function returns a new rectangle with updated coordinates based on the action taken. If the rectangle is at the right border of the window (800), it returns the original rectangle; otherwise, it returns a rectangle moved 100 pixels to the right. Similarly, if the rectangle is at the left border of the window (0), it returns the original rectangle; otherwise, it returns a rectangle moved 100 pixels to the left.
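As a rough illustration of the convert-then-argmax step described above, here is a minimal sketch; the epsilon-greedy exploration term is my addition (a common Q-learning choice), not something from the original code:

import numpy as np

Q = np.zeros([25000, 3])                      # the Q table from earlier

def choose_action(state_index, epsilon=0.1):
    # Occasionally explore at random (epsilon-greedy is an assumption here);
    # otherwise exploit the best-known action for this state.
    if np.random.rand() < epsilon:
        return np.random.randint(3)           # 0 = rest, 1 = left, 2 = right
    return int(np.argmax(Q[state_index]))     # column with the maximum Q value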

Quite simple, isn't it? :)

Now coming to the training and the infinite loop part. Hold your horses for it's a bit long.

# The event loop at line 2 must be present whenever you are making a game with
# the pygame library. np.savetxt() saves the Q-value matrix. COLL stores the
# random RGB values of the ball, which change whenever the ball strikes the
# rectangular bar.
# window.fill() fills the entire window with a certain RGB colour value.
# The if-else block describes the action taken whenever the ball hits any of
# the edges: the top, the bottom, the left side (0 pixels), and the right side
# (800 pixels). It defines the behaviour of the ball, i.e., how and in which
# direction it should bounce, by updating the values of the rectangle and the
# circle, i.e., by calling the respective functions.
# The Q function is the engine working here; it is the most important part
# that one must cover during Q learning. The Q-learning update follows the
# Bellman equation.

# It states:

Q(s, a) = Q(s, a) + lr*[R + y*max(Q(s', a')) - Q(s, a)]

# where Q(s, a) is the current state
# lr is the learning rate
# y is the gamma
# R is the immediate reward of that action
# s' and a' represent the next state and its action

Take an example where the rectangle coordinates are

Left = 400 Top = 400 Height = 30 Width = 100

This will be stored in the State class in the self.rect variable. Similarly, the centre coordinates of the circle are stored in a variable in the State class. Then this state is converted into a number, i.e., each state is assigned a number. This number is the index into the Q table. Hence, whenever the agent faces a state which is already in the Q table, it calculates the argmax of that row and returns the index with the maximum Q value. The action (Q-table column) having the maximum value tells the agent about the reward it has so far received in that state by taking that action. So it is pretty easy to understand that the maximum value reflects the maximum reward for that action.
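A minimal sketch of that update applied to the Q table (the variable names follow the ones used above):

import numpy as np

LR, Y = 0.01, 0.99                    # learning rate and gamma, as set earlier

def q_update(Q, s, a, reward, s_next):
    # Q(s, a) = Q(s, a) + lr * [R + y * max(Q(s', a')) - Q(s, a)]
    Q[s, a] += LR * (reward + Y * np.max(Q[s_next]) - Q[s, a])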

For the full code click here

Eva :)