### Deep Reinforcement Learning: Write an AI to play Pong with Q learning

In this post, we will implement Q learning to play Pong.

By the end of this post, you will be able to:

- Design your own game with the Python Pygame library.
- Understand the basics of Q learning.
- Implement an efficient policy for the agent.

**Important**

To follow this tutorial, it is recommended that you have at least a little experience with:

- Python
- Backpropagation
- Linear algebra
- Matrices

If you know the basics of these, then we can move on.

I am using Python 3.5 and writing the code in Sublime Text 3, but you can also use the default Python IDLE editor.

Before starting, we need to install the pygame library. To do that, open the folder where Python is installed, go to the Scripts folder, and open a command prompt from that location.

Now type the following:

**pip install pygame**

Let it finish downloading, then type:

**pip install numpy**

Now let's move on to solving the problem.

In Pong, the agent controls a rectangular bar that must bounce the ball back every time the ball comes down towards it. If the bar misses the ball, the reward is -1; otherwise it is +1.

__from pygame.locals import *__ This imports the constants defined in the pygame.locals module (such as QUIT and the key codes) into the current namespace.

**import numpy as np** This imports the NumPy library and renames it to 'np' for easy coding.

__import pygame as pg__ This imports the pygame library and renames it to 'pg' for easy coding.

__import random__ This imports the random library in order to generate random numbers.

__import time__ This imports the time library, which I use here to measure how long the agent takes to learn from experience.

**start = time.time()**

The variable 'start' stores the time at which the script started running.

__FPS = xxx__ FPS is the number of frames drawn per second: a high value makes the game run faster and a low value makes it run slower. A high FPS lets your agent learn in less time, in case you lack patience ;)

**fpsClock = pg.time.Clock()** This creates a Clock object that keeps track of time; its tick() method can be used to limit the game to a given frame rate.

**pg.init()** This initializes all the imported pygame modules.

__window = pg.display.set_mode((800,600))__ This creates a window 800 pixels wide and 600 pixels high. Change the size as you like.

__pg.display.set_caption('Q learning Example')__ This displays 'Q learning Example' in the title bar.

__Left = 400__ The x-coordinate of the left edge of the rectangular bar

**Top = 570** The y-coordinate of the top edge of the rectangular bar

__Width = 100__ Width of the rectangular bar

__Height = 20__ Height of the rectangular bar

__LR = 0.01__

__Y = 0.99__ LR is the learning rate and Y is gamma, the discount factor.
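To get a feel for what Y = 0.99 means, here is a small sketch (not part of the original code) that computes the discounted return of a sequence of rewards: a reward received k steps in the future is multiplied by Y**k. The helper name `discounted_return` is hypothetical.

```python
# Hypothetical helper, not from the original post: shows how the discount
# factor Y (gamma) weighs rewards that arrive later.
def discounted_return(rewards, gamma=0.99):
    """Return the sum of gamma**k * rewards[k] over a list of rewards."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total


# A single +1 reward that arrives 100 steps from now is worth about 0.366 today.
future = [0] * 100 + [1]
print(round(discounted_return(future), 3))  # 0.366
```

With Y close to 1 the agent still cares about rewards far in the future; a smaller Y would make it focus on the next few frames.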

**Black, White, Green** RGB values of the colours black, white and green

__rct = pg.Rect(Left, Top, Width, Height)__ This creates a pygame Rect object that stores the bar's coordinates as specified by Left, Top, Width and Height.

__storage = {}__ This dictionary will store the value of each state.

**action = 2** This defines the agent's action: 2 stands for moving right, 1 for moving left and 0 for staying at rest.

__jumpY = 6__

__jumpX = 8__ jumpX and jumpY are the number of pixels the ball moves per frame along the horizontal x-axis and the vertical y-axis, respectively.

__Q = np.zeros([25000, 3])__ This creates a NumPy array with 25000 rows and 3 columns. Each row represents a state and each column represents an action, so each entry holds the Q value (the estimated value) of taking that action in that state.
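A quick sketch of how such a table is read and written: the row is the state index and the column is the action. The constant names `N_STATES`, `N_ACTIONS`, `REST`, `LEFT` and `RIGHT` are my own labels for the sizes and action codes described in this post.

```python
import numpy as np

# Sizes from the post: 25000 possible states (rows) and 3 actions (columns).
N_STATES, N_ACTIONS = 25000, 3
REST, LEFT, RIGHT = 0, 1, 2   # action encoding used in this post

Q = np.zeros([N_STATES, N_ACTIONS])

# Row = state index, column = action. Here: state 42, action "move right".
Q[42, RIGHT] = 0.5

print(Q.shape)          # (25000, 3)
print(Q[42].tolist())   # [0.0, 0.0, 0.5]
```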

__cenX = 10__

__cenY = 50__

__radius = 10__

__score = 0__

__missed = 0__

__reward = 0__ cenX and cenY store the coordinates of the centre of the circle (the ball), radius is the radius of the circle, score and reward hold the score and the reward, and missed counts the number of times the rectangular bar has missed the ball.

The **calculate_store** function calculates the reward: it returns 1 if the ball lands on the rectangular bar and -1 if the bar fails to deflect it. Whenever the bar misses the ball, the game regenerates the ball at a random location, and the x-coordinate of that location is determined by the **newXforCircle** function.
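The post's own versions of these two functions are not shown here, so the following is only a sketch of the behaviour described above. `calculate_reward` and `new_x_for_circle` are hypothetical stand-ins for **calculate_store** and **newXforCircle**, and the bar is passed as plain numbers instead of a pygame Rect.

```python
import random

WINDOW_WIDTH = 800  # window width used in this post


def calculate_reward(bar_left, bar_width, ball_x):
    """Hypothetical stand-in for calculate_store.

    Called when the ball reaches the bar's height: +1 if the ball's centre
    is above the bar, -1 if the bar missed it.
    """
    if bar_left <= ball_x <= bar_left + bar_width:
        return 1
    return -1


def new_x_for_circle(radius=10, width=WINDOW_WIDTH):
    """Hypothetical stand-in for newXforCircle: a random x that keeps the
    whole ball inside the window."""
    return random.randint(radius, width - radius)


print(calculate_reward(400, 100, 450))  # 1
print(calculate_reward(400, 100, 120))  # -1
```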

The class **state** stores the location of the rectangular bar: it holds the general information about the bar's coordinates and also the coordinates of the circle. The class **Circle** stores the coordinates of the centre of the circle.
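A minimal sketch of these two classes, assuming the attribute names self.rect and self.circle mentioned later in this post. I capitalized the class name to `State`, and the bar is stored as a plain (left, top, width, height) tuple instead of a pygame Rect so the sketch runs without pygame. Plain classes are used (rather than dataclasses) so it also works on Python 3.5.

```python
class Circle:
    """Centre of the ball (sketch of the post's Circle class)."""

    def __init__(self, cenX, cenY):
        self.cenX = cenX
        self.cenY = cenY


class State:
    """The bar's rectangle plus the ball's centre (sketch of the post's state class)."""

    def __init__(self, rect, circle):
        self.rect = rect      # (left, top, width, height) of the bar
        self.circle = circle  # Circle holding the ball's centre


s = State(rect=(400, 570, 100, 20), circle=Circle(cenX=10, cenY=50))
print(s.rect[0], s.circle.cenX)  # 400 10
```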

The **convert** function converts a state into a number, and this number is used as the row index in the NumPy array Q, among its 25000 rows. The **max** function returns the index of the maximum value present in the storage.
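The post does not spell out how **convert** packs a state into a row number, so here is one possible scheme, purely an assumption for illustration: the bar has 8 possible positions (0, 100, ..., 700), and the ball's centre is bucketed into 20 x 20 pixel cells. The cell size and constant names are hypothetical; this scheme only needs 9600 rows, which fits comfortably in a 25000-row table.

```python
BAR_STEP = 100            # the bar moves in 100-pixel steps
CELL = 20                 # hypothetical: ball position bucketed into 20x20 cells
WIDTH, HEIGHT = 800, 600  # window size used in this post

N_BAR = WIDTH // BAR_STEP  # 8 possible bar positions
N_X = WIDTH // CELL        # 40 cell columns
N_Y = HEIGHT // CELL       # 30 cell rows


def convert(bar_left, ball_x, ball_y):
    """Map (bar position, ball cell) to a unique row index in the Q table."""
    b = min(bar_left // BAR_STEP, N_BAR - 1)
    x = min(max(ball_x, 0) // CELL, N_X - 1)
    y = min(max(ball_y, 0) // CELL, N_Y - 1)
    return (b * N_X + x) * N_Y + y


print(N_BAR * N_X * N_Y)     # 9600 distinct states
print(convert(400, 10, 50))  # 4802
```

Any mapping works as long as different states get different rows; coarser cells mean fewer states and faster learning, at the cost of less precise control.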

The **action** function returns the index of the action (0, 1 or 2) with the maximum value for the agent in a particular state. The **argmax** function returns the indices of the maximum values along a given axis. The **afteraction** function takes the current state and the action taken in that state and returns the next state. For example, if the rectangle's x-coordinate is 200 and the action is 2 (move right), then in the next state it will be 200 + 100 = 300.
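A minimal sketch of the greedy choice described above, using NumPy's `np.argmax`. It assumes Q is the 25000 x 3 table and s is a state's row index; `choose_action` is a hypothetical name, not the post's **action** function.

```python
import numpy as np

REST, LEFT, RIGHT = 0, 1, 2


def choose_action(Q, s):
    """Greedy policy: return the action (column) with the largest Q value in row s."""
    return int(np.argmax(Q[s]))


Q = np.zeros([25000, 3])
Q[4802] = [0.1, -0.3, 0.7]

print(choose_action(Q, 4802))  # 2, i.e. move right
print(choose_action(Q, 0))     # 0: on ties (an all-zero row) argmax picks the first index

# argmax along axis 1 gives the greedy action for every state at once.
greedy = np.argmax(Q, axis=1)
print(greedy.shape)            # (25000,)
```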

The **newRect** function returns a new rectangle with updated coordinates based on the current action. If the rectangle is already at the right border of the window (800) and the action is to move right, it returns the original rectangle; otherwise it returns a rectangle that has moved 100 pixels to the right. Similarly, if the rectangle is at the left border of the window (0) and the action is to move left, it returns the original rectangle; otherwise it returns a rectangle that has moved 100 pixels to the left.
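The movement rules above boil down to a small function of the bar's left coordinate. This is a sketch, not the post's code: `new_bar_left` is a hypothetical name, and it works on the left coordinate only rather than on a pygame Rect. It assumes the bar starts at a multiple of 100 (like Left = 400), so it always stays on the 0, 100, ..., 700 grid.

```python
WINDOW_WIDTH = 800
STEP = 100                    # the bar moves 100 pixels per action
REST, LEFT, RIGHT = 0, 1, 2


def new_bar_left(left, action, width=100):
    """Return the bar's new left coordinate after an action, staying inside the window."""
    if action == RIGHT and left + width < WINDOW_WIDTH:
        return left + STEP
    if action == LEFT and left > 0:
        return left - STEP
    return left  # REST, or the move would push the bar past an edge


print(new_bar_left(200, RIGHT))  # 300
print(new_bar_left(700, RIGHT))  # 700 (the bar's right edge is already at 800)
print(new_bar_left(0, LEFT))     # 0
```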

Quite simple, isn't it? :)

Now let's come to the training part and the infinite game loop. Hold your horses, for it's a bit long.

The for loop that handles pygame events must be present whenever you make a game with the pygame library. np.savetxt() saves the Q-value matrix to a file. COLL stores random RGB values for the ball's colour, which change whenever the ball strikes the rectangular bar.
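Here is a short sketch of saving the Q table with np.savetxt() and, as an addition not mentioned in the post, reading it back with np.loadtxt() so training could resume from where it stopped. The file name q_table.txt is hypothetical.

```python
import os
import tempfile

import numpy as np

Q = np.zeros([25000, 3])
Q[4802, 2] = 0.7

# Hypothetical file location; np.savetxt writes one text row per state.
path = os.path.join(tempfile.gettempdir(), "q_table.txt")
np.savetxt(path, Q)

# np.loadtxt reads the same text back into a (25000, 3) float array.
Q_loaded = np.loadtxt(path)
print(Q_loaded.shape)  # (25000, 3)
```

A text file is easy to inspect by hand; np.save/np.load would produce a smaller binary file if that matters more.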

window.fill() fills the entire window with a given RGB colour.

The if-else block describes what happens whenever the ball hits any of the edges: the top, the bottom, the left side (0 pixels) and the right side (800 pixels). It defines the behaviour of the ball, i.e. how it should bounce and in which direction, by updating the values of the rectangle and the circle through the respective functions.
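A minimal sketch of the ball's wall behaviour, assuming jumpX and jumpY are the ball's per-frame movement in pixels. `step_ball` is a hypothetical helper, not the post's code; it uses abs() so the ball is always sent back into the window even if it overshoots a wall. What happens at the bottom edge (checking the bar and handing out the reward) is left to the caller.

```python
WIDTH, HEIGHT = 800, 600  # window size used in this post


def step_ball(cenX, cenY, jumpX, jumpY, radius=10):
    """Move the ball one frame and bounce it off the left, right and top walls.

    Returns the new centre and the (possibly flipped) movement per frame.
    """
    cenX += jumpX
    cenY += jumpY
    if cenX - radius <= 0:
        jumpX = abs(jumpX)    # bounce off the left wall
    elif cenX + radius >= WIDTH:
        jumpX = -abs(jumpX)   # bounce off the right wall
    if cenY - radius <= 0:
        jumpY = abs(jumpY)    # bounce off the top wall
    return cenX, cenY, jumpX, jumpY


print(step_ball(10, 50, 8, 6))    # (18, 56, 8, 6): free flight
print(step_ball(785, 100, 8, 6))  # (793, 106, -8, 6): hits the right wall
```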

The Q function is the engine of this program and the most important part to understand in Q learning. The Q-learning update rule is based on the Bellman equation. It states:

**Q(s, a) = Q(s, a) + lr * [R + y * max over a' of Q(s', a') - Q(s, a)]**

where

- Q(s, a) is the current estimate of the value of taking action a in state s
- lr is the learning rate
- y is gamma, the discount factor
- R is the immediate reward for that action
- s' is the next state and a' is an action in that next state
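The update rule translates almost line for line into NumPy. This is a sketch assuming Q is the 25000 x 3 table and s, s_next are row indices; `q_update` is a hypothetical name. Many implementations drop the max term when the episode ends (for example when the ball is missed); the formula above does not, so neither does this sketch.

```python
import numpy as np

LR, Y = 0.01, 0.99  # learning rate and gamma, as in this post


def q_update(Q, s, a, reward, s_next, lr=LR, gamma=Y):
    """Apply one Q-learning update to Q in place and return the new Q(s, a)."""
    target = reward + gamma * np.max(Q[s_next])  # R + y * max over a' of Q(s', a')
    Q[s, a] += lr * (target - Q[s, a])           # move Q(s, a) towards the target
    return Q[s, a]


Q = np.zeros([25000, 3])
Q[9, :] = [0.0, 0.5, 2.0]  # pretend the next state already has some values
new_value = q_update(Q, s=4, a=2, reward=1, s_next=9)
print(new_value)  # about 0.0298, i.e. 0.01 * (1 + 0.99 * 2.0 - 0)
```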

Take an example where the rectangle's coordinates are

Left = 400, Top = 400, Height = 30, Width = 100

These will be stored in the **state** class in the **self.rect** variable. Similarly, the coordinates of the circle's centre will be stored in the **self.circle** variable of the **state** class. This state is then converted into a number, i.e. each state is assigned a number, and this number is its row index in the Q table. Whenever the agent faces a state that is already in the Q table, it takes the argmax of that row and picks the action (column) with the maximum Q value. Each Q value is the agent's current estimate of the total discounted reward it can expect from taking that action in that state, so the action with the maximum value is the one expected to bring the most reward.

For the full code click here

Cheers,

Eva :)
