Showing posts from May, 2018

SARSA Learning with Python

I worked on the SARSA algorithm as well as on the Q-learning algorithm, and the two ended up with different Q matrices (duh!). The difference comes from how each one estimates future rewards. Q-learning is off-policy: its update assumes the best available action will be taken in the next state, whatever the agent actually does next. SARSA is on-policy: it chooses the next action with its current policy before updating the Q matrix, and uses that action's Q value in the update.
The grid game example from the previous post gave different results when I implemented SARSA. The SARSA runs also included some repeated paths, whereas the Q-learning runs didn't show any. Looking at a single step, SARSA's update followed the path the agent actually took, while Q-learning's update followed the optimal path.
To implement both, I find it easiest to remember them as pseudocode.
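The difference between the two updates fits in a few lines of Python. This is only a minimal sketch: the function names, the learning rate alpha, the discount gamma and the tiny two-state Q table are all assumptions made for illustration, not the code from the grid game.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Off-policy update: bootstrap from the best action available in s_next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


# Hypothetical table: 2 states x 2 actions, identical for both algorithms.
Q_q = [[0.0, 0.0], [1.0, 5.0]]
Q_sarsa = [[0.0, 0.0], [1.0, 5.0]]

# Same transition (s=0, a=0, r=0, s'=1) for both algorithms.
q_learning_update(Q_q, 0, 0, 0.0, 1)            # looks at max(Q[1]) = 5.0
sarsa_update(Q_sarsa, 0, 0, 0.0, 1, a_next=0)   # looks at Q[1][0] = 1.0

print(Q_q[0][0], Q_sarsa[0][0])
```

Starting from the same table and the same transition, the two entries come out different as soon as the agent's next action is not the greedy one, which is one way the Q matrices can drift apart.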


Initialize Q matrix
Loop (episodes):
   Choose an initial state (s)
   while (goal not reached):
      Choose an action (a) with the maximum Q value
      Determine the next state (s')
      Find total reward -> immediate reward + discounted reward (gamma * max over a' of Q[s'][a'])
      Update Q matrix
      s <- s'
   new episode
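The loop above, which uses the max over next actions and so matches the Q-learning update, could be sketched in Python roughly as follows. Everything concrete here is an assumption for illustration: the world is a made-up five-cell corridor rather than the grid game, the values of ALPHA, GAMMA and EPSILON are arbitrary, and a little random exploration (epsilon-greedy with random tie-breaking) is added so the agent does not get stuck while every Q value is still zero.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (assumed toy world)
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # assumed hyperparameters


def step(state, action_idx):
    """Move within the corridor; reward 1.0 only when the goal is reached."""
    nxt = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def choose_action(Q, state, rng):
    """Mostly pick the action with the maximum Q value, sometimes explore."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    best = max(Q[state])
    return rng.choice([i for i, q in enumerate(Q[state]) if q == best])


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]   # initialize Q matrix
    for _ in range(episodes):                             # loop over episodes
        s = rng.randrange(N_STATES - 1)                   # initial non-goal state
        while s != N_STATES - 1:                          # until the goal is reached
            a = choose_action(Q, s, rng)
            s_next, r = step(s, a)
            total = r + GAMMA * max(Q[s_next])            # immediate + discounted
            Q[s][a] += ALPHA * (total - Q[s][a])          # update Q matrix
            s = s_next
    return Q


Q = train()
# Greedy action index for each non-goal cell after training.
policy = [max(range(len(ACTIONS)), key=lambda i: Q[s][i]) for s in range(N_STATES - 1)]
print(policy)
```

Turning this into SARSA would mean choosing the next action before the update and using its Q value in place of the max.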



Q-Learning with Python

Currently, I am working on learning algorithms in data science for robotics. Reading many examples online and trying them on my own gives me a feeling of reward. I got deeply fascinated by the Q-learning algorithm, which is based on the Bellman equation. I also made a Pong game using Q-learning. You can view that project on my Instructable.
It didn't take much time to understand how Q-learning works. It appeared similar to the state-space matrix that I studied in my Control Systems class in college, which I have since forgotten. However, seeing a practical application makes it easier to learn.
Q-learning is based on a state-action-reward strategy. Every state has several actions that can be taken in it, and we have to choose the action that returns the maximum reward for us.
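As a small illustration of "choose the action that returns the maximum reward", here is a sketch of picking the best action for a single state. The action names and the Q values are invented for the example.

```python
# Hypothetical Q values for one state: action name -> estimated return.
q_values = {"up": 0.2, "down": -0.1, "left": 0.0, "right": 0.7}


def best_action(q_for_state):
    """Return the action whose estimated return is highest."""
    return max(q_for_state, key=q_for_state.get)


print(best_action(q_values))
```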
The agent will roam around like a maniac at the start and learn about its actions and rewards. The next time the agent faces a similar state, it will know what to do in order to minimiz…