

Showing posts from 2018

SARSA Learning with Python

I worked on the SARSA algorithm as well as the Q-learning algorithm, and the two gave different Q matrices (Duh!). The difference comes from how each algorithm treats future rewards. Q-learning is OFF policy: its update uses the best possible next action, whatever action the agent actually takes next. SARSA is ON policy: it follows the current policy, chooses the next action first, and only then updates the Q matrix using that action.
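To make the ON/OFF policy difference concrete, here is a minimal sketch of the two update rules for a tabular Q matrix. The learning rate `alpha`, the discount `gamma`, the function names and the list-of-lists Q matrix are my illustrative assumptions, not code from the original posts. The only difference between the two functions is which next action the target uses.

```python
from typing import List

QTable = List[List[float]]  # Q[state][action]


def q_learning_update(Q: QTable, s: int, a: int, r: float, s_next: int,
                      alpha: float = 0.5, gamma: float = 0.9) -> None:
    """OFF policy: the target uses the best action available in s_next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q: QTable, s: int, a: int, r: float, s_next: int, a_next: int,
                 alpha: float = 0.5, gamma: float = 0.9) -> None:
    """ON policy: the target uses the action the agent will actually take in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


# Same transition and same starting Q matrix, but the policy explores action 0 next.
Q1 = [[0.0, 0.0], [1.0, 4.0]]
Q2 = [[0.0, 0.0], [1.0, 4.0]]
q_learning_update(Q1, s=0, a=0, r=1.0, s_next=1)
sarsa_update(Q2, s=0, a=0, r=1.0, s_next=1, a_next=0)
print(Q1[0][0], Q2[0][0])
```

Q-learning credits state 0 with the value of the best next action (4.0), while SARSA credits it with the exploratory action actually chosen (1.0), which is why the two Q matrices drift apart.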

The grid game example from my previous post gave different results when I implemented it with SARSA. SARSA also produced some repetitive paths, whereas Q-learning didn't. Looking at a single step showed that SARSA followed the path the agent actually took, while Q-learning followed the optimal path.

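To implement both, I keep the following pseudocode in mind. It is written for Q-learning; for SARSA, the next action (a') is chosen first and its value Q[s'][a'] replaces the Max(...) term.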


Initialize the Q matrix.
Loop (Episodes):
   Choose an initial state (s)
   while (s is not the goal state):
      Choose the action (a) with the maximum Q value in state s
      Take action a and determine the next state (s')
      Find total reward -> Immediate Reward + Discounted Reward (Max(Q[s'][a']) over all actions a')
      Update Q[s][a] with the total reward
      s <- s'
   Start a new episode
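Here is a minimal runnable sketch of that pseudocode. The environment is hypothetical (a 5-state corridor where stepping into state 4 pays a reward of 10), and the names `step`, `choose_action`, `GAMMA` and `EPSILON` are illustrative, not taken from the grid game. I added random tie-breaking and a small exploration chance, because a purely greedy choice over an all-zero Q matrix keeps picking the same action. As in the pseudocode, the total reward is written straight into Q[s][a], which amounts to a learning rate of 1 and is fine for a deterministic environment like this one.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4
GOAL = 4              # entering state 4 ends the episode
ACTIONS = (-1, +1)    # action 0 = move left, action 1 = move right
GAMMA = 0.8           # discount factor
EPSILON = 0.2         # probability of a random, exploratory action


def step(s, a):
    """Hypothetical environment: returns (next_state, immediate_reward)."""
    s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
    reward = 10.0 if s_next == GOAL else 0.0
    return s_next, reward


def choose_action(Q, s):
    """Max-Q action with random tie-breaking, plus occasional exploration."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    best = max(Q[s])
    return random.choice([a for a, q in enumerate(Q[s]) if q == best])


def train(episodes=500, seed=0):
    random.seed(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]  # initialize Q matrix
    for _ in range(episodes):                             # loop over episodes
        s = random.randrange(GOAL)                        # initial state, never the goal
        while s != GOAL:
            a = choose_action(Q, s)
            s_next, r = step(s, a)
            # total reward = immediate reward + discounted best Q value of s'
            Q[s][a] = r + GAMMA * max(Q[s_next])
            s = s_next                                    # s <- s'
    return Q


Q = train()
for s, row in enumerate(Q):
    print(s, [round(q, 2) for q in row])
```

For SARSA, the change is to pick the next action a' before updating, use Q[s_next][a'] instead of max(Q[s_next]), and carry a' over as the action for the next step.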



Q-Learning with Python

Currently, I am working on learning algorithms in Data Science for robotics. Reading many examples online and trying them on my own gives me a feeling of reward. I got deeply fascinated by the Q-learning algorithm, which is based on the Bellman equation. I also made a Pong game using Q-learning. You can view that project on my Instructables page.
It didn't take long to understand how Q-learning works. It looked similar to the state-space matrix that I studied in my Control Systems class in college, which I have since forgotten. However, seeing a practical application makes it easier to learn.
Q-learning is based on a state-action-reward strategy: every state has several actions that can be taken in it, and we have to choose the action that returns the maximum reward.
The agent roams around like a maniac at the start, learning about its actions and rewards. The next time the agent faces a similar state, it will know what to do in order to minimiz…
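A common way to get this "maniac at the start, sensible later" behaviour is epsilon-greedy action selection with a decaying epsilon. The sketch below is a generic illustration, not the code from my Pong project; the schedule parameters (`start`, `end`, `decay`) and the Q values are made-up assumptions.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the max-Q action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))                    # explore: roam around
    return max(range(len(q_row)), key=lambda a: q_row[a])   # exploit what was learned


def epsilon_schedule(episode, start=1.0, end=0.05, decay=0.99):
    """Exploration rate that shrinks every episode: fully random at first."""
    return max(end, start * decay ** episode)


rng = random.Random(42)
q_row = [0.2, 1.5, -0.3]  # hypothetical Q values for one state; action 1 is best
early = [epsilon_greedy(q_row, epsilon_schedule(0), rng) for _ in range(1000)]
late = [epsilon_greedy(q_row, epsilon_schedule(1000), rng) for _ in range(1000)]
print(early.count(1) / 1000, late.count(1) / 1000)
```

In episode 0 every action is random, so the best action is picked only about a third of the time; by episode 1000 epsilon has bottomed out at 0.05 and the agent almost always exploits.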

I2C Verilog Code Explanation II

In my previous post, I explained how the I2C Verilog code works. This post continues that explanation.

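else if(left_bits == 10)begin
  if(sda == 0)begin
   left_bits <= 1;            // restart the bit counter
   direction <= 1;            // Master drives SDA again
   temp <= temp_reserved;     // restore TEMP from the saved copy
  end
  else begin
   direction <= 1;            // Master will send the register address next
   alpha <= 0;
   left_bits <= left_bits + 1;
  end
end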

The ACK/NACK bit is received when left_bits is 9, and at 10 its value is checked. If the received acknowledgement is 0, left_bits is reset to its initial value, 1. DIRECTION is set to 1 again so that the Master is ready to send data to the Slave, and the register TEMP, which is now XXXXXXX, is restored from the copy we stored earlier in TEMP_RESERVED. If the received acknowledgement is 1, DIRECTION is set to 1 because the Master now has to send the address of the register in the Slave that stores the data. Setting ALPHA = 0 is not strictly necessary here. LEFT_BITS is then incremented.
else if(left_bits >=11 && left_bits <=17)begin
  alpha <= register…
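The left_bits == 10 decision can be checked with a small Python model of just that branch. This is a sketch, not part of the Verilog design: the `MasterRegs` snapshot and the `bit10_branch` name are hypothetical, and TEMP's unknown XXXXXXX value is stood in for by an ordinary integer because Python has no X state. Because the non-blocking assignments (`<=`) all take effect together at the clock edge, the model builds the new register values from the old ones in a single step.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MasterRegs:
    """Hypothetical snapshot of the Master registers used in the bit-10 branch."""
    left_bits: int
    direction: int
    alpha: int
    temp: int
    temp_reserved: int


def bit10_branch(regs: MasterRegs, sda: int) -> MasterRegs:
    """Mirror of the `left_bits == 10` branch: registers after the clock edge."""
    if regs.left_bits != 10:
        raise ValueError("this model only covers left_bits == 10")
    if sda == 0:
        # Restart: counter back to 1, Master drives SDA, TEMP restored from the copy.
        return replace(regs, left_bits=1, direction=1, temp=regs.temp_reserved)
    # Otherwise: Master keeps driving SDA, clears alpha and moves on to bit 11.
    return replace(regs, direction=1, alpha=0, left_bits=regs.left_bits + 1)


before = MasterRegs(left_bits=10, direction=0, alpha=1, temp=0, temp_reserved=0b1010001)
print(bit10_branch(before, sda=0))
print(bit10_branch(before, sda=1))
```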

I2C Verilog Code Explanation I

In this post, I am going to explain the code from my previous post on I2C. You can visit that post by clicking here.

INOUT SDA: SDA is an inout port because the Master sends the address and data along this line, while the Slave sends ACK/NACK along the same line; hence it has to be of inout type.

OUTPUT REG SCL: The SCL line is an output from the Master to the Slaves. Here the Master controls SCL through the register "a" in the code.

REG DIRECTION: This register decides the direction of data flow on the SDA line, through the line assign sda = direction?alpha:1'bz;

Its equivalent code is:
   if(direction == 1)
     sda = alpha;     // drive SDA with alpha
   else if(direction == 0)
     sda = 1'bz;      // release SDA (high impedance)

If the Master sets its direction to 1, then sda = alpha. At the same moment, the Slave must have its direction set to 0 in order to accept data from the Master. When the Slave wants to send data, the Slave sets its direction to 1 and the Master sets its direction to 0.
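The direction handshake can be checked with a tiny Python model of a shared tri-state line. This is an illustrative sketch under stated assumptions: the string "z" stands for 1'bz, "x" marks a conflict, and the resolution follows Verilog's rule for a wire with two drivers (an undriven side is ignored; two different driven values give x). The function names are hypothetical.

```python
from typing import Union

Level = Union[int, str]  # 0, 1, "z" (not driven) or "x" (conflict)
Z = "z"


def driver_output(direction: int, alpha: int) -> Level:
    """Python version of: assign sda = direction ? alpha : 1'bz;"""
    return alpha if direction else Z


def resolve_sda(master_out: Level, slave_out: Level) -> Level:
    """Value seen on the shared SDA wire when both sides are connected."""
    drivers = [v for v in (master_out, slave_out) if v != Z]
    if not drivers:
        return Z                 # nobody drives the line
    if len(drivers) == 1:
        return drivers[0]        # exactly one side drives: both sides see it
    return drivers[0] if drivers[0] == drivers[1] else "x"  # both drive


# Master sends a 1 while the Slave listens (Slave direction = 0).
print(resolve_sda(driver_output(1, 1), driver_output(0, 0)))  # → 1
# Slave answers with a 0 while the Master listens (Master direction = 0).
print(resolve_sda(driver_output(0, 1), driver_output(1, 0)))  # → 0
# Both set direction = 1 with different values: contention.
print(resolve_sda(driver_output(1, 1), driver_output(1, 0)))  # → x
```

The last case is exactly what the opposite direction settings on the Master and Slave are there to prevent.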


ESP8266 WebSockets

I worked on WebSockets on a Wemos D1 mini, which uses an ESP8266 chip, and it worked fabulously. I designed an HTML file that can open a connection to my D1 mini to control the onboard LED.

Any ESP-based board will work here. In my application, I can switch the board's onboard LED on and off. I also kept a continuous connection open so that my browser could show the calculations running on the ESP board in real time.

However, I ran into some problems, such as a socket timeout, which I haven't solved yet: when the ESP is connected to the browser, it shows real-time data for a few minutes, after which the connection drops. I still have to figure out the cause.

WebSockets have helped me a great deal. I can now use the serial monitor over Wi-Fi, and I can even update values over Wi-Fi and get back the current readings.

I began with the WebSocket library by "Ipnica" on GitHub. Click here to download the library. After in…

I2C Verilog Code and working

I made a post about I2C long ago; in this post I am reposting it with various changes. These include the use of the acknowledgement bit by the Slave and the Master, and the use of the same SDA line for the slave address, the register address and the data. No extra data line is required to read data from the slave; everything can be seen on the SDA line alone. This version of I2C in Verilog fully supports adding *multiple* slaves.
Yes!! You can use multiple slaves at the same time. The only missing feature, which I am working on right now, is the RW bit. The RW (Read/Write) bit is present, but I have focused only on the read operation here. I am working on the write operation too and will post an update soon.

For this I2C design I had to familiarize myself with inout signal lines in Xilinx. SDA has to be an inout line; otherwise it won't be a proper I2C model, even if it serves the same function.

The Master will send 7-bit address…
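For reference, this is how the address phase is laid out on SDA under the standard I2C convention: the 7-bit slave address, MSB first, followed by the R/W bit (1 = read, 0 = write). The Python sketch below only illustrates that convention; the function name and the 0x48 address are hypothetical, and the bit handling in my Verilog design may differ in detail.

```python
from typing import List


def address_frame(address: int, read: bool) -> List[int]:
    """Bits the Master shifts out on SDA for the address phase, MSB first.

    Standard I2C convention: 7-bit address followed by the R/W bit
    (1 = read, 0 = write). The 9th clock is the ACK/NACK from the Slave.
    """
    if not 0 <= address < 128:
        raise ValueError("I2C addresses are 7 bits (0-127)")
    byte = (address << 1) | (1 if read else 0)
    return [(byte >> i) & 1 for i in range(7, -1, -1)]


# Reading from a hypothetical slave at address 0x48
print(address_frame(0x48, read=True))  # → [1, 0, 0, 1, 0, 0, 0, 1]
```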