Verilog Code of a 16-Bit RISC Processor, with Its Working

Verilog Code for the 16 bit RISC Processor 


Hello everyone! I know many of you have been waiting for the working code for this processor, along with its RTL schematic. I have successfully coded the (non-pipelined) processor with R-format, I-format and branch instructions. I'll start with the instruction format. Each instruction is 16 bits long.

16'bxxxx_yyyy_zzzz_qqqq

xxxx is the OPCODE, which decides the operation to be carried out.
yyyy is the address of source Register 1.
zzzz is the address of source Register 2.
qqqq is the address of the register where the result is written. For branch instructions, the same field holds the branch offset, i.e. the number of instructions the user wants to jump.

My design takes 5 clock cycles to execute a single instruction. For the BRANCH format, I stall the processor for 1 clock cycle. To make it clear, there are 5 stages:

1. Instruction Fetch Stage
2. Instruction Decode Stage
3. Arithmetic Stage
4. Memory Stage
5. Write Back Stage

In the first stage, we fetch the instruction, which contains the opcode and the register addresses, from the Instruction Memory. In the Instruction Decode stage, the Register File receives the register addresses and reads out the data from each register, which is then sent to the ALU. In the Arithmetic stage, the ALU receives the opcode from the Control Unit and performs the operation selected by that opcode. After the Arithmetic stage comes the Memory stage.
The Memory stage depends on two signals: read and write. If the read signal is active, the memory reads the value stored at the address received from the Arithmetic stage and outputs it. If the write signal is active, it writes the Register B data to the address received from the Arithmetic stage. Finally comes the Write Back stage, which writes the result into the write register in the Register File.

Don't treat these stages as pipeline stages for now. Pipelining is not hard to add, and I will upload a pipelined version in the coming days.

Here is the datapath of my version of the non-pipelined processor.


RTL Schematic Lovers!! Here you go:


Do remember one thing: ideally, a processor shouldn't have any outputs other than the display. Our datapath exposes outputs only because many "Internet guys" asked me to add them instead of learning how to do it themselves.


Program Counter:

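module PC(in,pc_select,clk,out);
input [7:0]in;         // address fed back through Mux_8 (current PC when no branch is taken)
input clk,pc_select;   // clk is unused: the PC advances on pc_select
output reg [7:0]out;   // address sent to the Instruction Memory
reg [8:0]temp;         // 9 bits: temp[8] would hold a carry, which is discarded
initial begin
out = 0;               // start fetching from imem[0]
end
// Advance on every rising edge of pc_select, which the Control Unit drives.
// Using a control signal as a clock works in simulation, but a clk-driven
// enable is the safer choice for synthesis.
always @(posedge pc_select)begin
temp = in + 2;         // each instruction occupies 2 bytes
out = temp[7:0];       // the address wraps around modulo 256
end
endmodule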

The Program Counter sends the instruction location to the Instruction Memory (IM), which then selects the instruction to work on. The logic increments the counter by 2 units at every rising edge of the pc_select signal. This signal is driven by the Control Unit. I have used temp as a 9-bit temporary variable so that the carry has somewhere to go if the sum exceeds 8 bits; only the lower 8 bits are output, so the address wraps around. Now the question arises: why 2 units and not 1? Follow the example below.

Let us take a 16-bit instruction, 1010_0000_1111_0011. One memory location holds 1 byte (8 bits), so one location stores 1010_0000 and the next location stores 1111_0011. In the instruction memory, the instruction is stored as

Imem[0] = 8'b1010_0000
Imem[1] = 8'b1111_0011

The 16-bit instruction consists of two 8-bit halves, i.e. 2 bytes. It is read out of the instruction memory as {Imem[0], Imem[1]}, which is 16'b1010_0000_1111_0011. The second instruction starts at Imem[2], so it is {Imem[2], Imem[3]}. A careful observation tells us that every new instruction begins at 0, 2, 4, 6, 8, ...

Hence,
Imem[in] = Imem[0]
Imem[in + 2] = Imem[2]

I hope everything is clear now.
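To make the byte addressing concrete, here is a minimal sketch of a combinational fetch that builds one 16-bit instruction from two consecutive bytes, high byte first, just like {imem[address], imem[address+1]} in the Instruction Memory code. The module name fetch_sketch and its 4-byte contents are hypothetical, chosen only for illustration.

```verilog
// Hypothetical sketch: byte-addressed fetch of 16-bit instructions.
// Instruction n occupies bytes 2n (high byte) and 2n+1 (low byte),
// which is why the PC steps by 2.
module fetch_sketch (
    input  wire [7:0]  address,      // PC output: 0, 2, 4, ...
    output wire [15:0] instruction
);
    reg [7:0] imem [0:3];            // illustrative 4-byte memory
    initial begin
        imem[0] = 8'b1010_0000; imem[1] = 8'b1111_0011; // instruction 0
        imem[2] = 8'b0001_0011; imem[3] = 8'b0111_0010; // instruction 1
    end
    // High byte first: {imem[a], imem[a+1]}
    assign instruction = {imem[address], imem[address + 8'd1]};
endmodule
```

With address = 0 the output is 16'b1010_0000_1111_0011, and with address = 2 it is the next instruction, 16'b0001_0011_0111_0010.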

Instruction Memory

module INSTRUCTION_MEMORY(address,clk,opcode,A_reg,B_reg,W_reg,Sign);
input [7:0]address;      // byte address from the PC (0, 2, 4, ...)
input clk;
output reg [3:0]opcode;  // instruction[15:12] -> Control Unit
output reg [3:0]A_reg;   // instruction[11:8]  -> Register 1 address
output reg [3:0]B_reg;   // instruction[7:4]   -> Register 2 address
output reg [3:0]W_reg;   // instruction[3:0]   -> write register address
output reg [3:0]Sign;    // instruction[3:0]   -> branch offset (same field)
reg [7:0] imem[0:17];    // 18 bytes = 9 instructions
reg [15:0] instruction;
initial begin
// Each instruction is stored high byte first in two consecutive locations
imem[0]  = 8'b0001_0011; imem[1]  = 8'b0111_0010;
imem[2]  = 8'b0010_0010; imem[3]  = 8'b0001_0011;
imem[4]  = 8'b0011_0100; imem[5]  = 8'b0010_0011;
imem[6]  = 8'b0100_0000; imem[7]  = 8'b0001_0010;
imem[8]  = 8'b0101_0111; imem[9]  = 8'b0010_0010;
imem[10] = 8'b0110_0010; imem[11] = 8'b0001_0010;
imem[12] = 8'b0111_0001; imem[13] = 8'b0001_0011;
imem[14] = 8'b1000_0110; imem[15] = 8'b0001_0011;
imem[16] = 8'b1001_0001; imem[17] = 8'b0011_0001;
end

always @(negedge clk)begin
instruction = {imem[address],imem[address+1]}; // addresses beyond 17 read as x
opcode = instruction[15:12];
A_reg  = instruction[11:8];
B_reg  = instruction[7:4];
W_reg  = instruction[3:0];
Sign   = instruction[3:0];
end

endmodule


The Instruction Memory outputs the opcode to the Control Unit, along with the Register 1 address, the Register 2 address, the write register address and the branch offset (Sign). Each field is 4 bits wide; the write register address and the branch offset are the same bits, instruction[3:0].

Register File

module REGISTER_FILE(clk,readA,readB,dest,data,reg_wrt,readA_out,readB_out);
input reg_wrt;                 // write enable from the Control Unit
input [3:0]readA,readB,dest;   // two read addresses and the write address
input [7:0]data;               // write-back data
input clk;
output reg [7:0]readA_out,readB_out;
reg [7:0] Register [0:15];
integer i;
initial begin
for (i = 0; i < 16; i = i + 1)
Register[i] = 0;   // clear all 16 registers so R8-R15 do not start as x
Register[1]=2;     // Arbitrary test values
Register[2]=4;
Register[3]=6;
Register[4]=8;
Register[5]=10;    // You can change any value within this initial block
Register[6]=12;
Register[7]=14;
end
always @(negedge clk)begin
readA_out <= Register[readA];
readB_out <= Register[readB];
if(reg_wrt==1 && dest!=0)      // R0 is never written, so it always contains zero
Register[dest]<=data;
end

endmodule





The Register File receives from the Instruction Memory the addresses of the registers to read from (readA and readB) and the address of the write register (dest). Whenever the reg_wrt flag from the Control Unit is high, the incoming data is stored in the write register. In the initial block, I have manually stored some values.

Arithmetic Logic Unit

The ALU receives the opcode from the Control Unit, which dictates the operation the ALU has to perform. Currently, I have included add, subtract, increment, decrement and logical operations, along with the branch instructions. For every operation, we store the carry bit and the zero bit, which are used by the branch logic. For BEQ, if Reg_1 is equal to Reg_2, Z goes to 1; BNE uses the opposite condition. For BLT (Branch if Less Than), if Reg_1 is less than Reg_2, the carry bit goes high; BGT (Branch if Greater Than) uses the opposite condition. Note that if BGT simply tests carry = 0, it also branches when Reg_1 equals Reg_2, since carry = 0 means Reg_1 >= Reg_2. Later, I'll remove some functions from the ISA, such as logical OR and NAND, to make room for ADDI and SUBI.
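The ALU code itself lives in the linked files, so as an illustration only, here is a minimal sketch of how the Z and carry flags described above can come out of a single 9-bit add/subtract. The opcode values OP_ADD and OP_SUB and the module name are hypothetical placeholders (the ISA table is not reproduced here), and the sketch assumes branches use the subtract operation with an unsigned comparison.

```verilog
// Hypothetical sketch of the ALU flag behaviour.
// Flag convention from the text: Z = 1 when A == B, carry = 1 when A < B.
module alu_flags_sketch (
    input  wire [3:0] opcode,
    input  wire [7:0] a,          // Reg_1 data (readA_out)
    input  wire [7:0] b,          // Reg_2 data (readB_out)
    output reg  [7:0] result,
    output reg        z,          // zero flag: result == 0
    output reg        carry       // carry out of ADD / borrow out of SUB
);
    localparam OP_ADD = 4'b0001,  // hypothetical encoding
               OP_SUB = 4'b0010;  // hypothetical encoding, assumed for branches

    reg [8:0] wide;               // 9 bits so bit 8 can hold the carry/borrow

    always @(*) begin
        case (opcode)
            OP_ADD:  wide = {1'b0, a} + {1'b0, b};
            OP_SUB:  wide = {1'b0, a} - {1'b0, b}; // bit 8 = 1 exactly when a < b
            default: wide = 9'd0;
        endcase
        result = wide[7:0];
        carry  = wide[8];
        z      = (wide[7:0] == 8'd0);
    end
endmodule
```

Subtracting A - B gives Z = 1 for A == B (BEQ) and carry = 1 for A < B (BLT); their complements serve BNE and BGT, with ~carry meaning A >= B.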



Data Memory

Nothing special about this module. It has two control signals: re for read and we for write. When re is high, it reads from the address received from the ALU. When we is high, it writes the Reg_2 data from the Register File to the address received from the ALU.
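For readers who want something to simulate, here is a minimal sketch of such a data memory. The module name, the 256 x 8-bit size and the rising-edge timing are assumptions for illustration; the original module may differ in size and clock edge.

```verilog
// Hypothetical sketch: byte-wide data memory with separate read (re)
// and write (we) enables. Size and clock edge are assumptions.
module data_memory_sketch (
    input  wire       clk,
    input  wire       re,        // read enable (loads)
    input  wire       we,        // write enable (stores)
    input  wire [7:0] addr,      // address computed by the ALU
    input  wire [7:0] wdata,     // Reg_2 data from the Register File
    output reg  [7:0] rdata      // loaded value, passed on toward write-back
);
    reg [7:0] mem [0:255];
    integer i;
    initial for (i = 0; i < 256; i = i + 1) mem[i] = 8'd0;

    always @(posedge clk) begin
        if (we) mem[addr] <= wdata;     // store
        if (re) rdata     <= mem[addr]; // load (old value if we is also high)
    end
endmodule
```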



Mux_1, Mux_2 and Mux_3 decide the flow of data. Mux_1 selects the write register address between R-format and I-format instructions. Mux_2 selects the data forwarded to the ALU for load/store instructions versus R-format instructions. Mux_3 decides whether to take the output of the Data Memory or simply carry forward the result from the ALU.

Mux_4, Mux_5, Mux_6 and Mux_7 implement the branch functions. Each receives two addresses: the first is the normal next address from the PC, while the second comes from the adder and is the new jump address. Mux_4 is for BEQ, Mux_5 for BNE, Mux_6 for BLT and Mux_7 for BGT. Mux_8 decides which of these addresses moves on to the PC.
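Mux_1 through Mux_7 are all 2-input multiplexers, so one parameterised module can be instantiated several times with different widths and instance names. The sketch below is hypothetical (the port names, the WIDTH parameter and the mem_to_reg signal name are assumptions) and shows it wired as Mux_3, choosing the write-back data.

```verilog
// Hypothetical sketch: a parameterised 2:1 multiplexer.
module mux2_sketch #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0] in0,   // selected when sel = 0
    input  wire [WIDTH-1:0] in1,   // selected when sel = 1
    input  wire             sel,
    output wire [WIDTH-1:0] out
);
    assign out = sel ? in1 : in0;
endmodule

// Example instance: Mux_3 picks ALU result or Data Memory output.
module writeback_example (
    input  wire [7:0] alu_result,
    input  wire [7:0] mem_rdata,
    input  wire       mem_to_reg,  // hypothetical Control Unit signal
    output wire [7:0] wb_data
);
    mux2_sketch #(.WIDTH(8)) mux_3 (
        .in0(alu_result), .in1(mem_rdata), .sel(mem_to_reg), .out(wb_data)
    );
endmodule
```

Mux_1, which chooses between two 4-bit register addresses, would be the same module instantiated with WIDTH set to 4.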

Why Sign Extend and Left Shift?

Coming to Sign Extend and Shift Left 1: whenever a user gives a BRANCH instruction, he/she does not specify the new IM address directly. Instead, the instruction tells us how many instructions to jump. Take the instructions below as an example:

imem[0]<=8'b0100_0011;
imem[1]<=8'b0111_0010;

imem[2]<=8'b0101_0010;
imem[3]<=8'b0001_0011;

imem[4]<=8'b0011_0100;
imem[5]<=8'b0010_0011;

imem[6]<=8'b0100_0000;
imem[7]<=8'b0001_0010;

imem[8]<=8'b0101_0111;
imem[9]<=8'b0010_0010;

imem[10]<=8'b0110_0010;
imem[11]<=8'b0001_0010;

imem[12]<=8'b0111_0001;
imem[13]<=8'b0001_0011;

Now suppose the user wants to jump from imem[0] by 2 instructions. He/she will write the instruction:
BRANCH $r1 $r2 2
Don't confuse this with the PC code, where the address is incremented by 2 units. The user doesn't know anything about the internal architecture. If he/she wants to jump by 4 instructions, the address moves as follows: 0 --> 2 --> 4 --> 6 --> 8.

Now the input to the sign extender will be 0010, taken from imem[1], since 0010 is instruction[3:0]. Sign extension means replicating the MSB into all the bit positions above it. Hence, the extended data will be 00000010. Shifting this left by 1 bit gives 00000100. Now add this to the "current" address, which is imem[0], i.e. 0. The result is 8'b00000000 + 8'b00000100 = 8'b00000100, which is 4. This 4 is fed to the PC, so the new instruction will come from imem[4]; thus we have successfully jumped by 2 instructions.

Do the same for imem[4]. Its instruction[3:0] (the low nibble of imem[5]) shows that the user wants to jump by 3 instructions. Forget the opcode for now, as this is just an example. Here the sign-extended value will be 00000011, and after the left shift it becomes 00000110. Adding this to the "current" location, 4, gives 00000100 + 00000110 = 00001010, which is 10. Thus, our new instruction will be at imem[10].
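The three steps above (sign-extend, shift left by 1, add to the current address) fit in a few lines of Verilog. This is a minimal sketch, assuming, as in the worked examples, that the "current address" is the address of the branch instruction itself; the module name branch_target_sketch is hypothetical.

```verilog
// Hypothetical sketch of the branch-target adder path.
module branch_target_sketch (
    input  wire [7:0] current_addr,  // address of the branch instruction
    input  wire [3:0] offset,        // instruction[3:0], in instructions
    output wire [7:0] b_out          // new jump address
);
    wire [7:0] sign_ext = {{4{offset[3]}}, offset}; // replicate the MSB
    wire [7:0] shifted  = sign_ext << 1;            // instructions -> bytes
    assign b_out = current_addr + shifted;          // wraps modulo 256
endmodule
```

Offset 2 from address 0 gives 4, and offset 3 from address 4 gives 10, matching the examples. Because of the sign extension, a negative offset such as 4'b1110 (-2) jumps backwards: from address 10 it gives 6. A 4-bit offset therefore covers jumps of -8 to +7 instructions.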

For a 32-bit processor, where each instruction occupies 4 bytes, we would have to shift left by 2 bits. Still confused about how we are jumping? Comment and I'll clear up your doubt.


Control Unit

I would call the Control Unit the heart of this processor. It is responsible for switching the Muxes and control signals at the right stages to produce the correct output. I have divided the stages discussed above into 5 states.

1. Instruction Fetch: s1
2. Instruction Decode: s2
3. Arithmetic Stage: s3
4. Memory Stage: s4
5. Write Back Stage: s5

While working on the Control Unit, remember that although every module is provided with a clock, that does not mean they all work on the data simultaneously. For example, the IF stage takes 1 clock cycle (CC) to move data from its input to its output. The ID stage can only work on its input once the IF stage has put the data on its output. Thus the 1st CC is wasted for ID; in the second CC it works on the instruction received, and it takes 1 CC itself to produce its own output. In my code, s0 is the initial (rest) state. From s0, the machine moves to s1 and then keeps cycling through s5 and back to s1.
In state s0, we set the signals required for state s1. Similarly, in state s1 we switch the signals that will be required in state s2, and so on. In the last state, s5, we switch the signals required for s1, the IF stage. For the IF stage, pc_select must be 1 so that a new instruction location can be sent to the IM. Note that you cannot switch a signal while you are already in the state that needs it, because the change only takes effect in the next clock cycle. If the processor waited until s1 to set pc_select, the signal would become 1 only after the next CC, when the processor is already in the ID state. That is useless, or rather a big error.

Why? 

It is because pc_select signals the IM (Instruction Memory) to release a new instruction while the previous instruction is still in the ID (Instruction Decode) stage. At that moment, the destination register address changes to the one contained in the new instruction. This causes an error: we wanted to write the previous instruction's data to one register, but now it gets written to a different one. Hence, we can also conclude that pc_select, i.e. the next instruction fetch, must be held off until the longest instruction has completed all of its states/stages.

For a better understanding, have a look at the image below.


Black lines indicate the processing of instruction A and green lines the processing of instruction B. You might be wondering why the WB stage of instruction A and the IF stage of instruction B happen at the same time. In that clock cycle we are fetching instruction B, and until that CC completes, its output is not available to the Register File, while instruction A is ready to write to its write register. The write register address is overridden only at the next CC, i.e. the ID stage, the 2nd CC of instruction B: the IM has sent the new register addresses, but the Register File only takes them in at the next CC. So the override happens in the 6th CC, while the WB of instruction A takes place in the 5th CC.

To summarise the Control Unit: all you have to remember is that each state puts out the signals you want active in the next state. State s0 switches the signals required by s1, s1 those required by s2, s2 those for s3, s3 those for s4, s4 those for s5, and s5 those for s1. Hence, the WB state contains this piece of code:

pc_select <= 1;

which means that as soon as the WB state (s5) completes, the pc_select line is pulled high, so the next state, s1, has everything it needs to work. Suppose you are hungry: wouldn't you want your mother to have food ready as soon as you arrive home, rather than cooking after you get there? Similarly, each state prepares the food (switches the appropriate signals) for the next state, so that as soon as the next state (you) arrives, it gets its food (data flowing along the switched paths).
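As an illustration only, here is a minimal sketch of the "prepare the next state's signals" pattern. It is not the original Control Unit: the state encoding, the synchronous reset input rst, and the choice to model only pc_select and a reg_wrt enable (as an R-format instruction would need) are all assumptions. Each signal is set one state early, so pc_select is high exactly during s1 and reg_wrt exactly during s5.

```verilog
// Hypothetical sketch of the Control Unit state sequencing.
module control_fsm_sketch (
    input  wire       clk,
    input  wire       rst,          // synchronous reset into s0
    output reg  [2:0] state,
    output reg        pc_select,    // high during s1 (IF)
    output reg        reg_wrt       // high during s5 (WB)
);
    localparam S0 = 3'd0, S1 = 3'd1, S2 = 3'd2,
               S3 = 3'd3, S4 = 3'd4, S5 = 3'd5;

    always @(posedge clk) begin
        if (rst) begin
            state <= S0; pc_select <= 1'b0; reg_wrt <= 1'b0;
        end else begin
            case (state)
                S0: begin state <= S1; pc_select <= 1'b1; end // prepare IF
                S1: begin state <= S2; pc_select <= 1'b0; end // prepare ID
                S2: state <= S3;                              // prepare EX
                S3: state <= S4;                              // prepare MEM
                S4: begin state <= S5; reg_wrt <= 1'b1; end   // prepare WB
                S5: begin state <= S1; reg_wrt <= 1'b0;
                          pc_select <= 1'b1; end              // prepare next IF
                default: state <= S0;
            endcase
        end
    end
endmodule
```

The real Control Unit would also drive the mux selects, re/we and the branch enables in the same one-state-ahead way.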

Please comment, if you want a detailed explanation of Control Unit and its code.

The ISA of this processor consists of 15 instructions.


How the BRANCH Instructions Work

Have a look at the multiplexers in the datapath diagram at the top of the post. It is up to the designer how to wire the circuit to execute these instructions. Here we have 4 multiplexers. Mux4 handles BEQ (Branch if Equal): if $r1 == $r2, the branch is taken. Mux5 handles BNE (Branch if Not Equal): if $r1 != $r2, the branch is taken. Mux6 handles BLT (Branch if Less Than): if $r1 < $r2, the branch is taken. Similarly, Mux7 handles BGT (Branch if Greater Than): if $r1 > $r2, the branch is taken.
  The select line for Mux4 (BEQ) is driven by the output of an AND gate. The inputs of this AND gate are the Z flag from the ALU and a select line from the Control Unit. So for a BEQ instruction, the Control Unit pulls its select line HIGH. Now there are two cases. First, if $r1 == $r2, Z goes HIGH, so both inputs of the AND gate are 1 and its output is 1. Meanwhile, the select lines for Mux5, Mux6 and Mux7 stay low, so the outputs of those AND gates are 0. Concatenating the AND gate outputs gives 1000 for BEQ. Similarly, we get 0100 for BNE, 0010 for BLT and 0001 for BGT. The select line for Mux8, sel8, is 4 bits wide and is the concatenation of the outputs of the AND gates that control the other Muxes: sel8 <= {sel4,sel5,sel6,sel7}.
Thus the controls for Mux8 will be

1000: in1, from Mux4
0100: in2, from Mux5
0010: in3, from Mux6
0001: in4, from Mux7
0000: out, from the PC (normal sequential instruction flow)
any other value: b_out, the new jump address, independent of the flags

For BRN, the Control Unit pulls all four select lines to the AND gates low, so every AND output is 0 and their concatenation is 0000. Now suppose we want BEQ but $r1 != $r2. This sets the Z flag to 0. For BEQ, only the select line for AND4 is high, so AND4 computes 0 (Z) & 1 (Control Unit) = 0. Thus sel8 is again 0000, which means: carry on with the normal flow of instructions in sequence and do not jump. To branch directly, all we have to do is pull all four select lines to the AND gates high. sel8 then always contains exactly two 1s, irrespective of the flags.
                So the question is: how does pulling all lines high make Mux8 take b_out (branch out) as its input? Suppose Z and Carry are both 0, so ~Z and ~Carry are both 1. These signals go to the AND gates, whose other inputs have already been pulled high by the Control Unit (CU). The resulting sel8 values are as follows.


AND4 inputs are Z and from CU
AND5 inputs are ~Z and from CU
AND6 inputs are Carry and from CU
AND7 inputs are ~Carry and from CU

Z = 0, Carry = 0, Control Unit pulls all select lines to 1
AND4: 0 & 1 = 0
AND5: 1 & 1 = 1
AND6: 0 & 1 = 0
AND7: 1 & 1 = 1
Concatenated result: 0101

Z = 0, Carry = 1, Control Unit pulls all select lines to 1
AND4: 0 & 1 = 0
AND5: 1 & 1 = 1
AND6: 1 & 1 = 1
AND7: 0 & 1 = 0
Concatenated result: 0110

Z = 1, Carry = 0, Control Unit pulls all select lines to 1
AND4: 1 & 1 = 1
AND5: 0 & 1 = 0
AND6: 0 & 1 = 0
AND7: 1 & 1 = 1
Concatenated result: 1001

Z = 1, Carry = 1, Control Unit pulls all select lines to 1
AND4: 1 & 1 = 1
AND5: 0 & 1 = 0
AND6: 1 & 1 = 1
AND7: 0 & 1 = 0
Concatenated result: 1010

Thus, whenever sel8 contains two 1s, it matches none of the listed codes, so Mux8 takes its "else" input and the processor jumps to the new branch address (b_out).
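The AND gates and the Mux8 decision table can be written as one small combinational block. This is a hedged sketch, not the original module: the cu_* enable names are hypothetical, and in1 to in4 simply stand for the outputs of Mux4 to Mux7, with pc_out and b_out as listed in the table.

```verilog
// Hypothetical sketch of the branch select path (AND4-AND7 + Mux8).
module branch_select_sketch (
    input  wire       z, carry,                       // ALU flags
    input  wire       cu_beq, cu_bne, cu_blt, cu_bgt, // Control Unit enables
    input  wire [7:0] in1, in2, in3, in4,             // from Mux4..Mux7
    input  wire [7:0] pc_out,                         // normal sequential flow
    input  wire [7:0] b_out,                          // direct jump target
    output reg  [7:0] next_addr,
    output wire [3:0] sel8
);
    wire sel4 =  z     & cu_beq;   // AND4
    wire sel5 = ~z     & cu_bne;   // AND5
    wire sel6 =  carry & cu_blt;   // AND6
    wire sel7 = ~carry & cu_bgt;   // AND7
    assign sel8 = {sel4, sel5, sel6, sel7};

    always @(*) begin
        case (sel8)
            4'b1000: next_addr = in1;     // BEQ taken
            4'b0100: next_addr = in2;     // BNE taken
            4'b0010: next_addr = in3;     // BLT taken
            4'b0001: next_addr = in4;     // BGT taken
            4'b0000: next_addr = pc_out;  // no branch, or branch not taken
            default: next_addr = b_out;   // two 1s: all enables high
        endcase
    end
endmodule
```

Because Z and ~Z (and Carry and ~Carry) are complementary, setting all four enables always produces exactly two 1s, which is what routes b_out through the default branch.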

Confused about something?
Comment!!


To get the code:

1. Disable your pop-up blocker.
Clicking the link below opens 4 pop-up tabs containing the code.
Click here for Verilog Code

30 comments:

  1. Bro , while getting the code it shows my gmail account does not have access to view this page .

    1. Working on this fix. Having scripting issues

    2. Issue Fixed
      4 linked pages will open/pop

  2. DataPath code is missing. Will upload it today evening

  3. This comment has been removed by the author.

    1. This comment has been removed by the author.

    2. Datapath is itself top level module connecting all modules
      Regarding Muxes, it is meant to be repeated. The reason it is separate with same code is to avoid confusion. Instantiation will be from Mux4 to with different instance name.
      Look at the datapath code to understand

  4. you didnt use pipelining concept here , will you upload pipelined code ? well thank u so much for sharing this !

    1. Thanks
      Yes I am working on pipelined processor 32 bit

    2. Would it be difficult to use pipelining in this 16 bit risc .

  5. This comment has been removed by the author.

    1. Any specific reasons why 16 bit is so important rather than 32bit?

    2. This comment has been removed by the author.

    3. Have you implemented the above code?
      Is it working?

    4. This comment has been removed by the author.

    5. This comment has been removed by the author.

  6. Thanks so much for the code. its well built and properly described. Could you please upload the pipelined version of 16 bit risc processor. It would be extremely helpful. Thanks once again

  7. Sir, please in a new post tell us how shall we implement pipelining in the above code.

  8. Bro, any estimated date when you will upload pipelining concept or code.

  9. Bro, where did you do your training?
    And could you tell us about some notes or the best resources?

  10. where this code is implemented ?
    On DE10 Lite FPGA Board??

    1. Worked On Tiny FPGA, Basys2
