My Mac has died. I’m waiting for my backup to copy to my desktop before trying anything funky, which coincidentally gives me enough time to start right into MIPS.
MIPS is a RISC (Reduced Instruction Set Computer) architecture, and MIPS assembly is the assembly language used to program it. As with any assembly language, the precise set of instructions and operations changes from processor to processor. For consistency (and sanity), I’m going to base this series of articles on the R3000 processor. This is a 32-bit processor that offers a very nice feature for a learning environment: it has a simulator. SPIM (http://spimsimulator.sourceforge.net/) is a robust simulator for the R3000 RISC processor, on which you can run your MIPS assembly code.
There are many modern versions of SPIM with plenty of GUI to go around, but I’m still a fan of the console-based simulator. All of the code I’m going to write will be assembled and executed using SPIM ver. 8.0 for Linux, on the command line. I’m also going to limit this first series to non-floating point calculations for simplicity. I’ll go back and address FP operations later. Now that the administrative stuff is out of the way, it’s time to dive right into the fun!
To begin with, our processor features 32 registers for use in our programming. While several are reserved for special purposes, the majority of these registers can be used for general purpose computing.
NAME      NUM     USE
$zero     0       Easy access to a zero value
$at       1       Assembler Temporary (reserved for the assembler)
$v0-$v1   2-3     Evaluation results
$a0-$a3   4-7     Used for arguments
$t0-$t7   8-15    Used for temporary storage (Caller Saved)
$s0-$s7   16-23   Used for saved storage (Callee Saved)
$t8-$t9   24-25   Used for temporary storage (Caller Saved)
$k0-$k1   26-27   (Reserved for OS use)
$gp       28      Global Pointer
$sp       29      Stack Pointer
$fp       30      Frame Pointer
$ra       31      Return Address

These are the registers we will be using for our temporary storage throughout our programs. While I’ll get into procedures in a future post, I do need to address one aspect of them with regard to register usage. In the above table, I have several registers marked as Caller Saved and several marked as Callee Saved. These are essential, albeit non-enforced, constraints on your coding that determine how you use these registers. Caller Saved means that the values in those registers may freely be modified by any procedure you call. Since these values may be modified, if the current procedure (the caller) wishes to keep them for use following a procedure call, it must save them first. Following the procedure call, it may then restore them to their original values. The complement to this, Callee Saved, means that the procedure using those registers is guaranteed that they will not be changed by any calls it makes, and therefore it has no need to back them up around a call. On the other hand, if you want to use any Callee Saved registers in your own procedure, you must first back them up and then restore them at the end of your procedure, to fulfill your guarantee to the procedure that called you that you wouldn’t mess with those registers. I’ll cover these in more detail when I get into procedure calls.
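To make the convention concrete, here is a minimal sketch for SPIM. It leans on instructions this post has not introduced yet (jal, jr, sw, lw, addi), and the label helper is a made-up name for illustration. The point is only who does the saving: main protects its own $t0 around the call, while helper protects $s0 before touching it.

```mips
        .text
main:
        li   $t0, 7              # value in a temporary register
        li   $s0, 35             # value in a saved register

        addi $sp, $sp, -4        # make room on the stack
        sw   $t0, 0($sp)         # main saves $t0 itself; helper may overwrite it
        jal  helper              # call helper (hypothetical procedure)
        lw   $t0, 0($sp)         # restore $t0 after the call
        addi $sp, $sp, 4         # release the stack space

        add  $a0, $t0, $s0       # 7 + 35; $s0 came back unchanged
        li   $v0, 1              # system call 1: print an int
        syscall
        li   $v0, 10             # system call 10: exit
        syscall

helper:
        addi $sp, $sp, -4
        sw   $s0, 0($sp)         # helper backs up $s0 before using it
        li   $s0, 99             # now $s0 is free to use
        li   $t0, 99             # $t0 may be overwritten without saving
        lw   $s0, 0($sp)         # restore $s0 for whoever called us
        addi $sp, $sp, 4
        jr   $ra                 # return to the caller
```

If the conventions are followed as sketched, the program should print 42 even though helper overwrote both registers along the way.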
RISC architectures distinguish themselves by offering a very small, yet robust set of processor operations. Moreover, every instruction on our machine has the same fixed size: 32 bits. This lies in stark contrast to the x86-based processors, which feature many more instructions whose encoded sizes vary greatly from one operation to the next. For this reason, you will occasionally see no-op (NOP) instructions in x86 code for purposes of alignment. Within MIPS, because each instruction is precisely the same width, the designers could make a few abstractions and settle on a couple of common formats for packing the essential instruction data.
MIPS uses only three different instruction formats for all of its non-floating point operations. These are known as the R (arithmetic-based), I (immediate-based), and J (jump-based) formats. I’m going to cover these formats for reference; however, they only describe the machine language codes that the processor uses. The MIPS assembly instructions that generate these machine instructions follow this section.
The R format is used for all operations which do not use immediate values (values which are stored inside of the instruction directly) or direct jumps. The following shows the encoding format for these types of instructions:
BITS:   6        5     5     5     5       6
        OPCODE   RS    RT    RD    SHAMT   FUNCT
In this format, the OPCODE does not correspond directly to the mnemonic of the operation to execute. It is a code that is used to provide control signals to, among other things, the ALU. All of the R type instructions use an OPCODE of 000000, which enables the ALU for processing. RS is a 5-bit code that specifies which of the 32 registers to use as the first source for the arithmetic operation. RT is the second 5-bit code, which specifies the register used as the second source. RD is the third 5-bit code, which specifies the register that will store the output of the ALU. Notice that there are no addresses mentioned in these instructions. The R format only works with registers.
SHAMT is a field that is used to determine the amount to shift by for shifting operations. FUNCT is only used by R type instructions to specify which operation will be executed by the ALU. This is the key block of 6-bits in the R instruction to specify the operation to perform.
Common FUNCT Codes

Hex   Operation
0     Shift Left Logical (Using SHAMT)
2     Shift Right Logical (Using SHAMT)
3     Shift Right Arithmetic (Using SHAMT)
4     Shift Left Logical (Using Reg)
6     Shift Right Logical (Using Reg)
7     Shift Right Arithmetic (Using Reg)
8     Jump (Using Reg)
9     Jump and Link (Using Reg), stores the address of the next instruction in $ra
20    Add
21    Add Unsigned
22    Subtract
23    Subtract Unsigned
24    And
25    Or
26    XOR
27    NOR
2a    Set Less Than
2b    Set Less Than Unsigned
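As a worked example of the R format, the sketch below hand-encodes two instructions in the comments, using the register numbers from the register table and the FUNCT codes above. The choice of registers is arbitrary, and SPIM does this packing for you; the comments only show what the assembler is expected to produce.

```mips
        .text
main:
        # add $t0, $t1, $t2
        #   OPCODE   RS ($t1=9)   RT ($t2=10)   RD ($t0=8)   SHAMT   FUNCT (0x20)
        #   000000   01001        01010         01000        00000   100000
        #   = 0x012a4020
        add  $t0, $t1, $t2

        # sll $t3, $t0, 2    (the shift amount lives in SHAMT; RS is unused)
        #   OPCODE   RS           RT ($t0=8)    RD ($t3=11)  SHAMT   FUNCT (0x0)
        #   000000   00000        01000         01011        00010   000000
        #   = 0x00085880
        sll  $t3, $t0, 2

        li   $v0, 10             # system call 10: exit
        syscall
```

Reading the bit groups left to right and regrouping them into 4-bit nibbles gives the hex words shown in the comments.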
The I format is used for all operations which use immediate values. The following shows the encoding format for these types of instructions:
BITS:   6        5     5     16
        OPCODE   RS    RT    IMMEDIATE
In this format, the OPCODE again does not correspond directly to the mnemonic of the operation to execute. Unlike the R format, this format contains no destination (RD), SHAMT, or function code. Those fields are given up to make room for the immediate, and the OPCODE alone selects the operation. Notice that the OPCODE, RS, and RT are in the same positions and have the same sizes as in the R format instruction. This is a feature of RISC that allows for very rapid processing of instructions by breaking them up into blocks and working with them, even before the OPCODE is deciphered.
For these instructions, RT serves as the destination, RS serves as the first source, and IMMEDIATE serves as the second source. Notice that whereas our native size on a 32-bit system is a 32-bit word, our immediate value here is only 16 bits. This means your immediate values can only contain 16 bits of actual data. The processor immediately runs this IMMEDIATE value through an extender to get it up to 32-bits, so it can be used in processing. Whether it is sign-extended or zero-extended depends on the OPCODE.
Common OPCODEs for I Instructions

Hex   Operation
4     Branch on Equal
5     Branch on Not Equal
8     Add Immediate
9     Add Immediate Unsigned
a     Set Less Than Immediate
b     Set Less Than Immediate Unsigned
c     And Immediate
d     Or Immediate
e     XOR Immediate
f     Load Upper Immediate
20    Load Byte
21    Load Halfword
23    Load Word
24    Load Byte Unsigned
25    Load Halfword Unsigned
28    Store Byte
29    Store Halfword
2b    Store Word
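Here is a short sketch of the I format in practice, with the encoding worked out in the comments. It also shows the usual way around the 16-bit limit: building a 32-bit constant in two halves with Load Upper Immediate and Or Immediate. The registers and constants are arbitrary choices for illustration.

```mips
        .text
main:
        # addi $t0, $t1, -1
        #   OPCODE (0x8)   RS ($t1=9)   RT ($t0=8)   IMMEDIATE (-1 in 16 bits)
        #   001000         01001        01000        1111111111111111
        #   = 0x2128ffff
        # Add Immediate sign-extends, so the ALU sees 0xffffffff (-1).
        addi $t0, $t1, -1

        # A full 32-bit constant cannot fit in one I format instruction,
        # so it is loaded as an upper half and a lower half.
        lui  $t2, 0x1234         # $t2 = 0x12340000
        ori  $t2, $t2, 0x5678    # $t2 = 0x12345678 (Or Immediate zero-extends)

        li   $v0, 10             # system call 10: exit
        syscall
```

Note that RT, not RD, names the destination here, which is why $t0 lands in the second register field of the addi encoding.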
The J format is used for all non-branch jumps:
BITS:   6        26
        OPCODE   IMMEDIATE
Notice that the OPCODE is the same size and in the same position as in the above two instruction formats. This enables the processor to strip out the OPCODE and begin processing it immediately, regardless of which format the instruction is in. Also of note here is that the IMMEDIATE value is now 26 bits in size. That is still short of a full 32-bit address, but it covers a far larger range than the 16-bit immediate of an I format instruction.
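There are only 2 J format instructions: Jump (j), which uses the OPCODE hex value of 2, and Jump and Link (jal), which uses the OPCODE hex value of 3. The first performs an unconditional jump to the given jump address. The address is determined by taking the upper 4 bits of the program counter, the entire 26-bit immediate, and then padding the lower order bits with two zeroes. This forms a 32-bit address which is used for jumping. The second builds its address the same way, but first stores the return address in $ra so that the code it jumps to can come back later.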
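The address arithmetic is easier to follow with numbers. In the sketch below, the addresses in the comments are assumptions made up for illustration; where SPIM actually places the label done depends on what else is loaded. The label names are hypothetical as well.

```mips
        .text
main:
        # Assume, for illustration only, that done ends up at 0x00400030.
        #   IMMEDIATE = 0x00400030 >> 2       = 0x010000c  (drop the two low zero bits)
        #   word      = (0x2 << 26) | 0x010000c = 0x0810000c
        # At run time the processor rebuilds the target:
        #   upper 4 bits of PC (0000) | 26-bit IMMEDIATE | 00  = 0x00400030
        j    done                # unconditional jump, OPCODE 0x2

        li   $a0, 1              # skipped: the jump goes straight to done

done:
        li   $v0, 10             # system call 10: exit
        syscall
```

Because instructions are always 4 bytes wide and aligned, the two low bits of any instruction address are zero, which is why they never need to be stored in the instruction.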
Now that I’ve covered the machine instruction formats that the processor receives, let me begin to cover the actual MIPS instructions for programming. I’m going to wrap this entry up pretty quickly, so I’m just going to cover enough to code up a simple “Hello World”.
In order to do this, however, we will need to look at two more concepts: Segments and Interrupts.
A program on your computer is nothing more than a simple binary file that contains large amounts of different types of data. Your compiled source code is merely one element of this data. It joins things such as your symbol table, hardcoded strings, and other initialized data that is necessary to operate your program. The operating system executes the program by extracting the information from its various data sections. These sections are called segments and are used to partition your program file up into logical arrangements. For this article, I’m going to cover your two most important sections for MIPS programming: .text and .data
The .text section contains all of the code for your program. In here, you’ll intermix MIPS instructions and assembler directives that tell the assembler how to lay the code out in the final file.
The .data section contains all of your hardcoded data. This is typically used to store your strings; however, it is also used to set aside memory for complex data types, such as arrays and structs.
For our Hello World, we will need to use both of these sections. Fortunately, in MIPS, switching sections is as easy as using those two commands: .text and .data. These can be used throughout the code and in as many places as needed to switch back and forth between them. When the program is assembled, all of the data segment sections will be combined together to form a combined data segment.
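A minimal sketch of switching back and forth, assuming SPIM’s behavior of appending each .data and .text block to what came before. The labels g_hello and g_name are made up for illustration, and the instructions used here (li, la, syscall) are explained further down.

```mips
        .data
g_hello: .asciiz "Hello, "       # first piece of the data segment

        .text
main:
        li   $v0, 4              # system call 4: print a string
        la   $a0, g_hello
        syscall

        .data                    # switch back to add more data
g_name: .asciiz "MIPS!\n"        # appended after g_hello in the data segment

        .text                    # resume the code where we left off
        li   $v0, 4
        la   $a0, g_name
        syscall
        li   $v0, 10             # system call 10: exit
        syscall
```

Keeping a string next to the code that uses it can make longer programs easier to read, at the cost of scattering the data definitions through the file.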
The last thing we’ll need to know is the Interrupt table for the system. This table contains all of the system calls that are made to perform system level operations. In our case here, we want to use the system to write a message on to the screen. The following table represents the common system calls:
Operation        $v0   Arguments                     Result
Print an Int     1     $a0 - Int to Print            Prints the value in $a0 to the screen.
Print a String   4     $a0 - Address of the String   Prints the string at the address in $a0 to the screen.
Read an Int      5                                   Reads an Int from the keyboard into $v0.
Read a String    8     $a0 - Buffer, $a1 - Length    Reads a string from the keyboard into the buffer.
Exit             10                                  Ends the program.
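As a sketch of how the table is used, the program below prompts for a number and echoes it back. The label g_prompt is a made-up name, and move (a register-to-register copy) is a pseudo-instruction not covered in this post; li, la, and syscall are introduced just below.

```mips
        .data
g_prompt: .asciiz "Enter a number: "

        .text
main:
        li   $v0, 4              # system call 4: print a string
        la   $a0, g_prompt       # $a0 = address of the prompt
        syscall

        li   $v0, 5              # system call 5: read an int
        syscall                  # the result comes back in $v0

        move $a0, $v0            # copy the result into the argument register
        li   $v0, 1              # system call 1: print an int
        syscall

        li   $v0, 10             # system call 10: exit
        syscall
```

The pattern is always the same: put the call number in $v0, fill in any argument registers, then issue the system call.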
Now, we have all of the knowledge needed to produce a program with only one exception. We don’t know any MIPS instructions yet. Fortunately, for the purposes of this Hello World, we don’t need much.
MIPS instructions typically use a tri-element format. Arithmetic instructions are generally in the format of: INSTR DEST, SRC1, SRC2 and will perform the operation using SRC1 and SRC2 as inputs and DEST as the output. For load instructions dealing with an immediate or an address, this is condensed into a two argument format: INSTR DEST, SRC. This is the format that we are going to use for our first program.
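Both shapes can be seen side by side in this small sketch. The values and registers are arbitrary, and move is a pseudo-instruction not covered in this post.

```mips
        .text
main:
        li   $t1, 40             # two-argument form:   INSTR DEST, SRC
        li   $t2, 2
        add  $t0, $t1, $t2       # three-argument form: INSTR DEST, SRC1, SRC2
        sub  $t3, $t1, $t2       # $t3 = $t1 - $t2

        move $a0, $t0            # copy the sum into the argument register
        li   $v0, 1              # system call 1: print an int
        syscall
        li   $v0, 10             # system call 10: exit
        syscall
```

Run under SPIM, this should print 42, the result of the add.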
Looking at the system call table, we see that we want to print a string. So, we need to load the value 4 into the register $v0, and then we need to load the address of the string into $a0. For this, we need to learn our first three MIPS instructions: li, la, and syscall.
li (Load Immediate)
Usage: To load an immediate value into a register.
Format: li $reg, imm
Example: li $t0, 1
Outcome: Loads the value 1 into the register $t0.
la (Load Address)
Usage: To load an address into a register.
Format: la $reg, address
Example: la $t0, g_string1
Outcome: Loads the address of the string labeled g_string1 from the data segment into the register $t0.
syscall (Call the System Interrupt)
Usage: To make a System Call
Format: syscall
Outcome: Calls the System Interrupt. The System will process the command stored in $v0 and use the other arguments as specified.
So, let’s take all of this and put it together into a program. All strings have to be placed into the data segment. We do this by using the .data keyword to switch to the data segment and then allocating space to store the string in the segment. This is done using an assembler directive (which I’ll cover in the next section) called .asciiz, which takes the given ASCII string and places it into the data segment, using the specified label to refer to its address. The ‘z’ in asciiz is crucial here, as it refers to zero-termination. This is the null terminator that marks the end of the string.
Following this, we’ll switch back to the .text segment, which is where our source code lies. In this segment, I’m going to load 4 into $v0 to specify that I want to output a string onto the screen, then I’m going to load the address (using the label of the string) into $a0. Once I have this loaded, I’ll make a system call. After this, I’m going to load 10 into $v0 to specify that I want the program to exit, followed by another system call.
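        .data
g_s1:   .asciiz "Howdy World!\n"    # zero-terminated string in the data segment

        .text
main:
        li   $v0, 4                 # system call 4: print a string
        la   $a0, g_s1              # $a0 = address of the string
        syscall
        li   $v0, 10                # system call 10: exit
        syscall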
And the output:
kandrea@zeus:~$ spim -f hw.mips
SPIM Version 8.0 of January 8, 2010
Copyright 1990-2010, James R. Larus.
All Rights Reserved.
See the file README for a full copyright notice.
Loaded: /usr/local/lib/spim/exceptions.s
Howdy World!
Assembly programming relies on a tremendous amount of background knowledge. Now that most of this is out of the way, future posts will dive straight into the commands and get some good code going early.
- Kevin Andrea