Interesting Side Effect when Bit Shifting

Last week I received a question from a student in the course I TA that I had no immediate answer for.  This student was playing with basic bit shifting operations in C when they attempted to perform a left shift by 32 bits.  Each shift to the left by one performs a simple set of operations.  The most significant bit (MSB) of the bit sequence is pushed into the Carry Flag (CF) on the processor.  All of the rest of the bits are then moved one place to the left.  The least significant bit (LSB) is then set to 0.  For example, shifting this 4-bit number (0010) left by 1, would result in 0100.  If I were to do this operation again, the result would be 1000.  Once more would make it 0000, since the last remaining 1 bit gets shifted off.

As this question was pertaining to a 32 bit system, the expected result for shifting any bit sequence left by 32 times would be 0, since every 1 bit would have been shifted off the end to the left, leaving nothing but 32 0s in the wake.  The problem the student noticed was that when they used actual values in the statement (-1 << 32), they got the correct result of 0; however, when they used a variable that had the exact same value (-1), and they ran the statement (a << 32), the answer they got back was the original value of a.

For me, this was a very interesting question.  Logically both of these statements should be precisely equivalent.  Shifting anything left by 32, regardless of value, should result in a value of 0.  Since the C code (being a high level language) is abstracted from the processor, I knew I’d need to drop down a level to be able to find out first why these two statements gave dramatically different results, and second, why one of the two was just simply wrong.

To go one level under the hood, I wrote a simple C program and then compiled it, then immediately opened the executable inside of gdb (the GNU Debugger) to examine the assembly code generated.

/* Relevant fragment of my C code */
pftest1() {
return (-1 << 32);
pftest2() {
int a = -1;
return (a << 32);

/* Output of Assembly, with my comments added after the // marks */
080483a4 <pftest1>:
mov $0×0,%eax // === return 0;

080483ae <pftest2>:
sub $0×10,%esp
movl $0xffffffff,0xfffffffc(%ebp) // Moves -1 (current value to shift) into a temp stack space
mov 0xfffffffc(%ebp),%eax // Moves -1 from the temp stack space into EAX (a memory register)
mov $0×20,%ecx // Moves 32 (the number of bits to shift) into ECX (another memory register)
shl %cl,%eax // Shifts the contents of EAX left by the contents of the lower end of ECX
// This last line transliterates to SHL 32, -1.

Even though the two functions in C were logically identical, in the compiled code, the first function (which properly outputted 0 for the student), actually consists of nothing but the equivalent C code for ‘return 0;’.  It literally immediately returns 0.

For the second function, which again was logically identical, we can see that the code in assembly is actually performing the requested operation.  It moves the variable value into a register, then shifts the value of the register 32 times to the left, then returns the value resulting.  This is exactly what we wanted, which is ironic in that it also gives the wrong answer.

So, what happened?  Here’s what I was able to figure out from this test.  For the first statement — the one that used hardcoded values for both sides of the << operator — what happened was that the compiler intercepted the intent of the programmer, which was to wipe all of the bits off the slate, and thusly created the assembly code to simply output 0.  Not only does this achieve the user’s intention, but it also runs really fast.

The second block is more fun in that since there was a variable in the equation, the compiler was unable to determine exactly what would happen at compile time.  Even though the << 32 was hardcoded, when the compiler saw a variable in the equation, it realized it couldn’t predict the possible values of the variable, so it created assembly code to pass the buck off to the processor.  In this case, it creates instructions for the processor to load and perform the shift operation requested.

This is where it gets fun.  In the specification for IA32 processors (http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf), in the description of the SHL opcode, I found this: “The count range is limited to 0 to 31″.  This is something the geniuses who actually design processors came up with to keep up from breaking their stuff.  They won’t allow a shift of more bits than exist in the register.  To keep the data within acceptable ranges, the developers of this particular processor implemented a sort of modulo operation to ensure the value passed into the SAL instruction was within the acceptable limits.

This means that when the student tried to run ‘a << 32;’, what the processor interpreted this as was ‘a << (32%32)’ which is equal to ‘a << 0′.  Hence, the result was the same as doing no shifts at all.   Running ‘a << 33′ would then be the equivalent of running ‘a << 1′.

This is off topic for most of the people that would read this blog, but provides a wonderful insight into the difficulties of optimizing code.  These operations were all perfectly legal, however, in a more robust coding project, this would have been very hard to trace as an error, especially since the logical review of the code would yield a flawless, albeit weird, statement.

Kevin Andrea

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>