GCC: Examining Optimizations of Arithmetic

If you’ve ever looked at the assembly output after compiling a C program with GCC, you may have noticed that sometimes simple arithmetic like

unsigned int a = ...
unsigned int b = a % 3;

becomes something much more involved using shifts, multiplications with wierd constants and subtractions:

movl    %eax, %edx  
movl    $2863311531, %ecx   
imulq   %rcx, %rdx  
shrq    $33, %rdx
leal    (%rdx,%rdx,2), %edx
movl    %eax, %esi  
subl    %edx, %esi

(In the following registers are simply referred to by their 16-bit names such as ax, bx, cx, di etc. Some instructions use the lower 32 bits of the register and some the full 64 bits, which is just technical clutter and ignored for brevity.)

With a power of 2

Let’s first explore these simpler optimizations where the right-hand operand is a power of 2. Modulo by a power of 10 in base 10 is simply ignoring the higher digits. For example 133mod100=33133 \mod 100 = 33 or 13mod10=313 \mod 10 = 3 and so on. The same idea is used here

unsigned int b = a % 4;

becomes a bitwise and with a bitmask of the digits according to the divisor:

movl    %eax, %esi  
andl    $3, %esi

The decimal 3 is a literal whose binary representation is 0000 0000 0000 0011. When this is bitwise anded, the last two bits of the other operand is in the output, which is exactly what the modulo operation does. In general, modulo with 2n2^n becomes bitwise and with the binary representation of the literal 2n12^n-1.

Floor division with a power of 2 is also simple, but slightly different:

unsigned int b = a / 4;

becomes simply a right shift:

movl    %eax, %esi  
shrl    $2, %esi    

This approach is of course the same as division by a power of 10 in decimal. One might expect the same to be the case for multiplication by a power of 2. For instance

unsigned int b = a * 16;

becomes a left shift instead

movl    %eax, %esi  
sall    $4, %esi    

For some other powers of two, something different happens.

Exploiting memory address computation for multiplication

If one modifies the example above to

unsigned int b = a * 4;

The multiplications is actually not performed by a left shift, but using the lea, load effective address, instruction.

leal    0(,%rax,4), %esi

The lea instruction computes the address of the first operand and stores it in the second operand. offset(base register, index register, scale) is an effective address calculation that does the following: base register + offset + index register * scale. This address calculation can be used to array accesses and plays quite well with pointer arithmetic. However, it is not used for computing memory addresses here, but ordinary arithmetic. The above then becomes 0 + ax * 4 which is simply the multiplication by 4 that we want. Even though the scale has to be a power of 2, this can be exploited to multiply numbers not a power of 2:

Multiplication with 3:

leal    (%rax,%rax,2), %esi

ax + ax * 2

Multiplication by 7 is multiplication by 8 followed by a subtraction:

leal    0(,%rax,8), %esi
subl    %eax, %esi  

Multiplication of 11:

leal    (%rax,%rax,4), %edx
leal    (%rax,%rdx,2), %esi

The first line multiplies by 5, and the second line multiplies by 2 and adds 1.

More general floor division

What happens when the divisor is not a power of 2? Lets consider an example of

unsigned int a = ...
unsigned int b = a / 3;

this becomes

movl    %eax, %esi
movl    $2863311531, %eax
imulq   %rax, %rsi
shrq    $33, %rsi

Let’s first examine what the assembly code actually does. To begin with, a is in the ax register.

  1. The contents of ax ( where a is stored) is copied to dx.
  2. A constant whose decimal form is 2863311531 is stored in cx.
  3. cx is multiplied with dx. The result is stored in dx.
  4. The result of the multiplication in dx is shifted 33 places to the right. This is the same as dividing by 2332^33 as we saw earlier.

In mathematical terms b=a2863311531233 b = \frac{a \cdot 2863311531}{2^33} But why is this necessarily a2863311531233=a3\frac{a \cdot 2863311531}{2^33} = \left \lfloor \frac{a}{3} \right \rfloor and where does this magic constant come from? We wish to find the magic constant: a3=a13=ac233c=2333=2863311530+23=2863311531 \left \lfloor \frac{a}{3} \right \rfloor = \left \lfloor a \cdot \frac{1}{3} \right \rfloor = \left \lfloor a \cdot \frac{ c }{ 2^33 } \right \rfloor \implies c = \left \lceil \frac{2^33}{3} \right \rceil = \left \lceil 2863311530 + \frac{2}{3} \right \rceil = 2863311531

Another way of seeing this is, by treating the division as a multiplication: (let’s ignore the integer division truncation for the next bit)

We can turn any division to a corresponding multiplication: a3=a13 \frac{a}{3} = a \cdot \frac{1}{3} It would be much easier to do this computation if the denominator was a power of two (as shown in a previous section). We can express 13\frac{1}{3} as a fraction with a denominator of a power of two by multiplying the enumerator with some appropriate constant such that the denominator is a power of two: =ac2nc=2n3 = a \cdot \frac{c}{2^n} \implies c = \frac{2^n}{3}

We will need cc to be integer, unfortunately cc can’t be. However, we can make an integer approximation. For sufficiently large nn, the approximation c̃c=2n3\tilde{c} \approx c = \frac{2^n}{3} gets very close to 13\frac{1}{3}, as is the case with n=33n=33; 2863311531233=0.33333333337213844...\frac{2863311531}{2^33} = 0.33333333337213844... Since we’re doing integer division in a limited range of integers, a c̃\tilde{c} with sufficiently large nn actually gives correct values for all possible values of aa, as is the case with 2863311531 for division with 3. But why couldn’t we just have used 2323\frac{2^32}{3} or 2313\frac{2^31}{3}? Ie., what makes nn large enough? I still can’t quite figure this out.

Anyway, now that the compiler found an appropriate constant approximation, c̃\tilde{c}, the division can be carried out by first multiplying, then shifting. We can do this because of the associativity of the operators; a(c̃2n)=(ac̃)2na \cdot \left ( \frac{\tilde{c}}{2^n} \right ) = \frac{(a \cdot \tilde{c})}{2^n}. The latter is used, thus the expensive operation of division can be performed using a multiplication and a shift.

More general modulo

Let’s consider modulo in the case where the right-hand operand is not necessarily a power of two.

unsigned int a = ...
unsigned int b = a % 3;

which becomes

movl    %eax, %edx  
movl    $2863311531, %ecx   
imulq   %rcx, %rdx  
shrq    $33, %rdx
leal    (%rdx,%rdx,2), %edx
movl    %eax, %esi  
subl    %edx, %esi

1.-4. Computes a3\left \lfloor \frac{a}{3} \right \rfloor as explained in the previous section.

  1. The next instruction is an optimized way of multiplying dx by (in this case) 3, as explained earlier.
  2. The original value of a is copied to si.
  3. si is subtracted by dx. The result is stored in si.

The result b is in si.

In mathematical terms; b=a3a3 b = a - 3 \cdot \left \lfloor \frac{a}{3} \right \rfloor Where 1.-4. corresponds to the floor division, 5. corresponds to the multiplication by 3 and finally 6.-7. corresponds to the subtraction.

From The Division Algorithm we have that a=dq+ra = dq + r, q=adq = \left \lfloor \frac{a}{d} \right \rfloor and r=amoddr = a \mod d. In our example d=3d=3, r=br=b and still a=aa=a. Having computed qq, we can isolate rr: r=adq r = a - d \cdot q In our example amod3=a3a3 a \mod 3 = a - 3 \cdot \left \lfloor \frac{a}{3} \right \rfloor Which is exactly what happens in the assembly code.