classic loop optimizations

Loops optimizations are important because

  1. typically there is a regular access pattern
  2. the body of a loop gets repeated
  3. compilers often assume \(10^{depth}\) times

What are classic loop optimizations?

  1. Loop Invariant Code Motion
  2. Induction Variable Recognition
  3. Strength Reduction
  4. Linear Test Replacement
  5. Loop Unrolling

Less classic loop optimizations

  1. Scalar replacement
  2. Loop Interchange
  3. Loop Fusion
  4. Loop Distribution (also known as Fision
  5. Loop Skewing
  6. Loop Reversal

First recall natural loops

  1. strongly connected region in the cfg
  2. one entry point (dominates all the nodes in the loop)

def of loop invariant for an instruction d = op a,b

  1. a,b are constants or,
  2. a,b defined outside the loop
  3. a,b are loop invariants

in SSA form if we find a loop invariant instruction we can always move it into the pre-header, because the value it writes is never rewritten, and the values that it depends on come from outside the loop

conditions when moving an instruction d = a op b is ok

L0: d = 0 
preheader: 
L1: i = i + 1 
d = a ⊕ b 
     = d 
if (i<N) goto L1 
L2: x = d 

can move d

L0: d = 0 preheader L1: if (i>=N) goto L2 i = i + 1 d = a ⊕ b = d goto L1 L2: x = d ```

no good d used after the loop, would not be changed if the loop executes zero times

L0: d = 0 
preheader 
L1: i = i + 1 
d = a ⊕ b 
  = d 
d = 0 
  = d 
if (i<N) goto L1
L2: L0: d = 0 

no good d reassigned in the loop, do invar would be changed

l0: d = 0 
preheader 
L1: = d 
i = i + 1 
d = a ⊕ b 
   = d 
if (i<N) goto L1 
L2: x = d


conditions without SSA

1) the instruction dominates all the loop exits, where d is still live 
1) d is only defined once 
1) d in not live before the instruction 

in SSA

1) is d is live in some block after the loop, then d has to dominate that block
2) clear 
3) clear 


Suppose the loop might run zero times 

while (e) { j = loopinv // may never execute S }

j = loopinv // always executes while (e) { S }


can be converted into 

if (e) { j = loopinv // may never execute while (e) { S }

} ````

induction variable elimination

for (int i = 0; i < 100; ++1){
    f(a[i])
}

calculate a[i] as: &a[0] + 4 * i in every loop iteration, but the values at each step only differ by 4

  1. a_i = &a[0] before the loop
  2. a_i = a_i + 4 (add the stride) in every iteration
  3. the only remaining use of i is the test i < 100, which could become a_i < &a[0] + 4*100 (which is loop invariant)

steps

1find basic induction variables i = i + e, where e is loop invariant

what does this look like in ssa

loop header:
 i1 = phi(i0, i2)
loop body:
i2 = i1 + e
loop header:
 i1 = phi(i0, i2)
loop body:
a0 = i1 + e
i2 = a0 + e1

for each instruction d = c +- loop invariant see if there is a strongly connected graph in the ssa edges that only has adds and subtracts of loop invariant expressions

Step 2 find auxiliary induction variables

j = basic_ind * loop inv + loop invar

for (int i = 0; i < n; i++) {
     j = 2*i + 1;     // Y 
     k = -i;          // Y 
     l = 2*i*i + 1;   // N 
     c = c + 5;       // Y* 
}

step 3 replace auxiliary induction variables (derived ) by new variables without the multiply

step4 if the only remaining use of the induction variable is the termination test, change the test to use the new variable

sum = 0
for (i = 1, i < 100; i++) {
  sum = sum + a[i -1]
}

in SSA form:

   sum0 = 0
   i0 = 1
L: sum1 = phi(sum0, sum2)
   i1 = phi(i0, i2)
   t10 = i1 -1 
   t20 = t10 * 4
   t30 = t20 + &a
   t40 = load t30
   sum2 = sum1 + t40
   i2 = i1 + 1
   if (i2 <= 100)go to l
  1. i is a basic induction variable
  2. t10 is a aux induction variable
  3. t20 is an aux induction variable
  4. t30 is an aux induction variable

t3 has a use in the load

t3 = t20 + &a ==> t10 * 4 + &a ==> (i1-1)* 4+ &a

t3 = 4* i1 + &a - 4

   sum0 = 0
   i0 = 1
   t50 = &a -4  // initial value 
L: sum1 = phi(sum0, sum2)
   i1 = phi(i0, i2)
   t51 = phi(t50, t52)
   //t10 = i1 -1 
   //t20 = t10 * 4
   //t30 = t20 + &a
   t40 = load t50
   sum2 = sum1 + t40
   i2 = i1 + 1
   t52 = t50 + 4
   if (i2 <= 100)go to l
   sum0 = 0
   i0 = 1
   t50 = &a -4  // initial value 
L: sum1 = phi(sum0, sum2)
   // i1 = phi(i0, i2)
   t51 = phi(t50, t52)
   //t10 = i1 -1 
   //t20 = t10 * 4
   //t30 = t20 + &a
   t40 = load t50
   sum2 = sum1 + t40
   //i2 = i1 + 1
   t52 = t50 + 4
   if (t52 <= 396 + &a )go to l

loop un-switching

for (int i = 0 ; i < 100; ++1){
    if (c) {  // c is loop invariant 
        f(i)
    } else {
        g(i)
    }
}

look for special patterns and replace

if (c) {  // c is loop invariant 
   for (int i = 0 ; i < 100; ++1){
        f(i)
    } 
}else {
    for (int i = 0 ; i < 100; ++1){
        g(i)
    }
}

This is often done before vectorization

loop fusion

for (i = 0; i < 100 ; ++){
 s0:   b[i] = f(a[i])
}
for (i = 0; i < 100 ; ++){
 s1:   c[i] = f(b[i])
}
  1. when is it legal to do this?
  2. When can we get rid of the b array?

There is also an optimization that goes the other way split a loop so that each statement becomes a separate loop incase we could run as vectors

These sort of loop optimizations would make good projects

Back to top