classic loop optimizations

Loop optimizations are important because

typically there is a regular access pattern
the body of a loop gets repeated
compilers often assume \(10^{depth}\) times

What are classic loop optimizations?

Loop Invariant Code Motion
Induction Variable Recognition
Strength Reduction
Linear Test Replacement
Loop Unrolling

Loop Invariant Code Motion

recall natural loops

strongly connected region in the cfg
one entry point (dominates all the nodes in the loop)

pre steps - reshape the cfg

find the natural loops
add pre-header

if we are going to move code we often need to add a special basic block which is called a landing pad or a pre-header create a new block b. change all the preds of the loop header to point to the pre-header, add an edge from b to the loop header

pre-header can change phi nodes

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TB
A["x1 = 3"]
B1["y1 = 4"]
B2["y2 = 5"]
C["x2=phi(x1,x3)\ny3=phi(y1,y2,y3)\nz1=x2*x3\nq1=y3*y3\nw1=y3+2"]
D["w2=w1+5"]
E["w3=phi(w1,w2)\np1=w3+y3\nx3=x2+1\nq2=q1+1"]
A--> B1
A--> B2
B1--> C
B2--> C
C-->D
D--> E
C--> E
E--> C
E--> After

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TB
A["x1 = 3"]
B1["y1 = 4"]
B2["y2 = 5"]
C["x2=phi(x1,x3)\ny3=phi(y1,y2,y3)\nz1=x2*x3\nq1=y3*y3\nw1=y3+2"]
D["w2=w1+5"]
E["w3=phi(w1,w2)\np1=w3+y3\nx3=x2+1\nq2=q1+1"]
A--> B1
A--> B2
B1--> C
B2--> C
C-->D
D--> E
C--> E
E--> C
E--> After

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TB
A["x1 = 3"]
B1["y1 = 4"]
B2["y2 = 5"]
LP["y3=phi(y1,y2)"]
C["x2=phi(x1,x3)\nz1=x2*x3\nq1=y3*y3\nw1=y3+2"]
D["w2=w1+5"]
E["w3=phi(w1,w2)\np1=w3+y3\nx3=x2+1\nq2=q1+1"]
A--> B1
A--> B2
B1-->LP --> C
B2-->LP 
C-->D
D--> E
C--> E
E--> C
E--> After

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TB
A["x1 = 3"]
B1["y1 = 4"]
B2["y2 = 5"]
LP["y3=phi(y1,y2)"]
C["x2=phi(x1,x3)\nz1=x2*x3\nq1=y3*y3\nw1=y3+2"]
D["w2=w1+5"]
E["w3=phi(w1,w2)\np1=w3+y3\nx3=x2+1\nq2=q1+1"]
A--> B1
A--> B2
B1-->LP --> C
B2-->LP 
C-->D
D--> E
C--> E
E--> C
E--> After

while loop may not execute, we can restructure into a do-while

while(e) {
  s(j) 
}

if (e) {
  t  = j loopinv 
  do {
    s
  } while(e)
}

check for zero trip count

%%{init: {"flowchart": {"htmlLabels": false}} }%%
graph TB
A;
B["if e"]
S
endloop
next
A--> B
B--> next
B--> S
S--> endloop
endloop --> B

%%{init: {"flowchart": {"htmlLabels": false}} }%%
graph TB
A;
B["if e"]
S
endloop
next
A--> B
B--> next
B--> S
S--> endloop
endloop --> B

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TB
pretest["if e"]
pretest-->next 
do
s
endloop
posttest["if e"]
next
pretest-->do
do --> s
s--> endloop
endloop--> posttest
posttest --> do
posttest --> next

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TB
pretest["if e"]
pretest-->next 
do
s
endloop
posttest["if e"]
next
pretest-->do
do --> s
s--> endloop
endloop--> posttest
posttest --> do
posttest --> next

In ssa a variable is loop invariant if it does not have a phi function at the header of the loop, or it is defined by a phi function and all the inputs are loop constants

SSA If we find a loop-invariant computation in SSA form, then we just move it out of the loop to a block before the loop. When moving a (side effect-free) SSA loop-invariant computation to a previous position, nothing can go wrong, because the value it computes cannot be overwritten later and the values it depends on cannot have been changed

licm

Loop invariant code motion recognizes computations in loop that produce the same value on each iteration and moves them out of the loop.

A very common case for this is matrix addressing

a[i,j] might expand to to \(i*4*\operatorname{stride_{a}} + j *4\)

for j 
  a[i,j] = f(a[i,j+1])

turns into

a = 
b = 
result = 0
for (){
    result += a*b
}

when is ok to move a computation

no side effects - cannot move alloc 10 outside of loop;
in non ssa, computation d dominates all loop exits where d is live
in non ssa only one def of d in the loop

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TB
pre["d = 0"]
L1["if i > n" ]
loop1["i = i +1"]
d["d = a op b"]
use["  =d "]
next[" = d"]

pre--> L1
L1--> loop1
loop1--> d
d--> use 
use --> L1
L1 --> next

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TB
pre["d = 0"]
L1["if i > n" ]
loop1["i = i +1"]
d["d = a op b"]
use["  =d "]
next[" = d"]

pre--> L1
L1--> loop1
loop1--> d
d--> use 
use --> L1
L1 --> next

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TB
pre["d = 0"]
L1["i = i +1\n d= a op b\n use d"]
L2["d = 2\n use d"]
L3["if (i < n)"]
pre--> L1
L1--> L2
L2--> L3
L3--> after
L3 --> L1

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TB
pre["d = 0"]
L1["i = i +1\n d= a op b\n use d"]
L2["d = 2\n use d"]
L3["if (i < n)"]
pre--> L1
L1--> L2
L2--> L3
L3--> after
L3 --> L1

find loop invariant instructions,

naturally iterative

iterate to convergence
 for each instr in  the loop
   see if it is loop invar 
   if it is - move it

nested loops

we want to process inner loops first

add all the pre-headers
walk the dominator tree in reverse post order - saving all the loop headers

def of loop invariant for an instruction d = op a,b

a,b are constants or,
a,b defined outside the loop
a,b are loop invariants

in SSA form if we find a loop invariant instruction we can always move it into the pre-header, because the value it writes is never rewritten, and the values that it depends on come from outside the loop

test at the bottom - loop always executes at least once

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TD
l0["l0:"]
pre["preheader"]
l1["l1: i = i +1"]
d1["d1 = a ⊕ b"]
d2["   = d1"]
l0-->pre
pre --> l1
l1--> d1
d1--> d2
d3["(i < N) goto L1"]
d2--> d3
d3--> l1
d3--> Next

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TD
l0["l0:"]
pre["preheader"]
l1["l1: i = i +1"]
d1["d1 = a ⊕ b"]
d2["   = d1"]
l0-->pre
pre --> l1
l1--> d1
d1--> d2
d3["(i < N) goto L1"]
d2--> d3
d3--> l1
d3--> Next

can move d

test at the top

%%{init: {"flowchart": {"htmlLabels": false}} }%%
graph TD
l0["l0:d = 0"]
pre["preheader"]
l1["if (i>=N) goto L2 \nl1: i = i +1"]
l0-->pre
pre --> l1
d1["d1 = a ⊕ b"]
d2["   = d1"]
l1--> d1
d1--> d2
d2--> l1
l1--> l2
l2--> next

%%{init: {"flowchart": {"htmlLabels": false}} }%%
graph TD
l0["l0:d = 0"]
pre["preheader"]
l1["if (i>=N) goto L2 \nl1: i = i +1"]
l0-->pre
pre --> l1
d1["d1 = a ⊕ b"]
d2["   = d1"]
l1--> d1
d1--> d2
d2--> l1
l1--> l2
l2--> next

we can always convert into


loop test at top 


if (test) {
  preheader
  loop test at bottom
}

induction variable elimination

for (int i = 0; i < 100; ++1){
    f(a[i])
}

calculate a[i] as: &a[0] + 4 * i in every loop iteration, but the values at each step only differ by 4

We want to change the multiply and add to an add

Transformation

a_i = &a[0] before the loop
a_i = a_i + 4 (add the stride) in every iteration
the only remaining use of i is the test i < 100, which could become a_i < &a[0] + 4*100 (which is loop invariant)

step 1

find basic induction variables i = i + e, where e is loop invariant

what does this look like in ssa

for each instruction d = c +- loop invariant see if there is a strongly connected graph in the ssa edges that only has adds and subtracts of loop invariant expressions

Step 2 find auxiliary induction variables

j = basic_ind * loop inv + loop invar

for (int i = 0; i < n; i++) {
     j = 2*i + 1;     // Y 
     k = -i;          // Y 
     l = 2*i*i + 1;   // N 
     c = c + 5;       // Y* 
}

step 3

replace auxiliary induction variables (derived ) by new variables without the multiply

step4

if the only remaining use of the induction variable is the termination test, change the test to use the new variable

sum = 0
for (i = 1, i < 100; i++) {
  sum = sum + a[i -1]
}

in SSA form:

   sum0 = 0
   i0 = 1
L: sum1 = phi(sum0, sum2)
   i1 = phi(i0, i2)
   t10 = i1 -1 
   t20 = t10 * 4
   t30 = t20 + &a
   t40 = load t30
   sum2 = sum1 + t40
   i2 = i1 + 1
   if (i2 <= 100)go to l

i is a basic induction variable
t10 is a aux induction variable
t20 is an aux induction variable
t30 is an aux induction variable

t3 has a use in the load

t3 = t20 + &a ==> t10 * 4 + &a ==> (i1-1)* 4+ &a

t3 = 4* i1 + &a - 4

   sum0 = 0
   i0 = 1
   t50 = &a -4  // initial value 
L: sum1 = phi(sum0, sum2)
   i1 = phi(i0, i2)
   t51 = phi(t50, t52)
   //t10 = i1 -1 
   //t20 = t10 * 4
   //t30 = t20 + &a
   t40 = load t50
   sum2 = sum1 + t40
   i2 = i1 + 1
   t52 = t50 + 4
   if (i2 <= 100)go to l

   sum0 = 0
   i0 = 1
   t50 = &a -4  // initial value 
L: sum1 = phi(sum0, sum2)
   // i1 = phi(i0, i2)
   t51 = phi(t50, t52)
   //t10 = i1 -1 
   //t20 = t10 * 4
   //t30 = t20 + &a
   t40 = load t50
   sum2 = sum1 + t40
   //i2 = i1 + 1
   t52 = t50 + 4
   if (t52 <= 396 + &a )go to l

loop un-switching

for (int i = 0 ; i < 100; ++1){
    if (c) {  // c is loop invariant 
        f(i)
    } else {
        g(i)
    }
}

look for special patterns and replace

if (c) {  // c is loop invariant 
   for (int i = 0 ; i < 100; ++1){
        f(i)
    } 
}else {
    for (int i = 0 ; i < 100; ++1){
        g(i)
    }
}

This is often done before vectorization

loop fusion

for (i = 0; i < 100 ; ++){
 s0:   b[i] = f(a[i])
}
for (i = 0; i < 100 ; ++){
 s1:   c[i] = f(b[i])
}

when is it legal to do this?
When can we get rid of the b array?

There is also an optimization that goes the other way split a loop so that each statement becomes a separate loop incase we could run as vectors

Other Formats