Loops optimizations are important because
What are classic loop optimizations?
Less classic loop optimizations
First recall natural loops
def of loop invariant for an instruction d = op a,b
in SSA form if we find a loop invariant instruction we can always move it into the pre-header, because the value it writes is never rewritten, and the values that it depends on come from outside the loop
conditions when moving an instruction d = a op b is ok
L0: d = 0
preheader:
L1: i = i + 1
d = a ⊕ b
= d
if (i<N) goto L1
L2: x = d
can move d
L0: d = 0 preheader L1: if (i>=N) goto L2 i = i + 1 d = a ⊕ b = d goto L1 L2: x = d ```
no good d used after the loop, would not be changed if the loop executes zero times
L0: d = 0
preheader
L1: i = i + 1
d = a ⊕ b
= d
d = 0
= d
if (i<N) goto L1
L2: L0: d = 0
no good d reassigned in the loop, do invar would be changed
l0: d = 0
preheader
L1: = d
i = i + 1
d = a ⊕ b
= d
if (i<N) goto L1
L2: x = d
conditions without SSA
1) the instruction dominates all the loop exits, where d is still live
1) d is only defined once
1) d in not live before the instruction
in SSA
1) is d is live in some block after the loop, then d has to dominate that block
2) clear
3) clear
Suppose the loop might run zero times
while (e) { j = loopinv // may never execute S }
j = loopinv // always executes while (e) { S }
can be converted into
if (e) { j = loopinv // may never execute while (e) { S }
} ````
for (int i = 0; i < 100; ++1){
f(a[i])
}
calculate a[i] as: &a[0] + 4 * i in every loop iteration, but the values at each step only differ by 4
steps
1find basic induction variables i = i + e, where e is loop invariant
what does this look like in ssa
loop header:
i1 = phi(i0, i2)
loop body:
i2 = i1 + e
loop header:
i1 = phi(i0, i2)
loop body:
a0 = i1 + e
i2 = a0 + e1
for each instruction d = c +- loop invariant see if there is a strongly connected graph in the ssa edges that only has adds and subtracts of loop invariant expressions
Step 2 find auxiliary induction variables
j = basic_ind * loop inv + loop invar
for (int i = 0; i < n; i++) {
j = 2*i + 1; // Y
k = -i; // Y
l = 2*i*i + 1; // N
c = c + 5; // Y*
}
step 3 replace auxiliary induction variables (derived ) by new variables without the multiply
step4 if the only remaining use of the induction variable is the termination test, change the test to use the new variable
sum = 0
for (i = 1, i < 100; i++) {
sum = sum + a[i -1]
}
in SSA form:
sum0 = 0
i0 = 1
L: sum1 = phi(sum0, sum2)
i1 = phi(i0, i2)
t10 = i1 -1
t20 = t10 * 4
t30 = t20 + &a
t40 = load t30
sum2 = sum1 + t40
i2 = i1 + 1
if (i2 <= 100)go to l
t3 has a use in the load
t3 = t20 + &a ==> t10 * 4 + &a ==> (i1-1)* 4+ &a
t3 = 4* i1 + &a - 4
sum0 = 0
i0 = 1
t50 = &a -4 // initial value
L: sum1 = phi(sum0, sum2)
i1 = phi(i0, i2)
t51 = phi(t50, t52)
//t10 = i1 -1
//t20 = t10 * 4
//t30 = t20 + &a
t40 = load t50
sum2 = sum1 + t40
i2 = i1 + 1
t52 = t50 + 4
if (i2 <= 100)go to l
sum0 = 0
i0 = 1
t50 = &a -4 // initial value
L: sum1 = phi(sum0, sum2)
// i1 = phi(i0, i2)
t51 = phi(t50, t52)
//t10 = i1 -1
//t20 = t10 * 4
//t30 = t20 + &a
t40 = load t50
sum2 = sum1 + t40
//i2 = i1 + 1
t52 = t50 + 4
if (t52 <= 396 + &a )go to l
for (int i = 0 ; i < 100; ++1){
if (c) { // c is loop invariant
f(i)
} else {
g(i)
}
}
look for special patterns and replace
if (c) { // c is loop invariant
for (int i = 0 ; i < 100; ++1){
f(i)
}
}else {
for (int i = 0 ; i < 100; ++1){
g(i)
}
}
This is often done before vectorization
loop fusion
for (i = 0; i < 100 ; ++){
s0: b[i] = f(a[i])
}
for (i = 0; i < 100 ; ++){
s1: c[i] = f(b[i])
}
There is also an optimization that goes the other way split a loop so that each statement becomes a separate loop incase we could run as vectors
These sort of loop optimizations would make good projects