classic loop optimizations
Loops optimizations are important because
- typically there is a regular access pattern
- the body of a loop gets repeated
- compilers often assume \(10^{depth}\) times
What are classic loop optimizations?
- Loop Invariant Code Motion
- Induction Variable Recognition
- Strength Reduction
- Linear Test Replacement
- Loop Unrolling
Less classic loop optimizations
- Scalar replacement
- Loop Interchange
- Loop Fusion
- Loop Distribution (also known as Fision
- Loop Skewing
- Loop Reversal
First recall natural loops
- strongly connected region in the cfg
- one entry point (dominates all the nodes in the loop)
def of loop invariant for an instruction d = op a,b
- a,b are constants or,
- a,b defined outside the loop
- a,b are loop invariants
in SSA form if we find a loop invariant instruction we can always move it into the pre-header, because the value it writes is never rewritten, and the values that it depends on come from outside the loop
conditions when moving an instruction d = a op b is ok
L0: d = 0
preheader:
L1: i = i + 1
d = a ⊕ b
= d
if (i<N) goto L1
L2: x = d
can move d
L0: d = 0 preheader L1: if (i>=N) goto L2 i = i + 1 d = a ⊕ b = d goto L1 L2: x = d ```
no good d used after the loop, would not be changed if the loop executes zero times
L0: d = 0
preheader
L1: i = i + 1
d = a ⊕ b
= d
d = 0
= d
if (i<N) goto L1
L2: L0: d = 0
no good d reassigned in the loop, do invar would be changed
l0: d = 0
preheader
L1: = d
i = i + 1
d = a ⊕ b
= d
if (i<N) goto L1
L2: x = d
conditions without SSA
1) the instruction dominates all the loop exits, where d is still live
1) d is only defined once
1) d in not live before the instruction
in SSA
1) is d is live in some block after the loop, then d has to dominate that block
2) clear
3) clear
Suppose the loop might run zero times
while (e) { j = loopinv // may never execute S }
j = loopinv // always executes while (e) { S }
can be converted into
if (e) { j = loopinv // may never execute while (e) { S }
} ````
induction variable elimination
for (int i = 0; i < 100; ++1){
f(a[i])
}
calculate a[i] as: &a[0] + 4 * i in every loop iteration, but the values at each step only differ by 4
- a_i = &a[0] before the loop
- a_i = a_i + 4 (add the stride) in every iteration
- the only remaining use of i is the test i < 100, which could become a_i < &a[0] + 4*100 (which is loop invariant)
steps
1find basic induction variables i = i + e, where e is loop invariant
what does this look like in ssa
loop header:
i1 = phi(i0, i2)
loop body:
i2 = i1 + e
loop header:
i1 = phi(i0, i2)
loop body:
a0 = i1 + e
i2 = a0 + e1
for each instruction d = c +- loop invariant see if there is a strongly connected graph in the ssa edges that only has adds and subtracts of loop invariant expressions
Step 2 find auxiliary induction variables
j = basic_ind * loop inv + loop invar
for (int i = 0; i < n; i++) {
j = 2*i + 1; // Y
k = -i; // Y
l = 2*i*i + 1; // N
c = c + 5; // Y*
}
step 3 replace auxiliary induction variables (derived ) by new variables without the multiply
step4 if the only remaining use of the induction variable is the termination test, change the test to use the new variable
sum = 0
for (i = 1, i < 100; i++) {
sum = sum + a[i -1]
}
in SSA form:
sum0 = 0
i0 = 1
L: sum1 = phi(sum0, sum2)
i1 = phi(i0, i2)
t10 = i1 -1
t20 = t10 * 4
t30 = t20 + &a
t40 = load t30
sum2 = sum1 + t40
i2 = i1 + 1
if (i2 <= 100)go to l
- i is a basic induction variable
- t10 is a aux induction variable
- t20 is an aux induction variable
- t30 is an aux induction variable
t3 has a use in the load
t3 = t20 + &a ==> t10 * 4 + &a ==> (i1-1)* 4+ &a
t3 = 4* i1 + &a - 4
sum0 = 0
i0 = 1
t50 = &a -4 // initial value
L: sum1 = phi(sum0, sum2)
i1 = phi(i0, i2)
t51 = phi(t50, t52)
//t10 = i1 -1
//t20 = t10 * 4
//t30 = t20 + &a
t40 = load t50
sum2 = sum1 + t40
i2 = i1 + 1
t52 = t50 + 4
if (i2 <= 100)go to l
sum0 = 0
i0 = 1
t50 = &a -4 // initial value
L: sum1 = phi(sum0, sum2)
// i1 = phi(i0, i2)
t51 = phi(t50, t52)
//t10 = i1 -1
//t20 = t10 * 4
//t30 = t20 + &a
t40 = load t50
sum2 = sum1 + t40
//i2 = i1 + 1
t52 = t50 + 4
if (t52 <= 396 + &a )go to l
loop un-switching
for (int i = 0 ; i < 100; ++1){
if (c) { // c is loop invariant
f(i)
} else {
g(i)
}
}
look for special patterns and replace
if (c) { // c is loop invariant
for (int i = 0 ; i < 100; ++1){
f(i)
}
}else {
for (int i = 0 ; i < 100; ++1){
g(i)
}
}
This is often done before vectorization
loop fusion
for (i = 0; i < 100 ; ++){
s0: b[i] = f(a[i])
}
for (i = 0; i < 100 ; ++){
s1: c[i] = f(b[i])
}
- when is it legal to do this?
- When can we get rid of the b array?
There is also an optimization that goes the other way split a loop so that each statement becomes a separate loop incase we could run as vectors
These sort of loop optimizations would make good projects