
Local Value Numbering

See the slides from Phil Gibbons at CMU for more details and context on LVN.

Value numbering is a powerful technique for removing redundancies. An expression x + y is redundant inside a block if it has already been computed in the block and no intervening operation redefines x or y. If the compiler finds a redundant expression, it can save the value of the first computation and replace any subsequent evaluations with references to the saved value.


The idea is simple: the algorithm walks the block. Each time it sees a new variable, it assigns it a value (represented as a number).

Each time it sees an instruction, it forms a hash key from the opcode and the value numbers of its operands and gives the result a new value number.

Two instructions are redundant if they have the same opcode and the same operand value numbers, which means they receive the same value number.


\(e_i\) and \(e_j\) have the same value number if and only if \(e_i\) and \(e_j\) are provably equal for all possible operands of the expressions.

local value numbering covers a lot of optimizations that look different:

dead code elimination

main {
    a: int = const 100;
    a: int = const 42;
    print a;
}

copy propagation

main {
    x: int = const 4;
    copy1: int = id x;
    copy2: int = id copy1;
    print copy2;
}

common sub-expression elimination cse 

main {
    a: int = const 4;
    b: int = const 2;
    sum1: int = add a b;
    sum2: int = add a b;
    prod: int = mul sum1 sum2;
    print prod;
}
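
For reference, an optimizer normally works on the JSON form of a program rather than the text form. Running the CSE example above through bril2json gives something roughly like the following (shown here as a Python literal, abbreviated; exact field order may differ):

program = {
    "functions": [{
        "name": "main",
        "instrs": [
            {"dest": "a", "op": "const", "type": "int", "value": 4},
            {"dest": "b", "op": "const", "type": "int", "value": 2},
            {"dest": "sum1", "op": "add", "type": "int", "args": ["a", "b"]},
            {"dest": "sum2", "op": "add", "type": "int", "args": ["a", "b"]},
            {"dest": "prod", "op": "mul", "type": "int", "args": ["sum1", "sum2"]},
            {"op": "print", "args": ["prod"]},
        ],
    }]
}

An LVN pass works over the list in "instrs".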

variables vs values

We want to stop thinking about variables and think about values. Two instructions are redundant if they compute the same value.


For example, in a JIT compiler we want compilation to be fast, so we can get rid of all the variables:

b: int = const 1;
c: int = const 2;
a: int = add b c;

becomes:

[ int const 1
  int const 2
  int add 0 1
]

Less storage: args are just pointers into the list, so instructions are smaller. Faster: any use points to the corresponding def without any searching.
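
Here is a toy illustration of that idea in Python: instructions live in a list, and arguments are just indices into that list, so a use points straight at its defining instruction. The tuple layout is made up for illustration, not any particular IR.

instrs = [
    ("const", "int", 1),     # index 0: the constant 1
    ("const", "int", 2),     # index 1: the constant 2
    ("add", "int", 0, 1),    # index 2: add of the values at indices 0 and 1
]

def value_of(i):
    # Evaluate instruction i by chasing its index-based arguments.
    op, _type, *rest = instrs[i]
    if op == "const":
        return rest[0]
    if op == "add":
        return value_of(rest[0]) + value_of(rest[1])
    raise NotImplementedError(op)

print(value_of(2))   # prints 3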


value numbering continued

use of turnt 

turnt

there is a directory bril/examples/test/tdce which has some test programs

and a file turnt.toml that contains one line:

command = "bril2json < {filename} | python3 ../../tdce.py {args} | bril2txt"

to execute:

(.venv) (base) norm@norm-ubuntu:~/bril/examples/test/tdce$ turnt *.bril
1..8
ok 1 - combo.bril
ok 2 - diamond.bril
ok 3 - double.bril
ok 4 - double-pass.bril
ok 5 - reassign.bril
ok 6 - reassign-dkp.bril
ok 7 - simple.bril
ok 8 - skipped.bril

redundancy elimination

an expression x+y is redundant if and only if

  1. along every path from the entry it has been evaluated and
  2. its subexpressions x and y have not been redefined

if the compiler can prove an expression is redundant it can

  1. preserve the earlier evaluation
  2. replace the redundant expression with a use of the preserved value

key idea

assign a number (value number) to each expression

  1. two expressions have the same value number if they always have the same value
  2. use hashing to make this efficient

pseudo code

walk each block, assign a distinct value number to each value the block computes.

\(e_i\) and \(e_j\) have the same value number if and only if \(e_i\) and \(e_j\) are provably equal for all possible operands of the expressions.

pseudo code vn version 1

we have two tables: hash_table maps an expression (an opcode plus operand value numbers) to a value number and a canonical variable holding the value, and var2num maps each variable to its value number

for each instr in the block:
    v = [value_number(a) for a in instr.args]
    hash = (instr.op, v)
    if hash in hash_table:
        vn, canon_var = hash_table[hash]
        replace instr with: instr.dest = id canon_var
        var2num[instr.dest] = vn
    else:
        vn = a new value number
        hash_table[hash] = (vn, instr.dest)
        var2num[instr.dest] = vn
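
Here is a minimal runnable sketch of that loop in Python, working over Bril-style instruction dicts (as produced by bril2json). It assumes each variable is assigned at most once in the block and ignores clobbering and side effects; it is a sketch of the idea, not the lvn.py in the Bril repository.

def lvn_block(block):
    table = {}     # hash key (op, operand value numbers) -> (value number, canonical variable)
    var2num = {}   # variable name -> value number
    next_num = 0

    def number_of(var):
        # Variables defined before this block get a fresh number on first sight.
        nonlocal next_num
        if var not in var2num:
            var2num[var] = next_num
            next_num += 1
        return var2num[var]

    for instr in block:
        if "dest" not in instr:            # e.g. print: nothing to number
            continue
        if instr["op"] == "const":
            key = ("const", instr["value"])
        else:
            key = (instr["op"],) + tuple(number_of(a) for a in instr.get("args", []))

        if key in table:                   # this value was already computed in the block
            num, canonical = table[key]
            instr.update(op="id", args=[canonical])
            instr.pop("value", None)
        else:                              # a brand-new value
            num = next_num
            next_num += 1
            table[key] = (num, instr["dest"])
        var2num[instr["dest"]] = num
    return block

On the common sub-expression example above, this rewrites sum2: int = add a b into sum2: int = id sum1.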

An example

a = add b, c
b = sub a, d
c = add b, c
d = sub a, d    // same value as b

Starting from b = 0, c = 1:

variable   value number
b          0, then 4 after instruction 2
c          1, then 5 after instruction 3
a          2
d          3, then 4 after instruction 4

hash key   vn   canonical variable
add 0 1    2    a
sub 2 3    4    b
add 4 1    5    c

Instruction 4, d = sub a, d, hashes to sub 2 3, which is already in the table with value number 4 and canonical variable b, so it can be rewritten as d = id b.

Pseudo code (similar to an interpreter)

  1. a hash table mapping constants and expressions (over value numbers) to value numbers and to a canonical variable holding the value

  2. a reverse map from variables to value numbers


  main {
    a: int = const 4;
    b: int = const 2;
    sum1: int = add a b;
    sum2: int = add a b;
    prod: int = mul sum1 sum2;
    print prod;
  }
key        value   canonical name
const 4    1       a
const 2    2       b
add 1 2    3       sum1
mul 3 3    4       prod

name    value
a       1
b       2
sum1    3
sum2    3
prod    4

extensions:

  1. a: int = id b

a gets the value number of b. No copy required

Commutative operations: commutative operations that differ only in the order of their operands, such as a × b and b × a, should receive the same value numbers. As lvn constructs a hash key for the right-hand side of the current operation, it can sort the operands using some convenient scheme, such as ordering them by value number. This simple action ensures that commutative variants receive the same value number.
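
In a table-driven implementation like the sketch above, that amounts to sorting the operand value numbers when the hash key is built for a commutative opcode. A small illustrative helper (the opcode set is an assumption, in Bril-style names):

COMMUTATIVE = {"add", "mul", "and", "or", "eq"}

def make_key(op, arg_nums):
    # Sort operand value numbers for commutative ops so that, e.g.,
    # add a b and add b a produce the same hash key.
    if op in COMMUTATIVE:
        arg_nums = sorted(arg_nums)
    return (op,) + tuple(arg_nums)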


extension

constant folding:

   a: int = const 1;
   b: int = const 2;
   c: int = add a b;

Constant folding: if all the operands of an operation have known constant values, lvn can perform the operation and fold the answer directly into the code. lvn can store information about constants in the hash table, including their values. Before hash-key formation, it can test the operands and, if possible, evaluate them. If lvn discovers a constant expression, it can replace the operation with an immediate load of the result. Subsequent copy folding will clean up the code.

Algebraic identities: lvn can apply algebraic identities to simplify the code. For example, x + 0 and x should receive the same value number. Unfortunately, lvn needs special-case code for each identity. A series of tests, one per identity, can easily become long enough to produce an unacceptable slowdown in the algorithm. To ameliorate this problem, lvn should organize the tests into operator-specific decision trees.

a + 0, a - 0, a * 1, a * 0, a - a

vn version 2

add a bit indicating that a value is a known constant (and record the constant itself)

for each instr in the block:
    v = [value_number(a) for a in instr.args]
    if all of the v's are constants: fold the operation into a const
    check for the algebraic identities
    hash = (instr.op, v)
    if hash in hash_table:
        vn, canon_var = hash_table[hash]
        replace instr with: instr.dest = id canon_var
        var2num[instr.dest] = vn
    else:
        vn = a new value number
        hash_table[hash] = (vn, instr.dest)
        var2num[instr.dest] = vn
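
A hedged sketch of the two new steps in Python, assuming the pass also keeps a map const_val from a value number to its constant when one is known; the names and the set of folded operations are illustrative:

FOLDERS = {                     # illustrative subset of foldable binary ops
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def try_simplify(op, arg_nums, const_val):
    # Return ("const", v) when the result is a known constant, ("copy", n) when
    # an identity shows the result equals existing value number n, else None.
    a, b = arg_nums                                # binary operations only, for brevity
    ca, cb = const_val.get(a), const_val.get(b)
    if op in FOLDERS and ca is not None and cb is not None:
        return ("const", FOLDERS[op](ca, cb))      # constant folding
    if op == "add" and cb == 0:                    # a + 0
        return ("copy", a)
    if op == "add" and ca == 0:                    # 0 + a
        return ("copy", b)
    if op == "sub" and cb == 0:                    # a - 0
        return ("copy", a)
    if op == "sub" and a == b:                     # a - a
        return ("const", 0)
    if op == "mul" and (ca == 0 or cb == 0):       # a * 0 or 0 * a
        return ("const", 0)
    if op == "mul" and cb == 1:                    # a * 1
        return ("copy", a)
    if op == "mul" and ca == 1:                    # 1 * a
        return ("copy", b)
    return None

The caller then either replaces the instruction with a const of the folded result or records the returned value number for the destination, skipping the hash lookup.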

problem:

a = x + y
b = x + y
a = 17
c = x + y

the canonical variable a no longer holds x + y by the time we reach c = x + y; one fix is to keep track of all the variables that contain the value and select one that still holds it

another option is to save the value: if the destination will be overwritten later, add a temp

t = a + b
x = t
x = ...       // x is overwritten
  = t         // later uses refer to t

another option is renaming

a = x + y
b = x + y
a = 17
c = x + y

becomes:

a0 = x0 + y0
b0 = x0 + y0
a1 = 17
c0 = x0 + y0

indirect assignments

assignments via a pointer, or to an array element

a = b[i]
...           // no change to i
c = b[i]      // redundant: c can reuse the value loaded into a


a = b[i]
i = ...       // i is redefined
c = b[i]      // not redundant: this may be a different element


a = b[i]
b[k] = ...    // a store into the array: if k == i, b[i] has changed
   = b[i]     // not safe to reuse a
   = b[k]     // but this use can reuse the stored value

indexed stores

when we see an assignment a[i] = exp

we have three value numbers: a, i, and exp

give the array a a new value number, and give the a[i] load under the new numbering the value number of exp
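
One way that bookkeeping might look, reusing the illustrative var2num / table / fresh names from the earlier sketches; the ("load", ...) key shape is an assumption, not part of any particular IR:

def handle_indexed_store(array, index, exp, var2num, table, fresh):
    # a[i] = exp: the store may change any element, so every cached
    # ("load", old_array_vn, ...) entry becomes unreachable once the array
    # gets a fresh value number; a[i] under the new numbering holds exp.
    exp_num = var2num[exp]
    var2num[array] = fresh()
    table[("load", var2num[array], var2num[index])] = (exp_num, exp)
    return exp_num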

Local value numbering.

You can see one implementation in lvn.py in the Bril repository. But seriously, don’t be tempted! You want to write your implementation without looking at mine!

examples

Testing Your Optimizations

As part of your tasks for this lesson, you will implement your first two optimizations. The two main things you want your optimizations to do are:

  1. Not break programs.
  2. Make programs faster, most of the time.

As with every task in this class, part of the work is checking that you have done what you set out to do — in this case, that your optimizations do those two things.

Think carefully about how to make a convincing case for each of those criteria.


One tempting methodology might be to hand write a few small test-case Bril programs (or, worse, borrow the woefully inadequate ones sitting around in the Bril git repository), run them through your optimizations, and look at them to check whether they look right. This does not amount to convincing evidence (maybe you can think of a few specific reasons why).


While there are many ways to be convincing, a pretty good way might be to run your optimization on every single available Bril benchmark, systematically check that it still produces the right output for at least one input, and collect aggregate statistics about some metric you’re interested in. This is a nice way to check for unexpected behavior in programs that you didn’t carefully write yourself to test the cases you’re thinking of.


If this is the route you choose, you can do it however you like. There is a simple tool that you can consider using, called Brench. Brench is not very fancy; it does three things:

  1. It makes it easy to run a long list of inputs through several different commands. (For example, you can run a long list of Bril benchmarks through an “interpret” command and an “optimize-and-then-interpret” command.)

  2. It checks that all the commands agree on their output. (So, in our use case, it checks that optimizing the benchmark doesn’t change its output when interpreted.)

  3. It can collect a statistic from each command for comparison. (Like the number of dynamic instructions the interpreter executed, which is a pretty good metric for standard optimizations.)

Those three things are probably what you want to do to make a convincing case for an optimization’s correctness and effectiveness, whether or not you use Brench. It’s there if you want it, but feel free to go your own way!

homework 2

part 1: Implement “trivial” dead code elimination in which you delete instructions that are never used before they are reassigned.

part 2: Implement local value numbering. Try pairing it with your dead code elimination code. In the write-up, be sure to include evidence that your implementation is correct and actually optimizes programs; you might want to use the Brench program. For extra points, extend your implementation to handle some of the trickier examples talked about in class.

remember that the result is a blog post

crossing blocks

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TB
A["m = a + b<br> n = a + b"]
B["p = c + d<br>r = c + d"]

C["q = a + b<br> r = c + d"]
D["e = b + 18<br> s = a + b <br> u = e + f"]
E["e = a + 17<br> t = c + d <br> u = e + f"]
F["v = a + b <br> w = c + d <br> x = e + f"]
G["y = a + b <br> z = c + d"]

style A fill:#ffffff,stroke:#000000,stroke-width:1px
style B fill:#ffffff,stroke:#000000,stroke-width:1px
style C fill:#ffffff,stroke:#000000,stroke-width:1px
style D fill:#ffffff,stroke:#000000,stroke-width:1px
style E fill:#ffffff,stroke:#000000,stroke-width:1px
style F fill:#ffffff,stroke:#000000,stroke-width:1px
style G fill:#ffffff,stroke:#000000,stroke-width:1px
A--> B
A--> C
C --> D
C --> E
D--> F
E --> F
F--> G
B--> G


extended basic blocks

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TB
A-->b
A-->c
c-->d
c-->e


how do we extend the hash tables across block boundaries?

we need to do value numbering over each path

worklist = {entry block}
while worklist is not empty:
    remove a block b from the worklist
    evn(b, empty table)

evn(b, stack):
    t = new table for b
    link t above stack
    lvn(b, t)
    for each successor s of b:
        if s has exactly one predecessor: evn(s, t)
        else: add s to the worklist
    deallocate t
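
A Python sketch of this traversal, assuming an lvn_block(block, table) variant of the earlier sketch that takes its table as a parameter, and a CFG given as successor/predecessor maps; all the names here are illustrative:

from collections import ChainMap

def evn(block, table, succs, preds, worklist, lvn_block):
    scoped = table.new_child()        # push a scope for this block
    lvn_block(block, scoped)          # ordinary LVN, reading through the whole chain
    for s in succs[block]:
        if len(preds[s]) == 1:        # s is reachable only along this path
            evn(s, scoped, succs, preds, worklist, lvn_block)
        else:
            worklist.append(s)        # join point: it must start over with a fresh table
    # dropping `scoped` here is the "deallocate t" step

def run_evn(entry, succs, preds, lvn_block):
    worklist, done = [entry], set()
    while worklist:
        b = worklist.pop()
        if b in done:
            continue
        done.add(b)
        evn(b, ChainMap(), succs, preds, worklist, lvn_block)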

safety

If the result of evaluating E1 cannot be distinguished from the result of evaluating E, the compiler is free to replace E with E1.

Some compilers assume it is ok if E1 produces fewer errors than E.

Some compilers assume that safety is only required for “standard conforming” code, with undefined behavior for everything else.

Why is value numbering safe?

  1. if an expression is in the hash table, it must have occurred at least once in the block
  2. the algorithm modifies the code but does not invalidate the table

when is value numbering profitable

if reuse is cheaper than recomputing:

  1. it does not cause a register spill
  2. it does not need a copy (does the copy take as long as the compute?)
