5 Global Analysis

Graph Properties

We are going to define assorted graph properties, that can be calculated on cfgs.

dominators

We first define a binary relation on cfg nodes, called dominance. a node d dominates a node i (d dom i) if every possible execution path in the cfg that goes from the entry to i goes through d.

Dom is reflexive, so a dom a for all nodes a.
Dom is transitive, a dom b, b dom c ==> a dom c
Dom is anti-symmetric if a dom b, and b dom a then b = a

dominator trees

We next define immediate dominators a idom b, a != b and there is no c != a and c != b where a dom c and c dom b.

idom is unique
idom forms a tree called the dominator tree, root is the entry of the cfg

A strict dominator a sdom b if a dom b and a != b

an example

A control flow graph

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TD;
n0 --> n1;
n1 --> n2;
n1 --> n3;
n2 --> n4;
n3 --> n4;

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TD;
n0 --> n1;
n1 --> n2;
n1 --> n3;
n2 --> n4;
n3 --> n4;

The dominator tree

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TD;
n0 --> n1;
n1 --> n2;n1 --> n3
n1 --> n4;

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TD;
n0 --> n1;
n1 --> n2;n1 --> n3
n1 --> n4;

another example

A control flow graph

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TD;
n1 --> n2;
n2 --> n3;
n2 --> n4; n2--> n6;
n3 --> n5
n4 --> n5
n5 --> n2

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TD;
n1 --> n2;
n2 --> n3;
n2 --> n4; n2--> n6;
n3 --> n5
n4 --> n5
n5 --> n2

Dominator tree

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TD;
n1--> n2
n2 --> n3
n2 --> n4
n2--> n5
n2 --> n6

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TD;
n1--> n2
n2 --> n3
n2 --> n4
n2--> n5
n2 --> n6

simple implementation dominators

\[ \begin{gathered} \operatorname{Dom}\left(n_o\right)=\left\{n_o\right\} \\ \operatorname{Dom}(n)=\{n\} \cup\left(\bigcap_{p \in \operatorname{preds}(n)} \operatorname{Dom}(p)\right) \end{gathered} \]

To find the dominators of a node, first put the node itself in the dominators set. Then, take all the common (i.e. intersection) dominators of its predecessors and put them in the set.

What order do we want to process the nodes?

pseudo code

assume nodes start at 0,

compute_dominators(CFG cfg) {
  cfg[0].dominators = {0}
  for (bb in cfg except 0) {
    b.dominators = {all nodes in cfg}
  }

  do {
    change = false;
    for (bb in cfg except 0) {
      temp = {all nodes in cfg}
      for (pred in bb.predecessors) {
        temp = intersect(temp, pred.dominators)
      }
      temp = union(temp, {bb})
      if (temp != bb.dominators) {
        change = true
        bb.dominators = temp
      }
    }
  } while (change);
}

How do we implement this

number the vertices starting at 0, vertices are 0,1,2, number_of_vertices -1 so we could use a bit-vector for the set, and we should process vertices in reverse post order

a faster way

Cooper, Harvey, Kennedy Algorithm

if we have the dominator tree, finding immediate dominators is easy, its the parent of the node Finding dominators is also easy, its all the parents on the path from the entry to the node

suppose we have a node in the cfg with two parents, like n4, if we takes paths backward in the dominator tree the first common ancestor is n1, (the dominator)

a more complex example

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TD;
n0 --> n5;
n0 --> n1;
n5 --> n7;
n5 --> n6;
n1 --> n2 ;
n1 --> n3;
n7 --> n8;
n6 --> n4;
n2 --> n4;
n4 --> n8 ;
n3 --> n8;

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TD;
n0 --> n5;
n0 --> n1;
n5 --> n7;
n5 --> n6;
n1 --> n2 ;
n1 --> n3;
n7 --> n8;
n6 --> n4;
n2 --> n4;
n4 --> n8 ;
n3 --> n8;

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TD;
n0--> n5
n0 --> n1
n5--> n7
n5--> n6
n1--> n2
n1 --> n3

%%{init: {"flowchart": {"htmlLabels": false}} }%%

graph TD;
n0--> n5
n0 --> n1
n5--> n7
n5--> n6
n1--> n2
n1 --> n3

need n4 and n8

both are dominated by n0

subproblem: find lowest common ancestor in dt of two nodes a and b

for each node in the dom tree we have the depth, how far from the root, so if a and b have the same parent, that is the dominator, otherwise move the node with the higher depth up one

a fast way to determine which node is lower keep the nodes in post order, nodes at the top of the cfg have higher numbers

part1

intersect(b1, b2, idoms,postorder_map) {
  while (b1 != b2) {
    if (postorder_map[b1] < postorder_map[b2]) {
      b1 = idoms[b1];
    } else {
      b2 = idoms[b2];
    }
  }
  return b1;

pseudo code

void compute_dominators(CFG cfg) {
  // Some initialization steps and e.g. get postorder.

  // Map its basic block to its postorder traversal.
  foreach (p ; postorder) {
    postorder_map[p] = counter;
    ++counter;
  }

  bool change;
  do {
    change = false;
    foreach_reverse i in postorder) {
      bb = cffg block i 
      new_idom = bb.preds[0];  // Arbitrarily choose the first predecessor

      for pred in preds (bb)) {
        if (cfg.idoms[pred] != CFG.UNDEFINED_IDOM) {
          new_idom = intersect(new_idom, pred, cfg.idoms, postorder_map);
        }
      }
      if (cfg.idoms[i] != new_idom) {
        cfg.idoms[i] = new_idom;
        change = true;
      }
    }
  } while (change);
}

dominator frontiers

A node A has a dominance frontier which are set of nodes b where A does not dominate b but A dominates a pred of b. Lets see n5 dominance frontier

Finally we have a post dominates b if all paths from b to the exit go through a. for instance n4 post dominates n6.

natural loops

graph TD;
  entry --> loop
  loop --> if 
  if --> then
  if --> else
  then --> endif
  else --> endif
  endif --> loop
  loop --> exit

  graph TD;
  entry --> loop
  loop --> if 
  if --> then
  if --> else
  then --> endif
  else --> endif
  endif --> loop
  loop --> exit

has to have a cycle in cfg (strongly connected)
single entry point (called the header ) header

cycle but not header

How about an example that has a cycle and no header

graph TD;
    entry --> if;
    if --> loop1
    if --> loop2
    loop2 --> loop1
loop1 --> loop2

    graph TD;
    entry --> if;
    if --> loop1
    if --> loop2
    loop2 --> loop1
loop1 --> loop2

This loop has two entry points.

natural loops

A back-edge is an edge A->B, where B dominates A

other edges are forward edges

Natural loops:

for a back-edge A->B, B is the header of the loop
the smallest set of vertices L including A and B, such that for all v in L either preds(v) are in L or v == B

example

%%{init: {"flowchart": {"htmlLabels": false}} }%%

    graph TD;
    entry --> H1
    H1 --> A
    A --> H2
    H2 --> B
    B --> H2
    B --> H1
    H1 --> exit

%%{init: {"flowchart": {"htmlLabels": false}} }%%

    graph TD;
    entry --> H1
    H1 --> A
    A --> H2
    H2 --> B
    B --> H2
    B --> H1
    H1 --> exit

Backedges B -> H2,
B-> H1

for B-> H2, loop is H2,

for B-> H1, loop is H1, A, H2, B

reducible control flow

in a reducible cfg every back edge has a natural loop.

A reducible CFG is one with edges that can be partitioned into two disjoint sets: forward edges, and back edges, such that:

Forward edges form a directed acyclic graph with all nodes reachable from the entry node.

For all back edges (A, B), node B dominates node A.

what is the surface version

Structured programming languages are often designed such that all CFGs they produce are reducible, and common structured programming statements such as IF, FOR, WHILE, BREAK, and CONTINUE produce reducible graphs. To produce irreducible graphs, statements such as GOTO are needed. Irreducible graphs may also be produced by some compiler optimizations.

t1 and t2 transforms

Let G be a CFG. Suppose n is a node in G with a self-loop, that is, an edge from n to itself.

Transformation T1: on node n is removal of this self-loop.

Let n1 and n2 be nodes in G such that n2 has the unique direct ancestor n1, and n2 is not the initial node.

transformation T2: on node pair (n1,n2) is merging nodes n1 and n2 into one node,

t1 / t2

%%{init: {"flowchart": {"htmlLabels": false}} }%%

    graph TD;
    a0[" "] --> n
a1[" "] --> n
n --> n
 n --> b[" "]
n --> b1[" "]

%%{init: {"flowchart": {"htmlLabels": false}} }%%

    graph TD;
    a0[" "] --> n
a1[" "] --> n
n --> n
 n --> b[" "]
n --> b1[" "]

%%{init: {"flowchart": {"htmlLabels": false}} }%%

    graph TD;
    a0[" "] --> n1
a1[" "] --> n1
n1 --> n2
n2--> n1
 n2 --> b[" "]
n2 --> b1[" "]

%%{init: {"flowchart": {"htmlLabels": false}} }%%

    graph TD;
    a0[" "] --> n1
a1[" "] --> n1
n1 --> n2
n2--> n1
 n2 --> b[" "]
n2 --> b1[" "]

%%{init: {"flowchart": {"htmlLabels": false}} }%%

    graph TD;
    a0[" "] --> n
a1[" "] --> n
 n --> b[" "]
n --> b1[" "]

%%{init: {"flowchart": {"htmlLabels": false}} }%%

    graph TD;
    a0[" "] --> n
a1[" "] --> n
 n --> b[" "]
n --> b1[" "]

%%{init: {"flowchart": {"htmlLabels": false}} }%%

    graph TD;
    a0[" "] --> n1["n1_n2"]
a1[" "] --> n1
 n1 --> b[" "]
n1 --> b1[" "]

%%{init: {"flowchart": {"htmlLabels": false}} }%%

    graph TD;
    a0[" "] --> n1["n1_n2"]
a1[" "] --> n1
 n1 --> b[" "]
n1 --> b1[" "]

example

int  n = (count + 7) / 8;
switch (count % 8) {
case 0: do { *to = *from++;
case 7:      *to = *from++;
case 6:      *to = *from++;
case 5:      *to = *from++;
case 4:      *to = *from++;
case 3:      *to = *from++;
case 2:      *to = *from++;
case 1:      *to = *from++;
        } while (--n > 0);
}

simplified control flow

%%{init: {"flowchart": {"htmlLabels": false}} }%%

    graph TD;
    entry --> switch;
    switch --> case0-7
    switch --> case1
    switch --> case2
    case0-7 --> case2
    case2--> case1
    case1 --> dowhile
    dowhile --> case0-7
    dowhile --> exit

%%{init: {"flowchart": {"htmlLabels": false}} }%%

    graph TD;
    entry --> switch;
    switch --> case0-7
    switch --> case1
    switch --> case2
    case0-7 --> case2
    case2--> case1
    case1 --> dowhile
    dowhile --> case0-7
    dowhile --> exit

not reducible

other optimizations interactions

loop: if (cond) goto past_loop
    s1
    call bar()
    goto loop
pastloop:

function bar()
    b1 
    if () return
    b2

inline the function, combine jmps to jmps

loop: if (cond) goto past_loop
    s1
    b1
    if () go to next
    b2
    next:
goto loop

%%{init: {"flowchart": {"htmlLabels": false}} }%%

    graph TD;
    loop--> s1
    loop---> past_loop
    s1--> b1
    b1 -->inline_if
    inline_if --> b2
    b2 --> next_goto
    inline_if --> loop
    next_goto --> loop

%%{init: {"flowchart": {"htmlLabels": false}} }%%

    graph TD;
    loop--> s1
    loop---> past_loop
    s1--> b1
    b1 -->inline_if
    inline_if --> b2
    b2 --> next_goto
    inline_if --> loop
    next_goto --> loop

Now we have two back edges so two loops

Other Formats

Graph Properties

dominators

dominator trees

an example

another example

simple implementation dominators

pseudo code

How do we implement this

a faster way

a more complex example

subproblem: find lowest common ancestor in dt of two nodes a and b

part1

pseudo code

dominator frontiers

natural loops

cycle but not header

natural loops

example

reducible control flow

what is the surface version

t1 and t2 transforms

t1 / t2

example

simplified control flow

other optimizations interactions

inline the function, combine jmps to jmps