zoukankan      html  css  js  c++  java
  • NJU Static Program Analysis 02: Intermediate Representation

    NJU Static Program Analysis 02:
    Intermediate Representation

    Abstraction

    1. The relation between compilers and static analysis
    2. Understand 3AC and its common forms (in IR jimple)
    3. How to build basic blocks on top of IR
    4. How to construct control flow graphs on top of BBs?

    Notes

    Intermediate Representation (IR) is a kind of code generated by the compiler during Translation and is usually considered as the basis of static analysis. It is somehow like a not-that-high-level programming language without any grammar sugar, or an assembly language with more details like typing system. The features of IR includes:

    • low-level and closed to machine code
    • usually independent to a specific programming language
    • compact and uniform
    • contains control flow information

    One of the most popular IR is Three-Address Code (3AC), each of its sentance contains at most three addresses (Address can be Name, Constant, or Compiler-generated temporary variables).

    For example the code:

    do {
        i = i + 1;
    } while (a[i] < v);
    

    can be translated to

    1: i = i + 1
    2: t1 = a[ i ]
    3: if t1 < v goto 1
    

    We have these common forms of 3AC like

    x = y bop z, x = uop y, x = y, goto L, if x goto L, if x rop y goto L,

    where:

    x, y, z refer to addresses,
    bop is a binary arithmetic operation,
    uop is a unary operation,
    rop is a relational operation,
    L is a label to represent a program location,

    Another popular IR is Statuc Single Assignment (SSA). It distinct variables by not only its definition or scope, but also the sentence it belongs to. For instance,

    // 3AC
    1: p = a + b
    2: q = p - c
    3: p = q * d
    4: p = e - p
    5: q = p + q
    

    just equals to

    //SSA
    1: p_1 = a + b
    2: q_1 = p_1 - c
    3: p_2 = q_1 * d
    4: p_3 = e - p_2
    5: q_2 = p_3 + q_1
    

    and if there is a variable used when control flow merges, we introduce a (phi) function.

    image-20210702190353773

    Obviously, compared to 3AC, SSA provides more explicit define-and-use pairs and more directly incorporated flow information, which helps some optimization, data-demanding or other tasks. At meanwhile, SSA can intorduce too many variables and (phi) functions, causing further efficiency porblems.

    Based on the intermediate representation we have, generating a Control Flow Graph (CFG) of the given prgram becomes possible. CFG is a graph stucture consisting vertexes abstracted from a complete block of codes and edges abstracted from the goto instructions (the default edge constructed as sequential order is ignored). For example we have:

    image-20210702192032983

    (The black edges here are ignored in an actual CFG)

    Each vertex is a set individual 3-address instruction called Basic Block (BB). A BB should be the maximal sequences of consecutice 3AC instructions with properties that it can only be entered at the beginning and exited at the end. We can easily come up with an algorithm divide a whole passage of 3AC into BBs: claim each line a goto sentence starts from and end to as divisions, we naturally got the required BBs.

    And we add an edge from block A to B in a CFG if and only if when there is a jump from the end of A to the beginning of B except that B sequentially follows A and the jump from A to B is an unconditional one.

  • 相关阅读:
    bzoj 2733 [HNOI2012]永无乡
    CF550D Regular Bridge
    bzoj 1911 [Apio2010]特别行动队
    CF1137A/1138C Skyscrapers
    CF295C Greg and Friends
    CF1130E Wrong Answer
    bzoj 1029 [JSOI2007]建筑抢修
    iOS 流布局 UICollectionView使用(简单使用)
    Swift 学习笔记 (解决Swift闭包中循环引用的三种方法)
    Swift 学习笔记(面向协议编程)
  • 原文地址:https://www.cnblogs.com/Shimarin/p/14964800.html
Copyright © 2011-2022 走看看