zoukankan      html  css  js  c++  java
  • NJU Static Program Analysis 02: Intermediate Representation

    NJU Static Program Analysis 02:
    Intermediate Representation

    Abstraction

    1. The relation between compilers and static analysis
    2. Understand 3AC and its common forms (in IR jimple)
    3. How to build basic blocks on top of IR
    4. How to construct control flow graphs on top of BBs?

    Notes

    Intermediate Representation (IR) is a kind of code generated by the compiler during Translation and is usually considered as the basis of static analysis. It is somehow like a not-that-high-level programming language without any grammar sugar, or an assembly language with more details like typing system. The features of IR includes:

    • low-level and closed to machine code
    • usually independent to a specific programming language
    • compact and uniform
    • contains control flow information

    One of the most popular IR is Three-Address Code (3AC), each of its sentance contains at most three addresses (Address can be Name, Constant, or Compiler-generated temporary variables).

    For example the code:

    do {
        i = i + 1;
    } while (a[i] < v);
    

    can be translated to

    1: i = i + 1
    2: t1 = a[ i ]
    3: if t1 < v goto 1
    

    We have these common forms of 3AC like

    x = y bop z, x = uop y, x = y, goto L, if x goto L, if x rop y goto L,

    where:

    x, y, z refer to addresses,
    bop is a binary arithmetic operation,
    uop is a unary operation,
    rop is a relational operation,
    L is a label to represent a program location,

    Another popular IR is Statuc Single Assignment (SSA). It distinct variables by not only its definition or scope, but also the sentence it belongs to. For instance,

    // 3AC
    1: p = a + b
    2: q = p - c
    3: p = q * d
    4: p = e - p
    5: q = p + q
    

    just equals to

    //SSA
    1: p_1 = a + b
    2: q_1 = p_1 - c
    3: p_2 = q_1 * d
    4: p_3 = e - p_2
    5: q_2 = p_3 + q_1
    

    and if there is a variable used when control flow merges, we introduce a (phi) function.

    image-20210702190353773

    Obviously, compared to 3AC, SSA provides more explicit define-and-use pairs and more directly incorporated flow information, which helps some optimization, data-demanding or other tasks. At meanwhile, SSA can intorduce too many variables and (phi) functions, causing further efficiency porblems.

    Based on the intermediate representation we have, generating a Control Flow Graph (CFG) of the given prgram becomes possible. CFG is a graph stucture consisting vertexes abstracted from a complete block of codes and edges abstracted from the goto instructions (the default edge constructed as sequential order is ignored). For example we have:

    image-20210702192032983

    (The black edges here are ignored in an actual CFG)

    Each vertex is a set individual 3-address instruction called Basic Block (BB). A BB should be the maximal sequences of consecutice 3AC instructions with properties that it can only be entered at the beginning and exited at the end. We can easily come up with an algorithm divide a whole passage of 3AC into BBs: claim each line a goto sentence starts from and end to as divisions, we naturally got the required BBs.

    And we add an edge from block A to B in a CFG if and only if when there is a jump from the end of A to the beginning of B except that B sequentially follows A and the jump from A to B is an unconditional one.

  • 相关阅读:
    Mac OS 下包管理器 homebrew的安装
    草根程序员八年百万年薪之路
    div隐藏滚动条,仍可滚动
    感觉身体被掏空by彩虹室内合唱团
    添加bash命令
    mysql性能优化
    PHP变量存储结构
    轻量级MVC框架(自行开发)
    一致性hash算法之php实现
    redis安装
  • 原文地址:https://www.cnblogs.com/Shimarin/p/14964800.html
Copyright © 2011-2022 走看看