zoukankan      html  css  js  c++  java
  • NJU Static Program Analysis 02: Intermediate Representation

    NJU Static Program Analysis 02:
    Intermediate Representation

    Abstraction

    1. The relation between compilers and static analysis
    2. Understand 3AC and its common forms (in IR jimple)
    3. How to build basic blocks on top of IR
    4. How to construct control flow graphs on top of BBs?

    Notes

    Intermediate Representation (IR) is a kind of code generated by the compiler during Translation and is usually considered as the basis of static analysis. It is somehow like a not-that-high-level programming language without any grammar sugar, or an assembly language with more details like typing system. The features of IR includes:

    • low-level and closed to machine code
    • usually independent to a specific programming language
    • compact and uniform
    • contains control flow information

    One of the most popular IR is Three-Address Code (3AC), each of its sentance contains at most three addresses (Address can be Name, Constant, or Compiler-generated temporary variables).

    For example the code:

    do {
        i = i + 1;
    } while (a[i] < v);
    

    can be translated to

    1: i = i + 1
    2: t1 = a[ i ]
    3: if t1 < v goto 1
    

    We have these common forms of 3AC like

    x = y bop z, x = uop y, x = y, goto L, if x goto L, if x rop y goto L,

    where:

    x, y, z refer to addresses,
    bop is a binary arithmetic operation,
    uop is a unary operation,
    rop is a relational operation,
    L is a label to represent a program location,

    Another popular IR is Statuc Single Assignment (SSA). It distinct variables by not only its definition or scope, but also the sentence it belongs to. For instance,

    // 3AC
    1: p = a + b
    2: q = p - c
    3: p = q * d
    4: p = e - p
    5: q = p + q
    

    just equals to

    //SSA
    1: p_1 = a + b
    2: q_1 = p_1 - c
    3: p_2 = q_1 * d
    4: p_3 = e - p_2
    5: q_2 = p_3 + q_1
    

    and if there is a variable used when control flow merges, we introduce a (phi) function.

    image-20210702190353773

    Obviously, compared to 3AC, SSA provides more explicit define-and-use pairs and more directly incorporated flow information, which helps some optimization, data-demanding or other tasks. At meanwhile, SSA can intorduce too many variables and (phi) functions, causing further efficiency porblems.

    Based on the intermediate representation we have, generating a Control Flow Graph (CFG) of the given prgram becomes possible. CFG is a graph stucture consisting vertexes abstracted from a complete block of codes and edges abstracted from the goto instructions (the default edge constructed as sequential order is ignored). For example we have:

    image-20210702192032983

    (The black edges here are ignored in an actual CFG)

    Each vertex is a set individual 3-address instruction called Basic Block (BB). A BB should be the maximal sequences of consecutice 3AC instructions with properties that it can only be entered at the beginning and exited at the end. We can easily come up with an algorithm divide a whole passage of 3AC into BBs: claim each line a goto sentence starts from and end to as divisions, we naturally got the required BBs.

    And we add an edge from block A to B in a CFG if and only if when there is a jump from the end of A to the beginning of B except that B sequentially follows A and the jump from A to B is an unconditional one.

  • 相关阅读:
    vue项目总结,所用到的技术点
    豆瓣电影个人项目总结
    博学谷项目总结,个人总结经验
    JavaScript数组的常用方法
    移动端base.css的内容,
    normalize.css可以对css初始化,不同浏览器可以统一初始样式
    jsp第十次作业
    jsp第九次作业
    jsp第八次作业
    jsp第七次作业
  • 原文地址:https://www.cnblogs.com/Shimarin/p/14964800.html
Copyright © 2011-2022 走看看