NJU Static Program Analysis 07: Interprocedural Analysis
Abstract
- How to build call graph via class hierarchy analysis
- Concept of interprocedural control-flow graph (ICFG)
- Concept of interprocedural data-flow analysis
- Interprocedural constant propagation
Notes
In this lecture, we focus on the imprecision that method calls introduce into intraprocedural analysis. For example, intraprocedural constant propagation must conservatively treat every variable affected by a method call as NAC, which loses a great deal of precision. To fix this, we introduce interprocedural analysis, which is built on the program's call graph.
A call graph is a formal representation of the calling relations among the methods of a program, and it is the basis for interprocedural analysis. There are various ways to construct a call graph for OOPLs (Object-Oriented Programming Languages), each with a different trade-off between efficiency and precision. In this lecture, we introduce one of the most efficient (but least precise) approaches, Class Hierarchy Analysis (CHA). The next lecture introduces one of the most precise (but least efficient) approaches, Pointer Analysis (k-CFA).
Before we start, let's review the kinds of method calls in Java: static calls, special calls, and virtual calls.
Virtual calls are what implement Java's run-time polymorphism. As one might expect, dispatching polymorphic virtual calls is the most problematic and complex case for our analysis.
Method dispatch for a virtual call is resolved from the type of the receiver object and the method signature at the call site. For example,
class C {
    T foo(P p, Q q, R r) {
        // do something
    }
}
the method is identified by the signature <C: T foo(P,Q,R)>, which gives its class type, its method name, and its descriptor (the parameter types and the return type). We sometimes abbreviate this to C.foo(...) when there is no ambiguity. Here C is the type of the receiver object, and foo(...) is the method signature at the call site.
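So for a virtual call whose receiver object has type c and whose call-site signature is m, how can we simulate the run-time dispatch procedure? A function Dispatch(c, m) can be defined as follows: if c itself contains a non-abstract method m' with the same name and descriptor as m, then Dispatch(c, m) = m'; otherwise Dispatch(c, m) = Dispatch(c', m), where c' is the superclass of c.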
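As a minimal sketch of this kind of dispatch, assuming the usual rule that the lookup starts in the receiver's class and falls back to the superclass when no matching non-abstract method is found, the procedure could look like the Java below. `ClassInfo` and `DispatchSketch` are hypothetical names introduced only for illustration; they are not part of any real analysis framework.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, minimal class model (not the API of any real framework).
final class ClassInfo {
    final String name;
    final ClassInfo superclass;        // null for the root of the hierarchy
    final Set<String> concreteMethods; // "name + descriptor" of each non-abstract method

    ClassInfo(String name, ClassInfo superclass, String... concreteMethods) {
        this.name = name;
        this.superclass = superclass;
        this.concreteMethods = new HashSet<>(Arrays.asList(concreteMethods));
    }
}

final class DispatchSketch {
    // Dispatch(c, m): use the matching non-abstract method declared in c,
    // otherwise retry with the superclass of c. Returns null if nothing matches.
    static String dispatch(ClassInfo c, String m) {
        if (c == null) {
            return null;
        }
        if (c.concreteMethods.contains(m)) {
            return "<" + c.name + ": " + m + ">";
        }
        return dispatch(c.superclass, m);
    }

    public static void main(String[] args) {
        ClassInfo a = new ClassInfo("A", null, "void foo()");
        ClassInfo b = new ClassInfo("B", a);               // inherits foo() from A
        ClassInfo c = new ClassInfo("C", b, "void foo()"); // overrides foo()
        System.out.println(dispatch(a, "void foo()")); // <A: void foo()>
        System.out.println(dispatch(b, "void foo()")); // <A: void foo()>
        System.out.println(dispatch(c, "void foo()")); // <C: void foo()>
    }
}
```

Note that B has no foo() of its own, so the lookup for B climbs to A, while C answers with its own override.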
With the dispatching problem solved, we can turn to Class Hierarchy Analysis (CHA). CHA resolves a virtual call using the program's class hierarchy and the declared type of the receiver variable at the call site, in order to determine which classes the receiver object may belong to, and hence which methods may be invoked.
To begin with, we use CHA to find the possible target methods of a single call site, a feature that is common in modern IDEs. For
A obj = ...
obj.foo();
we assume obj may point to an object of A or of any subclass of A, which is characteristic of a may analysis. In this setting, for a call site cs, we can naturally define a function Resolve(cs) by CHA as follows:
Resolve(cs)
    T = {}
    m = method signature at cs
    if cs is a static call then
        T = { m }
    if cs is a special call then
        cm = class type of m
        T = { Dispatch(cm, m) }
    if cs is a virtual call then
        c = declared type of receiver variable at cs
        foreach ci that is a subclass of c or c itself do
            add Dispatch(ci, m) to T
    return T
Resolve thus collects every possible target method of cs; the recursive search up the class hierarchy happens inside Dispatch.
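The Resolve pseudocode can be sketched in Java roughly as follows. This is only an illustration under simplifying assumptions: classes and methods are plain strings, `ChaResolver`, `CallKind` and `addClass` are hypothetical names, and the class named at the call site is passed in directly (for a virtual call it stands for the declared type of the receiver variable).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

enum CallKind { STATIC, SPECIAL, VIRTUAL }

// Hypothetical string-based model of a class hierarchy.
final class ChaResolver {
    private final Map<String, String> superOf = new HashMap<>();        // class -> superclass
    private final Map<String, Set<String>> subsOf = new HashMap<>();    // class -> direct subclasses
    private final Map<String, Set<String>> concrete = new HashMap<>();  // class -> non-abstract methods

    void addClass(String name, String superclass, String... methods) {
        superOf.put(name, superclass);
        if (superclass != null) {
            subsOf.computeIfAbsent(superclass, k -> new LinkedHashSet<>()).add(name);
        }
        concrete.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    // Dispatch(c, m): walk up the superclass chain until a class declares m.
    String dispatch(String c, String m) {
        for (String cur = c; cur != null; cur = superOf.get(cur)) {
            Set<String> declared = concrete.get(cur);
            if (declared != null && declared.contains(m)) {
                return "<" + cur + ": " + m + ">";
            }
        }
        return null;
    }

    // Resolve(cs), with the call site described by its kind, class and method.
    Set<String> resolve(CallKind kind, String cls, String m) {
        Set<String> targets = new LinkedHashSet<>();
        switch (kind) {
            case STATIC:
                targets.add("<" + cls + ": " + m + ">");
                break;
            case SPECIAL:
                addIfFound(targets, dispatch(cls, m));
                break;
            case VIRTUAL: {
                // Visit cls and all of its direct and indirect subclasses.
                Deque<String> work = new ArrayDeque<>();
                work.add(cls);
                while (!work.isEmpty()) {
                    String ci = work.poll();
                    addIfFound(targets, dispatch(ci, m));
                    work.addAll(subsOf.getOrDefault(ci, Collections.emptySet()));
                }
                break;
            }
        }
        return targets;
    }

    private static void addIfFound(Set<String> targets, String target) {
        if (target != null) {
            targets.add(target);
        }
    }

    public static void main(String[] args) {
        ChaResolver cha = new ChaResolver();
        cha.addClass("A", null, "void foo()");
        cha.addClass("B", "A");               // inherits foo()
        cha.addClass("C", "A", "void foo()"); // overrides foo()
        cha.addClass("D", "B", "void foo()"); // overrides foo()
        // prints [<A: void foo()>, <C: void foo()>, <D: void foo()>]
        System.out.println(cha.resolve(CallKind.VIRTUAL, "A", "void foo()"));
    }
}
```

For a receiver declared as B, the same call resolves to A's and D's foo(): B itself dispatches up to A, and D is the only subclass of B. This illustrates why CHA can report spurious targets when the variable can only ever hold one concrete type at run time.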
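Beyond single call sites, we can use CHA to build the call graph of a whole program. In short, we start from the entry methods (for example, main) and perform a breadth-first traversal with a worklist: for each reachable method, we resolve every call site in it with Resolve, add a call edge from the call site to each target method, and enqueue the targets that have not been visited yet.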
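A worklist-based construction along these lines might look like the sketch below. To keep it self-contained, methods and call sites are plain strings, and call-site resolution is passed in as a function; `CallGraph`, `CallGraphBuilder`, `callSitesOf` and `resolve` are hypothetical names chosen for this example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical result type: reachable methods plus call edges.
final class CallGraph {
    final Set<String> reachableMethods = new LinkedHashSet<>();
    final Map<String, Set<String>> edges = new LinkedHashMap<>(); // call site -> target methods
}

final class CallGraphBuilder {
    // callSitesOf: method -> call sites in its body (assumed to be given).
    // resolve:     call site -> possible targets, e.g. computed by CHA.
    static CallGraph build(String entry,
                           Map<String, List<String>> callSitesOf,
                           Function<String, Set<String>> resolve) {
        CallGraph cg = new CallGraph();
        Deque<String> workList = new ArrayDeque<>();
        workList.add(entry);
        while (!workList.isEmpty()) {
            String method = workList.poll();
            if (!cg.reachableMethods.add(method)) {
                continue; // already processed
            }
            for (String cs : callSitesOf.getOrDefault(method, Collections.emptyList())) {
                for (String target : resolve.apply(cs)) {
                    cg.edges.computeIfAbsent(cs, k -> new LinkedHashSet<>()).add(target);
                    workList.add(target);
                }
            }
        }
        return cg;
    }

    public static void main(String[] args) {
        Map<String, List<String>> sites = new HashMap<>();
        sites.put("main", Arrays.asList("cs1", "cs2"));
        sites.put("A.foo", Arrays.asList("cs3"));
        Map<String, Set<String>> targets = new HashMap<>();
        targets.put("cs1", Collections.singleton("A.foo"));
        targets.put("cs2", Collections.singleton("B.bar"));
        targets.put("cs3", Collections.singleton("B.bar"));
        CallGraph cg = build("main", sites,
                cs -> targets.getOrDefault(cs, Collections.emptySet()));
        System.out.println(cg.reachableMethods); // [main, A.foo, B.bar]
    }
}
```

Methods that are never enqueued stay out of `reachableMethods`, so their call sites are never resolved and contribute no edges.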
Here is an illustration of the call graph we built.
It seems that the algorithms in our curriculum are generally naive to some extent.
Finally, we can build an interprocedural control-flow graph (ICFG) for further analysis. Whereas a CFG represents the structure of a single method, an ICFG represents the structure of the whole program by adding edges between the CFGs of its methods.
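Concretely, for every call site we add a call edge from the call site to the entry node of each callee, and a return edge from the exit node of the callee back to the statement that follows the call site (the return site). Note that these edges also carry edge transfer functions, which move information across method boundaries. Using constant propagation as an example, the call edge passes the argument values to the callee's parameters, and the return edge passes the return value to the variable on the left-hand side of the call.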
The original CFG edge from a call site to its return site is kept in the ICFG and is called a call-to-return edge. It lets the analysis propagate the caller's local data flow directly along the ICFG; without it, we would have to propagate every local fact through the callee, which is inefficient.
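To make the three kinds of edges concrete, here is a hedged sketch of edge transfer functions for interprocedural constant propagation on a call such as b = addOne(a). It assumes the common formulation in which the call-to-return edge also removes (kills) the left-hand-side variable, since its value arrives through the return edge instead. `Value` and `EdgeTransfers` are hypothetical names, and a missing map key stands for an undefined value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lattice value: UNDEF, a constant, or NAC (not a constant).
final class Value {
    static final Value UNDEF = new Value("UNDEF");
    static final Value NAC = new Value("NAC");
    private final String text;

    private Value(String text) { this.text = text; }

    static Value constant(int c) { return new Value(Integer.toString(c)); }

    @Override public String toString() { return text; }
}

final class EdgeTransfers {
    // Call edge: only the argument values cross, bound to the callee's parameters.
    static Map<String, Value> callEdge(Map<String, Value> callerOut,
                                       List<String> args, List<String> params) {
        Map<String, Value> calleeIn = new HashMap<>();
        for (int i = 0; i < params.size(); i++) {
            calleeIn.put(params.get(i), callerOut.getOrDefault(args.get(i), Value.UNDEF));
        }
        return calleeIn;
    }

    // Return edge: only the return value crosses, bound to the call's left-hand side.
    static Map<String, Value> returnEdge(Value returned, String lhs) {
        Map<String, Value> out = new HashMap<>();
        out.put(lhs, returned);
        return out;
    }

    // Call-to-return edge: the caller's locals flow on, minus the left-hand side.
    static Map<String, Value> callToReturnEdge(Map<String, Value> callerOut, String lhs) {
        Map<String, Value> out = new HashMap<>(callerOut);
        out.remove(lhs);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Value> callerOut = new HashMap<>();
        callerOut.put("a", Value.constant(6));
        callerOut.put("b", Value.NAC);
        // b = addOne(a), where addOne has the single parameter x
        System.out.println(callEdge(callerOut, Arrays.asList("a"), Arrays.asList("x"))); // {x=6}
        System.out.println(callToReturnEdge(callerOut, "b"));                            // {a=6}
        System.out.println(returnEdge(Value.constant(7), "b"));                          // {b=7}
    }
}
```

At the return site, the facts arriving on the return edge and on the call-to-return edge are combined. With a single callee their key sets are disjoint, so the result here is simply a = 6 and b = 7; a full analysis would apply the constant propagation meet operator when several callees return different values.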