Principles of Program Analysis

by

Flemming Nielson, Hanne Riis Nielson and Chris Hankin.

The book gives an introduction to the principles of program analysis. It concentrates on the four main approaches (data flow analysis, control flow analysis, abstract interpretation, and type and effect systems), indicating their relationships and differences; the coverage thereby spans a variety of programming language features. Our coverage of algorithmic techniques is substantial, presenting a uniform approach to the fast algorithms for equation and constraint solving.

The level of exposition assumes some background in programming languages (including semantics and compilers). The presentation is aimed at graduate students, but it is intended also to be useful for the beginning postgraduate student, and the elementary parts should be readable by advanced undergraduate students. Additionally, researchers and practitioners in the field may find the book of interest for its broad coverage of all of the main approaches, highlighting the many similarities that exist between them.

The book contains six chapters, three appendices, indices of terminology and of notation, and a bibliography. Each chapter contains a number of exercises and the book contains several mini-projects allowing the student to explore parts of the development in greater detail.

Chapter 1: Introduction

The aim of this chapter is to illustrate the fundamental nature of the four approaches by presenting illuminating examples of analyses for imperative as well as functional programs. This will not only highlight some of the issues to be dealt with in more detail later but will also establish the frame of reference for subsequent discussions; in particular, it will allow the reader to appreciate the many similarities between the four approaches (in contrast with the impression one might get from the literature). Next we present a first, very simple algorithm for calculating the results of the analyses; it can be seen as the basis of several of the algorithms to be developed in subsequent chapters. Finally, we illustrate how to use program analysis information for program transformations.
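
As a first taste of such an algorithm, here is a minimal chaotic iteration sketch in Python; the equation encoding and the tiny reaching-definitions-style example are our own illustrative assumptions, not the book's notation. Every equation is simply re-evaluated until nothing changes, which yields the least solution when all right-hand sides are monotone.

```python
# A minimal chaotic-iteration sketch (our own illustration): each
# "equation" maps the current assignment of analysis values (finite
# sets here) to a new value for one flow variable.

def chaotic_iteration(variables, equations):
    """Iterate until a fixed point; with monotone equations over
    finite sets this terminates with the least solution."""
    values = {v: set() for v in variables}   # start from the least element
    changed = True
    while changed:
        changed = False
        for v in variables:
            new = equations[v](values)
            if new != values[v]:
                values[v] = new
                changed = True
    return values

# Hypothetical example: reaching-definitions-style equations for the
# two-statement program  [x := 2]1 ; [y := x]2  (labels 1 and 2).
eqs = {
    "entry1": lambda env: {("x", "?"), ("y", "?")},  # nothing defined yet
    "exit1":  lambda env: (env["entry1"] - {("x", "?")}) | {("x", 1)},
    "entry2": lambda env: env["exit1"],
    "exit2":  lambda env: (env["entry2"] - {("y", "?")}) | {("y", 2)},
}
print(chaotic_iteration(list(eqs), eqs))
```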

Chapter 2: Data Flow Analysis

We begin with a treatment of the classical intra-procedural analyses of live variables, available expressions, reaching definitions, very busy expressions, ud-chaining and du-chaining, and thus explain the distinction between forward and backward analyses. Next we provide a structural operational semantics for the imperative language and prove the semantic correctness of the live variables analysis; a mini-project shows how to prove the semantic correctness of the reaching definitions analysis. We then present the generalisation offered by the monotone frameworks of Kam and Ullman; we formulate constant propagation analysis as an example of a monotone analysis that is not distributive. We then present a general worklist algorithm for computing the MFP (maximal fixed point) solution of monotone frameworks, and we discuss how the MFP solution may differ from the MOP (meet over all paths) solution.
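
To make this concrete, the following Python sketch computes the MFP solution under our own simplified representation of a monotone framework (a label set, flow edges, extremal labels, an extremal value, and transfer functions); the instance is a live variables analysis of a small hypothetical program, run over the reversed flow since the analysis is backward.

```python
# A sketch of the MFP worklist algorithm (our own rendering): lattice
# elements are frozensets joined by union, `flow` is a list of edges
# (l, l2), and `transfer[l]` is the transfer function of block l.

def mfp(labels, flow, extremal, iota, transfer):
    """analysis[l] is the value holding before block l is executed,
    in the direction of the given flow."""
    analysis = {l: iota if l in extremal else frozenset() for l in labels}
    worklist = list(flow)
    while worklist:
        (l, l2) = worklist.pop()
        new = transfer[l](analysis[l])          # effect of block l
        if not new <= analysis[l2]:             # not already accounted for
            analysis[l2] = analysis[l2] | new   # join (here: set union)
            worklist.extend(e for e in flow if e[0] == l2)
    return analysis

# Live variables for the hypothetical program
#   [x := 2]1 ; [y := x]2 ; [z := y + x]3
# Being a backward analysis, it runs over the reversed flow, so
# analysis[l] ends up describing the variables live at the exit of l.
gen  = {1: set(), 2: {"x"}, 3: {"x", "y"}}
kill = {1: {"x"}, 2: {"y"}, 3: {"z"}}
transfer = {l: (lambda l: lambda s: frozenset((s - kill[l]) | gen[l]))(l)
            for l in (1, 2, 3)}
print(mfp(labels={1, 2, 3}, flow=[(3, 2), (2, 1)],
          extremal={3}, iota=frozenset(), transfer=transfer))
# e.g. {1: frozenset({'x'}), 2: frozenset({'x', 'y'}), 3: frozenset()}
```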

We then move on to some more complex aspects of data flow analysis. We deal at some length with the various techniques for interprocedural analysis, covering such techniques as call strings (a la Sharir and Pnueli) and assumption sets (a la Landi and Ryder). The final section develops an analysis that determines whether the data structure pointed to is non-cyclic, or even a tree, and it ends by presenting a more ambitious shape analysis.

Chapter 3: Control Flow Analysis

We develop a control flow analysis (also known as a closure analysis) for a simple untyped functional language; this can be regarded as an outgrowth of the interprocedural data flow analysis in that we now tackle the problem of ``dynamic dispatch''. It takes the form of an abstract coinductive specification of what it means for the results of a 0-CFA (or monovariant) analysis to be acceptable. Next we provide a structural operational semantics for the language and prove that the specification is semantically sound; we also show that the specification always admits a best solution by establishing a general result about the set of solutions being a Moore family. We then show how to generate the constraints in an explicit form and how to use a graph-based algorithm for computing the least solution in cubic time; this relates to the literature on set constraints.
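
As a hedged sketch of the constraint-solving phase, the Python fragment below handles the three kinds of subset constraints a 0-CFA typically gives rise to; the encoding and the labelled example term are our own illustrative assumptions, and the naive re-iteration stands in for the cubic-time graph-based algorithm developed in the chapter.

```python
# Constraint forms (our own simplified encoding):
#   ("in",   t, p)          -- {t} is a subset of cache(p)
#   ("sub",  p1, p2)        -- cache(p1) is a subset of cache(p2)
#   ("cond", t, p, p1, p2)  -- if t in cache(p) then cache(p1) subset cache(p2)

def solve(points, constraints):
    """Naive least-solution solver: sweep the constraints until stable."""
    cache = {p: set() for p in points}

    def include(src, dst):
        """Add the terms in src to cache[dst]; report whether anything was new."""
        if src - cache[dst]:
            cache[dst] |= src
            return True
        return False

    changed = True
    while changed:
        changed = False
        for c in constraints:
            if c[0] == "in":
                changed |= include({c[1]}, c[2])
            elif c[0] == "sub":
                changed |= include(cache[c[1]], c[2])
            elif c[0] == "cond" and c[1] in cache[c[2]]:
                changed |= include(cache[c[3]], c[4])
    return cache

# Constraints one might generate for ((fn x => x) (fn y => y)), writing
# fx and fy for the two abstractions and 1, 2, 3 for the points of the
# operator, the operand and the whole application (all assumptions):
cs = [("in", "fx", 1), ("in", "fy", 2),
      ("cond", "fx", 1, 2, "x"),   # the operand flows into x
      ("cond", "fx", 1, "x", 3)]   # the body of fx (i.e. x) flows to the result
print(solve([1, 2, 3, "x"], cs))   # point 3 ends up with {'fy'}
```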

We complete the chapter by dealing with some more complex aspects. One is the addition of a data flow component (e.g. in the form of a monotone framework), thereby showing how to incorporate the ideas from intraprocedural analysis. Another is the incorporation of context in the manner of call strings (a variant of k-CFA called uniform-k-CFA), thereby establishing the link to interprocedural analysis.

Chapter 4: Abstract Interpretation

To illustrate the general nature of abstract interpretation, our coverage is not focused on any particular language or semantics. To this end we abstractly formulate correctness as the preservation of a certain correctness relation during computation. We then deal with the important techniques of widening and narrowing for ``speeding up'' the computation of the least fixed point (actually ``over-shooting'' it). Next we deal with the classical technique of Galois connections (or adjunctions) for replacing computations over one universe with computations over a coarser universe. We go into the details of how to construct Galois connections in a systematic manner, covering such techniques as independent attribute methods versus relational methods, direct products versus tensor products, and the reduced versions of these constructions. Finally, we show how to ``induce'' the actual computations to be carried out over the coarser domain; this includes a treatment of how to ``induce'' widenings, and we show how to apply these ideas in the case of the monotone frameworks of Chapter 2. Through a number of examples we show how to develop Webber's approximation for checking indices into arrays (presented at PLDI'97).
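
To make widening concrete, here is a minimal Python sketch of the standard widening operator on intervals (the encoding is our own assumption): a bound that is still moving between iterations is pushed straight to infinity, so even an infinite ascending chain stabilises after finitely many steps, at the cost of over-shooting the least fixed point.

```python
# A sketch of the classical interval widening (our own minimal encoding):
# intervals are (lo, hi) pairs and None is the bottom element.

INF = float("inf")

def widen(old, new):
    """Widen the previous iterate `old` by the next iterate `new`."""
    if old is None:
        return new
    if new is None:
        return old
    lo = old[0] if new[0] >= old[0] else -INF  # lower bound unstable? drop to -inf
    hi = old[1] if new[1] <= old[1] else INF   # upper bound unstable? jump to +inf
    return (lo, hi)

# The ascending chain [0,0], [0,1], [0,2], ... would never stabilise,
# but a single widening step jumps to a post-fixed point:
print(widen((0, 0), (0, 1)))   # (0, inf)
```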

Chapter 5: Type and Effect Systems

The chapter begins by formulating a type and effect system for a variant of the control flow analysis considered in Chapter 3; this also serves to introduce sub-effecting. Next we prove that the information specified is semantically correct with respect to a natural semantics (or big-step structural operational semantics); in the manner of Chapter 3 we also show that there is always a best analysis. Then we show how to implement the analysis efficiently by generating constraints; this involves proving the syntactic soundness and completeness of a version of algorithm W (used for type inference in the lambda-calculus).
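
As a hedged illustration of such judgements (the notation is our paraphrase of the usual presentation, not necessarily the book's), a judgement Gamma |- e : tau & phi pairs a type with an effect; the application rule combines the effects of both subterms with the latent effect of the function, and sub-effecting allows an inferred effect to be enlarged:

```latex
% Our paraphrase of standard type-and-effect rules (illustrative only):
% application joins the effects \varphi_1, \varphi_2 of the subterms with
% the latent effect \varphi of the function; sub-effecting enlarges effects.
\[
\frac{\Gamma \vdash e_1 : \tau_2 \xrightarrow{\varphi} \tau \,\&\, \varphi_1
      \qquad
      \Gamma \vdash e_2 : \tau_2 \,\&\, \varphi_2}
     {\Gamma \vdash e_1\, e_2 : \tau \,\&\, \varphi_1 \cup \varphi_2 \cup \varphi}
\qquad\qquad
\frac{\Gamma \vdash e : \tau \,\&\, \varphi}
     {\Gamma \vdash e : \tau \,\&\, \varphi'}
\ \ \text{if } \varphi \subseteq \varphi'
\]
```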

To illustrate the variety of type and effect systems we conclude by formulating several other type and effect systems for the simple functional language. First we show how to specify a side effect analysis; this involves a discussion of sub-typing. Then we formulate an escape analysis; this involves a discussion of polymorphism. Next we cover region analysis and briefly discuss polymorphic recursion. Finally we develop a simple pointer analysis that is much coarser (and much faster) than those considered in Chapter 2.

Chapter 6: Algorithms

We begin by considering the solution of simple constraint systems (either equations, as in data flow analysis, or simple inequations). We present a generalised version of the worklist algorithm used in Chapters 2 and 3. We show that for some analyses a round-robin algorithm can be more efficient. For each algorithm, we present correctness results and discuss complexity.
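
As a minimal sketch (our own rendering, reusing the framework representation assumed in the Chapter 2 sketch above), round-robin iteration replaces the worklist by repeated sweeps over all flow edges in a fixed order, stopping after the first pass that changes nothing:

```python
# Round-robin iteration (our own sketch): sweep every flow edge in a
# fixed order until a whole pass leaves the analysis unchanged. With the
# edges arranged in reverse postorder, few passes suffice for many
# forward analyses.

def round_robin(labels, flow, extremal, iota, transfer):
    analysis = {l: iota if l in extremal else frozenset() for l in labels}
    passes, stable = 0, False
    while not stable:
        stable = True
        passes += 1
        for (l, l2) in flow:                    # fixed visiting order
            new = analysis[l2] | transfer[l](analysis[l])
            if new != analysis[l2]:
                analysis[l2] = new
                stable = False
    return analysis, passes

# On the live variables instance from the Chapter 2 sketch, this reaches
# the same MFP solution in two passes (the second pass confirms stability).
```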

Appendix A: Partially Ordered Sets

This appendix presents the key definitions and results for partially ordered sets that will be used in the book. This includes Tarski's fixed point theorem, which states (among other things) that a monotone function on a complete lattice has a least as well as a greatest fixed point.
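
For reference, the standard statement reads roughly as follows (our transcription into the usual lattice notation):

```latex
% Tarski's fixed point theorem (standard statement, our transcription):
% for a complete lattice (L, \sqsubseteq) and a monotone f : L -> L,
\[
\mathrm{lfp}(f) \;=\; \bigsqcap \{\, x \in L \mid f(x) \sqsubseteq x \,\}
\qquad\qquad
\mathrm{gfp}(f) \;=\; \bigsqcup \{\, x \in L \mid x \sqsubseteq f(x) \,\}
\]
% and indeed the set of all fixed points of f is itself a complete lattice.
```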

Appendix B: Induction and Coinduction

This appendix surveys the key induction principles to be used in the book. It also introduces the proof principle of coinduction (based on Tarski's result) that is used for the development of the control flow analysis in Chapter 3.

Appendix C: Graphs and Regular Expressions

This appendix presents those concepts for graphs and regular expressions that form the basis for the algorithms presented in Chapter 6. This includes such concepts as depth-first spanning trees, reverse postorder and reducibility by intervals.
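
As a small illustration, the following Python sketch (our own rendering) computes a reverse postorder: each node is numbered as the depth-first search retreats from it, and that order is then reversed. This is the visiting order from which the round-robin algorithm of Chapter 6 benefits.

```python
# Reverse postorder via depth-first search (our own sketch): append each
# node when the search retreats from it, then reverse the resulting list.

def reverse_postorder(graph, root):
    """graph maps each node to its list of successors."""
    seen, order = set(), []

    def dfs(n):
        seen.add(n)
        for m in graph.get(n, []):
            if m not in seen:
                dfs(m)
        order.append(n)              # postorder: numbered on retreat

    dfs(root)
    return list(reversed(order))

# A small flow graph with a loop between nodes 2 and 3:
g = {1: [2], 2: [3, 4], 3: [2], 4: []}
print(reverse_postorder(g, 1))       # [1, 2, 4, 3]
```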