618 lines
		
	
	
		
			28 KiB
		
	
	
	
		
			HTML
		
	
	
	
			
		
		
	
	
			618 lines
		
	
	
		
			28 KiB
		
	
	
	
		
			HTML
		
	
	
	
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
 | 
						|
          "http://www.w3.org/TR/html4/strict.dtd">
 | 
						|
<html>
 | 
						|
<head>
 | 
						|
  <title>Checker Developer Manual</title>
 | 
						|
  <link type="text/css" rel="stylesheet" href="menu.css">
 | 
						|
  <link type="text/css" rel="stylesheet" href="content.css">
 | 
						|
  <script type="text/javascript" src="scripts/menu.js"></script>
 | 
						|
</head>
 | 
						|
<body>
 | 
						|
 | 
						|
<div id="page">
 | 
						|
<!--#include virtual="menu.html.incl"-->
 | 
						|
 | 
						|
<div id="content">
 | 
						|
 | 
						|
<h3 style="color:red">This Page Is Under Construction</h3>
 | 
						|
 | 
						|
<h1>Checker Developer Manual</h1>
 | 
						|
 | 
						|
<p>The static analyzer engine performs path-sensitive exploration of the program and 
 | 
						|
relies on a set of checkers to implement the logic for detecting and 
 | 
						|
constructing specific bug reports. Anyone who is interested in implementing their own 
 | 
						|
checker, should check out the Building a Checker in 24 Hours talk 
 | 
						|
(<a href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>
 | 
						|
 <a href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a>) 
 | 
						|
and refer to this page for additional information on writing a checker. The static analyzer is a 
 | 
						|
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a> 
 | 
						|
and <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a> 
 | 
						|
for developer guidelines and send your questions and proposals to 
 | 
						|
<a href=http://lists.llvm.org/mailman/listinfo/cfe-dev>cfe-dev mailing list</a>. 
 | 
						|
</p>
 | 
						|
 | 
						|
    <ul>
 | 
						|
      <li><a href="#start">Getting Started</a></li>
 | 
						|
      <li><a href="#analyzer">Static Analyzer Overview</a>
 | 
						|
      <ul>
 | 
						|
        <li><a href="#interaction">Interaction with Checkers</a></li>
 | 
						|
        <li><a href="#values">Representing Values</a></li>
 | 
						|
      </ul></li>
 | 
						|
      <li><a href="#idea">Idea for a Checker</a></li>
 | 
						|
      <li><a href="#registration">Checker Registration</a></li>
 | 
						|
      <li><a href="#events_callbacks">Events, Callbacks, and Checker Class Structure</a></li>
 | 
						|
      <li><a href="#extendingstates">Custom Program States</a></li>
 | 
						|
      <li><a href="#bugs">Bug Reports</a></li>
 | 
						|
      <li><a href="#ast">AST Visitors</a></li>
 | 
						|
      <li><a href="#testing">Testing</a></li>
 | 
						|
      <li><a href="#commands">Useful Commands/Debugging Hints</a></li>
 | 
						|
      <li><a href="#additioninformation">Additional Sources of Information</a></li>
 | 
						|
      <li><a href="#links">Useful Links</a></li>
 | 
						|
    </ul>
 | 
						|
 | 
						|
<h2 id=start>Getting Started</h2>
 | 
						|
  <ul>
 | 
						|
    <li>To check out the source code and build the project, follow steps 1-4 of 
 | 
						|
    the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a> 
 | 
						|
  page.</li>
 | 
						|
 | 
						|
    <li>The analyzer source code is located under the Clang source tree:
 | 
						|
    <br><tt>
 | 
						|
    $ <b>cd llvm/tools/clang</b>
 | 
						|
    </tt>
 | 
						|
    <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
 | 
						|
     <tt>test/Analysis</tt>.</li>
 | 
						|
 | 
						|
    <li>The analyzer regression tests can be executed from the Clang's build 
 | 
						|
    directory:
 | 
						|
    <br><tt>
 | 
						|
    $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
 | 
						|
    </tt></li>
 | 
						|
    
 | 
						|
    <li>Analyze a file with the specified checker:
 | 
						|
    <br><tt>
 | 
						|
    $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
 | 
						|
    </tt></li>
 | 
						|
 | 
						|
    <li>List the available checkers:
 | 
						|
    <br><tt>
 | 
						|
    $ <b>clang -cc1 -analyzer-checker-help</b>
 | 
						|
    </tt></li>
 | 
						|
 | 
						|
    <li>See the analyzer help for different output formats, fine tuning, and 
 | 
						|
    debug options:
 | 
						|
    <br><tt>
 | 
						|
    $ <b>clang -cc1 -help | grep "analyzer"</b>
 | 
						|
    </tt></li>
 | 
						|
 | 
						|
  </ul>
 | 
						|
 
 | 
						|
<h2 id=analyzer>Static Analyzer Overview</h2>
 | 
						|
  The analyzer core performs symbolic execution of the given program. All the 
 | 
						|
  input values are represented with symbolic values; further, the engine deduces 
 | 
						|
  the values of all the expressions in the program based on the input symbols  
 | 
						|
  and the path. The execution is path sensitive and every possible path through 
 | 
						|
  the program is explored. The explored execution traces are represented with 
 | 
						|
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
 | 
						|
  Each node of the graph is 
 | 
						|
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>, 
 | 
						|
  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
 | 
						|
  <p>
 | 
						|
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a> 
 | 
						|
  represents the corresponding location in the program (or the CFG). 
 | 
						|
  <tt>ProgramPoint</tt> is also used to record additional information on 
 | 
						|
  when/how the state was added. For example, <tt>PostPurgeDeadSymbolsKind</tt> 
 | 
						|
  kind means that the state is the result of purging dead symbols - the 
 | 
						|
  analyzer's equivalent of garbage collection. 
 | 
						|
  <p>
 | 
						|
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a> 
 | 
						|
  represents abstract state of the program. It consists of:
 | 
						|
  <ul>
 | 
						|
    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic 
 | 
						|
    values
 | 
						|
    <li><tt>Store</tt> - a mapping from memory locations to symbolic values
 | 
						|
    <li><tt>GenericDataMap</tt> - constraints on symbolic values
 | 
						|
  </ul>
 | 
						|
  
 | 
						|
  <h3 id=interaction>Interaction with Checkers</h3>
 | 
						|
  Checkers are not merely passive receivers of the analyzer core changes - they 
 | 
						|
  actively participate in the <tt>ProgramState</tt> construction through the
 | 
						|
  <tt>GenericDataMap</tt> which can be used to store the checker-defined part 
 | 
						|
  of the state. Each time the analyzer engine explores a new statement, it 
 | 
						|
  notifies each checker registered to listen for that statement, giving it an 
 | 
						|
  opportunity to either report a bug or modify the state. (As a rule of thumb, 
 | 
						|
  the checker itself should be stateless.) The checkers are called one after another 
 | 
						|
  in the predefined order; thus, calling all the checkers adds a chain to the 
 | 
						|
  <tt>ExplodedGraph</tt>. 
 | 
						|
  
 | 
						|
  <h3 id=values>Representing Values</h3>
 | 
						|
  During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a> 
 | 
						|
  objects are used to represent the semantic evaluation of expressions. 
 | 
						|
  They can represent things like concrete 
 | 
						|
  integers, symbolic values, or memory locations (which are memory regions). 
 | 
						|
  They are a discriminated union of "values", symbolic and otherwise. 
 | 
						|
  If a value isn't symbolic, usually that means there is no symbolic 
 | 
						|
  information to track. For example, if the value was an integer, such as 
 | 
						|
  <tt>42</tt>, it would be a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>, 
 | 
						|
  and the checker doesn't usually need to track any state with the concrete 
 | 
						|
  number. In some cases, <tt>SVal</tt> is not a symbol, but it really should be 
 | 
						|
  a symbolic value. This happens when the analyzer cannot reason about something 
 | 
						|
  (yet). An example is floating point numbers. In such cases, the 
 | 
						|
  <tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>.
 | 
						|
  This represents a case that is outside the realm of the analyzer's reasoning 
 | 
						|
  capabilities. <tt>SVals</tt> are value objects and their values can be viewed 
 | 
						|
  using the <tt>.dump()</tt> method. Often they wrap persistent objects such as 
 | 
						|
  symbols or regions. 
 | 
						|
  <p>
 | 
						|
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol) 
 | 
						|
  is meant to represent abstract, but named, symbolic value. Symbols represent 
 | 
						|
  an actual (immutable) value. We might not know what its specific value is, but 
 | 
						|
  we can associate constraints with that value as we analyze a path. For 
 | 
						|
  example, we might record that the value of a symbol is greater than 
 | 
						|
  <tt>0</tt>, etc.
 | 
						|
  <p>
 | 
						|
 | 
						|
  <p>
 | 
						|
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.  
 | 
						|
  It is used to provide a lexicon of how to describe abstract memory. Regions can 
 | 
						|
  layer on top of other regions, providing a layered approach to representing memory. 
 | 
						|
  For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>, 
 | 
						|
  but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could 
 | 
						|
  be used to represent the memory associated with a specific field of that object.
 | 
						|
  So how do we represent symbolic memory regions? That's what 
 | 
						|
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a> 
 | 
						|
  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the 
 | 
						|
  symbol is unique and has a unique name; that symbol names the region.
 | 
						|
  
 | 
						|
  <P>
 | 
						|
  Let's see how the analyzer processes the expressions in the following example:
 | 
						|
  <p>
 | 
						|
  <pre class="code_example">
 | 
						|
  int foo(int x) {
 | 
						|
     int y = x * 2;
 | 
						|
     int z = x;
 | 
						|
     ...
 | 
						|
  }
 | 
						|
  </pre>
 | 
						|
  <p>
 | 
						|
Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated, 
 | 
						|
we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>, in 
 | 
						|
this case it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>. 
 | 
						|
Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>, 
 | 
						|
which references the value <b>currently bound</b> to <tt>x</tt>. That value is 
 | 
						|
symbolic; it's whatever <tt>x</tt> was bound to at the start of the function. 
 | 
						|
Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>, 
 | 
						|
and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When 
 | 
						|
we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions, 
 | 
						|
and create a new <tt>SVal</tt> that represents their multiplication (which in 
 | 
						|
this case is a new symbolic expression, which we might call <tt>$1</tt>). When we 
 | 
						|
evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>), 
 | 
						|
and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>) 
 | 
						|
to the <tt>MemRegion</tt> in the symbolic store.
 | 
						|
<br>
 | 
						|
The second line is similar. When we evaluate <tt>x</tt> again, we do the same 
 | 
						|
dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note, two <tt>SVals</tt> 
 | 
						|
might reference the same underlying values.
 | 
						|
 | 
						|
<p>
 | 
						|
To summarize, MemRegions are unique names for blocks of memory. Symbols are 
 | 
						|
unique names for abstract symbolic values. Some MemRegions represents abstract 
 | 
						|
symbolic chunks of memory, and thus are also based on symbols. SVals are just 
 | 
						|
references to values, and can reference either MemRegions, Symbols, or concrete 
 | 
						|
values (e.g., the number 1).
 | 
						|
 | 
						|
  <!-- 
 | 
						|
  TODO: Add a picture.
 | 
						|
  <br>
 | 
						|
  Symbols<br>
 | 
						|
  FunctionalObjects are used throughout.  
 | 
						|
  -->
 | 
						|
 | 
						|
<h2 id=idea>Idea for a Checker</h2>
 | 
						|
  Here are several questions which you should consider when evaluating your 
 | 
						|
  checker idea:
 | 
						|
  <ul>
 | 
						|
    <li>Can the check be effectively implemented without path-sensitive 
 | 
						|
    analysis? See <a href="#ast">AST Visitors</a>.</li>
 | 
						|
    
 | 
						|
    <li>How high the false positive rate is going to be? Looking at the occurrences 
 | 
						|
    of the issue you want to write a checker for in the existing code bases might 
 | 
						|
    give you some ideas. </li>
 | 
						|
    
 | 
						|
    <li>How the current limitations of the analysis will effect the false alarm 
 | 
						|
    rate? Currently, the analyzer only reasons about one procedure at a time (no 
 | 
						|
    inter-procedural analysis). Also, it uses a simple range tracking based 
 | 
						|
    solver to model symbolic execution.</li>
 | 
						|
    
 | 
						|
    <li>Consult the <a
 | 
						|
    href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=REOPENED&version=trunk&component=Static%20Analyzer&product=clang">Bugzilla database</a> 
 | 
						|
    to get some ideas for new checkers and consider starting with improving/fixing  
 | 
						|
    bugs in the existing checkers.</li>
 | 
						|
  </ul>
 | 
						|
 | 
						|
<p>Once an idea for a checker has been chosen, there are two key decisions that
 | 
						|
need to be made:
 | 
						|
  <ul>
 | 
						|
    <li> Which events the checker should be tracking. This is discussed in more
 | 
						|
    detail in the section <a href="#events_callbacks">Events, Callbacks, and
 | 
						|
    Checker Class Structure</a>.
 | 
						|
    <li> What checker-specific data needs to be stored as part of the program
 | 
						|
    state (if any). This should be minimized as much as possible. More detail about
 | 
						|
    implementing custom program state is given in section <a
 | 
						|
    href="#extendingstates">Custom Program States</a>.
 | 
						|
  </ul>
 | 
						|
 | 
						|
 | 
						|
<h2 id=registration>Checker Registration</h2>
 | 
						|
  All checker implementation files are located in
 | 
						|
  <tt>clang/lib/StaticAnalyzer/Checkers</tt> folder. The steps below describe
 | 
						|
  how the checker <tt>SimpleStreamChecker</tt>, which checks for misuses of 
 | 
						|
  stream APIs, was registered with the analyzer.
 | 
						|
  Similar steps should be followed for a new checker.
 | 
						|
<ol>
 | 
						|
  <li>A new checker implementation file, <tt>SimpleStreamChecker.cpp</tt>, was
 | 
						|
  created in the directory <tt>lib/StaticAnalyzer/Checkers</tt>.
 | 
						|
  <li>The following registration code was added to the implementation file:
 | 
						|
<pre class="code_example">
 | 
						|
void ento::registerSimpleStreamChecker(CheckerManager &mgr) {
 | 
						|
  mgr.registerChecker<SimpleStreamChecker>();
 | 
						|
}
 | 
						|
</pre>
 | 
						|
<li>A package was selected for the checker and the checker was defined in the
 | 
						|
table of checkers at <tt>lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Since all
 | 
						|
checkers should first be developed as "alpha", and the SimpleStreamChecker
 | 
						|
performs UNIX API checks, the correct package is "alpha.unix", and the following
 | 
						|
was added to the corresponding <tt>UnixAlpha</tt> section of <tt>Checkers.td</tt>:
 | 
						|
<pre class="code_example">
 | 
						|
let ParentPackage = UnixAlpha in {
 | 
						|
...
 | 
						|
def SimpleStreamChecker : Checker<"SimpleStream">,
 | 
						|
  HelpText<"Check for misuses of stream APIs">,
 | 
						|
  DescFile<"SimpleStreamChecker.cpp">;
 | 
						|
...
 | 
						|
} // end "alpha.unix"
 | 
						|
</pre>
 | 
						|
 | 
						|
<li>The source code file was made visible to CMake by adding it to
 | 
						|
<tt>lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.
 | 
						|
 | 
						|
</ol>
 | 
						|
 | 
						|
After adding a new checker to the analyzer, one can verify that the new checker
 | 
						|
was successfully added by seeing if it appears in the list of available checkers:
 | 
						|
<br> <tt><b>$clang -cc1 -analyzer-checker-help</b></tt>
 | 
						|
 | 
						|
<h2 id=events_callbacks>Events, Callbacks, and Checker Class Structure</h2>
 | 
						|
 | 
						|
<p> All checkers inherit from the <tt><a
 | 
						|
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1Checker.html">
 | 
						|
Checker</a></tt> template class; the template parameter(s) describe the type of
 | 
						|
events that the checker is interested in processing. The various types of events
 | 
						|
that are available are described in the file <a
 | 
						|
href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
 | 
						|
CheckerDocumentation.cpp</a>
 | 
						|
 | 
						|
<p> For each event type requested, a corresponding callback function must be
 | 
						|
defined in the checker class (<a
 | 
						|
href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
 | 
						|
CheckerDocumentation.cpp</a> shows the
 | 
						|
correct function name and signature for each event type).
 | 
						|
 | 
						|
<p> As an example, consider <tt>SimpleStreamChecker</tt>. This checker needs to
 | 
						|
take action at the following times:
 | 
						|
 | 
						|
<ul>
 | 
						|
<li>Before making a call to a function, check if the function is <tt>fclose</tt>.
 | 
						|
If so, check the parameter being passed.
 | 
						|
<li>After making a function call, check if the function is <tt>fopen</tt>. If
 | 
						|
so, process the return value.
 | 
						|
<li>When values go out of scope, check whether they are still-open file
 | 
						|
descriptors, and report a bug if so. In addition, remove any information about
 | 
						|
them from the program state in order to keep the state as small as possible.
 | 
						|
<li>When file pointers "escape" (are used in a way that the analyzer can no longer
 | 
						|
track them), mark them as such. This prevents false positives in the cases where
 | 
						|
the analyzer cannot be sure whether the file was closed or not.
 | 
						|
</ul>
 | 
						|
 | 
						|
<p>These events that will be used for each of these actions are, respectively, <a
 | 
						|
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PreCall.html">PreCall</a>,
 | 
						|
<a
 | 
						|
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PostCall.html">PostCall</a>,
 | 
						|
<a
 | 
						|
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1DeadSymbols.html">DeadSymbols</a>,
 | 
						|
and <a
 | 
						|
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PointerEscape.html">PointerEscape</a>.
 | 
						|
The high-level structure of the checker's class is thus:
 | 
						|
 | 
						|
<pre class="code_example">
 | 
						|
class SimpleStreamChecker : public Checker<check::PreCall,
 | 
						|
                                           check::PostCall,
 | 
						|
                                           check::DeadSymbols,
 | 
						|
                                           check::PointerEscape> {
 | 
						|
public:
 | 
						|
 | 
						|
  void checkPreCall(const CallEvent &Call, CheckerContext &C) const;
 | 
						|
 | 
						|
  void checkPostCall(const CallEvent &Call, CheckerContext &C) const;
 | 
						|
 | 
						|
  void checkDeadSymbols(SymbolReaper &SR, CheckerContext &C) const;
 | 
						|
 | 
						|
  ProgramStateRef checkPointerEscape(ProgramStateRef State,
 | 
						|
                                     const InvalidatedSymbols &Escaped,
 | 
						|
                                     const CallEvent *Call,
 | 
						|
                                     PointerEscapeKind Kind) const;
 | 
						|
};
 | 
						|
</pre>
 | 
						|
 | 
						|
<h2 id=extendingstates>Custom Program States</h2>
 | 
						|
 | 
						|
<p> Checkers often need to keep track of information specific to the checks they
 | 
						|
perform. However, since checkers have no guarantee about the order in which the
 | 
						|
program will be explored, or even that all possible paths will be explored, this
 | 
						|
state information cannot be kept within individual checkers. Therefore, if
 | 
						|
checkers need to store custom information, they need to add new categories of
 | 
						|
data to the <tt>ProgramState</tt>. The preferred way to do so is to use one of
 | 
						|
several macros designed for this purpose. They are:
 | 
						|
 | 
						|
<ul>
 | 
						|
<li><a
 | 
						|
href="http://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ae4cddb54383cd702a045d7c61b009147">REGISTER_TRAIT_WITH_PROGRAMSTATE</a>:
 | 
						|
Used when the state information is a single value. The methods available for
 | 
						|
state types declared with this macro are <tt>get</tt>, <tt>set</tt>, and
 | 
						|
<tt>remove</tt>.
 | 
						|
<li><a
 | 
						|
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#aa27656fa0ce65b0d9ba12eb3c02e8be9">REGISTER_LIST_WITH_PROGRAMSTATE</a>:
 | 
						|
Used when the state information is a list of values. The methods available for
 | 
						|
state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
 | 
						|
<tt>remove</tt>, and <tt>contains</tt>.
 | 
						|
<li><a
 | 
						|
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#ad90f9387b94b344eaaf499afec05f4d1">REGISTER_SET_WITH_PROGRAMSTATE</a>:
 | 
						|
Used when the state information is a set of values. The methods available for
 | 
						|
state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
 | 
						|
<tt>remove</tt>, and <tt>contains</tt>.
 | 
						|
<li><a
 | 
						|
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#a6d1893bb8c18543337b6c363c1319fcf">REGISTER_MAP_WITH_PROGRAMSTATE</a>:
 | 
						|
Used when the state information is a map from a key to a value. The methods
 | 
						|
available for state types declared with this macro are <tt>add</tt>,
 | 
						|
<tt>set</tt>, <tt>get</tt>, <tt>remove</tt>, and <tt>contains</tt>.
 | 
						|
</ul>
 | 
						|
 | 
						|
<p>All of these macros take as parameters the name to be used for the custom
 | 
						|
category of state information and the data type(s) to be used for storage. The
 | 
						|
data type(s) specified will become the parameter type and/or return type of the
 | 
						|
methods that manipulate the new category of state information. Each of these
 | 
						|
methods are templated with the name of the custom data type.
 | 
						|
 | 
						|
<p>For example, a common case is the need to track data associated with a
 | 
						|
symbolic expression; a map type is the most logical way to implement this. The
 | 
						|
key for this map will be a pointer to a symbolic expression
 | 
						|
(<tt>SymbolRef</tt>). If the data type to be associated with the symbolic
 | 
						|
expression is an integer, then the custom category of state information would be
 | 
						|
declared as
 | 
						|
 | 
						|
<pre class="code_example">
 | 
						|
REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int)
 | 
						|
</pre>
 | 
						|
 | 
						|
The data would be accessed with the function
 | 
						|
 | 
						|
<pre class="code_example">
 | 
						|
ProgramStateRef state;
 | 
						|
SymbolRef Sym;
 | 
						|
...
 | 
						|
int currentlValue = state->get<ExampleDataType>(Sym);
 | 
						|
</pre>
 | 
						|
 | 
						|
and set with the function
 | 
						|
 | 
						|
<pre class="code_example">
 | 
						|
ProgramStateRef state;
 | 
						|
SymbolRef Sym;
 | 
						|
int newValue;
 | 
						|
...
 | 
						|
ProgramStateRef newState = state->set<ExampleDataType>(Sym, newValue);
 | 
						|
</pre>
 | 
						|
 | 
						|
<p>In addition, the macros define a data type used for storing the data of the
 | 
						|
new data category; the name of this type is the name of the data category with
 | 
						|
"Ty" appended. For <tt>REGISTER_TRAIT_WITH_PROGRAMSTATE</tt>, this will simply
 | 
						|
be passed data type; for the other three macros, this will be a specialized
 | 
						|
version of the <a
 | 
						|
href="http://llvm.org/doxygen/classllvm_1_1ImmutableList.html">llvm::ImmutableList</a>,
 | 
						|
<a
 | 
						|
href="http://llvm.org/doxygen/classllvm_1_1ImmutableSet.html">llvm::ImmutableSet</a>,
 | 
						|
or <a
 | 
						|
href="http://llvm.org/doxygen/classllvm_1_1ImmutableMap.html">llvm::ImmutableMap</a>
 | 
						|
templated class. For the <tt>ExampleDataType</tt> example above, the type
 | 
						|
created would be equivalent to writing the declaration:
 | 
						|
 | 
						|
<pre class="code_example">
 | 
						|
typedef llvm::ImmutableMap<SymbolRef, int> ExampleDataTypeTy;
 | 
						|
</pre>
 | 
						|
 | 
						|
<p>These macros will cover a majority of use cases; however, they still have a
 | 
						|
few limitations. They cannot be used inside namespaces (since they expand to
 | 
						|
contain top-level namespace references), and the data types that they define
 | 
						|
cannot be referenced from more than one file.
 | 
						|
 | 
						|
<p>Note that <tt>ProgramStates</tt> are immutable; instead of modifying an existing
 | 
						|
one, functions that modify the state will return a copy of the previous state
 | 
						|
with the change applied. This updated state must be then provided to the
 | 
						|
analyzer core by calling the <tt>CheckerContext::addTransition</tt> function.
 | 
						|
<h2 id=bugs>Bug Reports</h2>
 | 
						|
 | 
						|
 | 
						|
<p> When a checker detects a mistake in the analyzed code, it needs a way to
 | 
						|
report it to the analyzer core so that it can be displayed. The two classes used
 | 
						|
to construct this report are <tt><a
 | 
						|
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugType.html">BugType</a></tt>
 | 
						|
and <tt><a
 | 
						|
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugReport.html">
 | 
						|
BugReport</a></tt>.
 | 
						|
 | 
						|
<p>
 | 
						|
<tt>BugType</tt>, as the name would suggest, represents a type of bug. The
 | 
						|
constructor for <tt>BugType</tt> takes two parameters: The name of the bug
 | 
						|
type, and the name of the category of the bug. These are used (e.g.) in the
 | 
						|
summary page generated by the scan-build tool.
 | 
						|
 | 
						|
<P>
 | 
						|
  The <tt>BugReport</tt> class represents a specific occurrence of a bug. In
 | 
						|
  the most common case, three parameters are used to form a <tt>BugReport</tt>:
 | 
						|
<ol>
 | 
						|
<li>The type of bug, specified as an instance of the <tt>BugType</tt> class.
 | 
						|
<li>A short descriptive string. This is placed at the location of the bug in
 | 
						|
the detailed line-by-line output generated by scan-build.
 | 
						|
<li>The context in which the bug occurred. This includes both the location of
 | 
						|
the bug in the program and the program's state when the location is reached. These are
 | 
						|
both encapsulated in an <tt>ExplodedNode</tt>.
 | 
						|
</ol>
 | 
						|
 | 
						|
<p>In order to obtain the correct <tt>ExplodedNode</tt>, a decision must be made
 | 
						|
as to whether or not analysis can continue along the current path. This decision
 | 
						|
is based on whether the detected bug is one that would prevent the program under
 | 
						|
analysis from continuing. For example, leaking of a resource should not stop
 | 
						|
analysis, as the program can continue to run after the leak. Dereferencing a
 | 
						|
null pointer, on the other hand, should stop analysis, as there is no way for
 | 
						|
the program to meaningfully continue after such an error.
 | 
						|
 | 
						|
<p>If analysis can continue, then the most recent <tt>ExplodedNode</tt> 
 | 
						|
generated by the checker can be passed to the <tt>BugReport</tt> constructor 
 | 
						|
without additional modification. This <tt>ExplodedNode</tt> will be the one 
 | 
						|
returned by the most recent call to <a
 | 
						|
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition</a>.
 | 
						|
If no transition has been performed during the current callback, the checker should call <a
 | 
						|
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition()</a> 
 | 
						|
and use the returned node for bug reporting.
 | 
						|
 | 
						|
<p>If analysis can not continue, then the current state should be transitioned
 | 
						|
into a so-called <i>sink node</i>, a node from which no further analysis will be
 | 
						|
performed. This is done by calling the <a
 | 
						|
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#adeea33a5a2bed190210c4a2bb807a6f0">
 | 
						|
CheckerContext::generateSink</a> function; this function is the same as the
 | 
						|
<tt>addTransition</tt> function, but marks the state as a sink node. Like
 | 
						|
<tt>addTransition</tt>, this returns an <tt>ExplodedNode</tt> with the updated
 | 
						|
state, which can then be passed to the <tt>BugReport</tt> constructor.
 | 
						|
 | 
						|
<p>
 | 
						|
After a <tt>BugReport</tt> is created, it should be passed to the analyzer core 
 | 
						|
by calling <a href = "http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#ae7738af2cbfd1d713edec33d3203dff5">CheckerContext::emitReport</a>.
 | 
						|
 | 
						|
<h2 id=ast>AST Visitors</h2>
 | 
						|
  Some checks might not require path-sensitivity to be effective. Simple AST walk 
 | 
						|
  might be sufficient. If that is the case, consider implementing a Clang 
 | 
						|
  compiler warning. On the other hand, a check might not be acceptable as a compiler 
 | 
						|
  warning; for example, because of a relatively high false positive rate. In this 
 | 
						|
  situation, AST callbacks <tt><b>checkASTDecl</b></tt> and 
 | 
						|
  <tt><b>checkASTCodeBody</b></tt> are your best friends. 
 | 
						|
 | 
						|
<h2 id=testing>Testing</h2>
 | 
						|
  Every patch should be well tested with Clang regression tests. The checker tests 
 | 
						|
  live in <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests, 
 | 
						|
  execute the following from the <tt>clang</tt> build directory:
 | 
						|
    <pre class="code">
 | 
						|
    $ <b>TESTDIRS=Analysis make test</b>
 | 
						|
    </pre>
 | 
						|
 | 
						|
<h2 id=commands>Useful Commands/Debugging Hints</h2>
 | 
						|
<ul>
 | 
						|
<li>
 | 
						|
While investigating a checker-related issue, instruct the analyzer to only 
 | 
						|
execute a single checker:
 | 
						|
<br><tt>
 | 
						|
$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
 | 
						|
</tt>
 | 
						|
</li>
 | 
						|
<li>
 | 
						|
To dump AST:
 | 
						|
<br><tt>
 | 
						|
$ <b>clang -cc1 -ast-dump test.c</b>
 | 
						|
</tt>
 | 
						|
</li>
 | 
						|
<li>
 | 
						|
To view/dump CFG use <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers:
 | 
						|
<br><tt>
 | 
						|
$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
 | 
						|
</tt> 
 | 
						|
</li>
 | 
						|
<li>
 | 
						|
To see all available debug checkers:
 | 
						|
<br><tt>
 | 
						|
$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
 | 
						|
</tt>
 | 
						|
</li>
 | 
						|
<li>
 | 
						|
To see which function is failing while processing a large file use 
 | 
						|
<tt>-analyzer-display-progress</tt> option.
 | 
						|
</li>
 | 
						|
<li>
 | 
						|
While debugging execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt> 
 | 
						|
instead of <tt>clang --analyze</tt>, as the later would call the compiler 
 | 
						|
in a separate process.
 | 
						|
</li>
 | 
						|
<li>
 | 
						|
To view <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while 
 | 
						|
debugging, goto a frame that has <tt>clang::ento::ExprEngine</tt> object and 
 | 
						|
execute:
 | 
						|
<br><tt> 
 | 
						|
(gdb) <b>p ViewGraph(0)</b>
 | 
						|
</tt>
 | 
						|
</li>
 | 
						|
<li>
 | 
						|
To see the <tt>ProgramState</tt> while debugging use the following command. 
 | 
						|
<br><tt>
 | 
						|
(gdb) <b>p State->dump()</b>
 | 
						|
</tt> 
 | 
						|
</li>
 | 
						|
<li>
 | 
						|
To see <tt>clang::Expr</tt> while debugging use the following command. If you 
 | 
						|
pass in a SourceManager object, it will also dump the corresponding line in the 
 | 
						|
source code.
 | 
						|
<br><tt>
 | 
						|
(gdb) <b>p E->dump()</b>
 | 
						|
</tt> 
 | 
						|
</li>
 | 
						|
<li>
 | 
						|
To dump AST of a method that the current <tt>ExplodedNode</tt> belongs to:
 | 
						|
<br><tt>
 | 
						|
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
 | 
						|
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b>
 | 
						|
</tt>
 | 
						|
</li>
 | 
						|
</ul>
 | 
						|
 | 
						|
<h2 id=additioninformation>Additional Sources of Information</h2>
 | 
						|
 | 
						|
Here are some additional resources that are useful when working on the Clang
 | 
						|
Static Analyzer:
 | 
						|
 | 
						|
<ul>
 | 
						|
<li> <a href="http://clang.llvm.org/doxygen">Clang doxygen</a>. Contains
 | 
						|
up-to-date documentation about the APIs available in Clang. Relevant entries
 | 
						|
have been linked throughout this page. Also of use is the
 | 
						|
<a href="http://llvm.org/doxygen">LLVM doxygen</a>, when dealing with classes
 | 
						|
from LLVM.
 | 
						|
<li> The <a href="http://lists.llvm.org/mailman/listinfo/cfe-dev">
 | 
						|
cfe-dev mailing list</a>. This is the primary mailing list used for
 | 
						|
discussion of Clang development (including static code analysis). The
 | 
						|
<a href="http://lists.llvm.org/pipermail/cfe-dev">archive</a> also contains
 | 
						|
a lot of information.
 | 
						|
<li> The "Building a Checker in 24 hours" presentation given at the <a
 | 
						|
href="http://llvm.org/devmtg/2012-11">November 2012 LLVM Developer's
 | 
						|
meeting</a>. Describes the construction of SimpleStreamChecker. <a
 | 
						|
href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">Slides</a>
 | 
						|
and <a
 | 
						|
href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a>
 | 
						|
are available.
 | 
						|
</ul>
 | 
						|
 | 
						|
<h2 id=links>Useful Links</h2>
 | 
						|
<ul>
 | 
						|
<li>The list of <a href="implicit_checks.html">Implicit Checkers</a></li>
 | 
						|
</ul>
 | 
						|
 | 
						|
</div>
 | 
						|
</div>
 | 
						|
</body>
 | 
						|
</html>
 |