How to generate constraints on input to reach a specific program statement/location in execution?

Hi,

I'd like to generate the constraints on input to a program that can
cause the execution of the program to reach a certain
statement/location. For this purpose, I'm interested in developing a
checker using the symbolic execution provided by clang's static
analyzer. Looking through the static analyzer's documentation, my
initial thought was to retrieve the constraints for each conditional
branch statement (e.g. if, for, while, switch) along the execution
path. However, there are a few challenges.

1) Does the static analyzer keep track of constraints for each
variables? If so, how to retrieve the constraints for a specific
variable? The documentation mentions a generic data map associated
with a program state, but it seems that it is the responsibility of
the developer of a checker to maintain it.

2) How to retrieve information on the source location of a statement
without generating a bug report? We need to know how a statement is
control dependent on a conditional branch statement. For example,
whether a statement belongs to the then block or else block of a if
statement. We find that the IfStmt class has information on the range
of source locations of its then block and else block, which can be
used to answer this question. But the source locations seems to be
available only through a SourceManager class that can be used only
when generating a bug report?

3) How to get information on a conditional branch statement before its
execution? The static analyzer doesn't allow pre-processing call back
to any control flow transfer statement, but provides a
checkConditionalBranch call back for conditional branch statement.
However, it only provides the condition used in the branch statement
instead of the branch statement itself.

Thanks.

Hello James,

1. Analyzer deals with constraint with using some constraint manager. The only current implementation of ConstraintManager - RangeConstraintManager class - keeps the constraints in a special trait in GDM named ConstraintRange. You can take a look at RangeConstraintManager.cpp file for more details. These constraints are represented as a set of integer ranges that value can have on some execution branch. Not all constraints can be resolved with this manager because it is simple enough. You should note that analyzer does not constraint the variable itself (which is just a piece of memory - MemRegion); it constraints its value if it is represented as a symbol. To retrieve the constraint on some symbol, you possibly should add a corresponding function to constraint manager. You may try this approach.

Also note that analyzer remembers all options on the path it follows. You may also use this to generate constraints with walking through the generated ExplodedGraph.

2. For each statement you can retrieve its location with getLocStart/getLocEnd/getSourceRange methods.

19.10.2016 23:58, James Huang via cfe-dev пишет:

Hello James,

While you might prefer a concolic tool like KLEE for this purpose, the Analyzer's framework should be more or less suitable for such task as well, even though it was never really developed in that direction.

You've got symbols (SymExpr objects), which are an algebraic notation for values that are unknown during the analysis, such as user inputs, returns of external functions, or values that were already set before the modeling starts.

On atomic symbols (expressions of SymbolData class, as opposed to results of operations), the analyzer maintains constraints managed by RangeConstraintManager. For instance, statement "if (x);", where "x" is a variable of type "int", whose value is unknown and described as symbol "$x", splits the analysis in two branches that differ only by range constraints: one has "$x: [0, 0]", another has "$x: [-2147483648, -1] U [1, 2147483647]".

Additionally, by looking at the symbols, most of the time you can understand where it's coming from. For example, if there's a global variable "y", and we're taking its value before assigning, then its value, which has been there before the analysis has started, is denoted by symbol "reg_$0<y>" (of class SymbolRegionValue), which refers to "y". Presence of constraint "reg_$0<y>: [2, 5]" in the program state would mean that the global variable should have been set to 2 or 3 or 4 or 5 in order to reach the current moment. Similarly, return values of external functions are represented as SymbolConjured; information of the call expression is stored within them. It'd also be a bit tricky with structural symbols and SymbolDerived objects, ask me if you ever get that far :slight_smile:

With constraints being tracked, you do not need to observe branch conditions; i recommend against that. Instead, at any moment, all the info you may need about branches is in the constraints.

One thing you'd notice is that there's currently no API to retrieve constraints as-is; instead, constraint managers are designed to be replace-able, and actual looks of constraints are treated as an implementation detail. You might want to break this abstraction in some way.

Another thing you'd encounter is that constraints are garbage-collected during analysis. If a certain symbol is certainly lost from the program state (no longer stored in any variable etc.), then the analyzer no longer needs to track constraints imposed on it. You'd need to track them though. Checkers are notified of garbage collection in checkDeadSymbols callback. The same callback is also used to find leaks (eg. memory leaks or file descriptors that are never closed - they're merely dead symbols).

So i think this might work.

Hi,

I'd like to generate the constraints on input to a program that can
cause the execution of the program to reach a certain
statement/location. For this purpose, I'm interested in developing a
checker using the symbolic execution provided by clang's static
analyzer. Looking through the static analyzer's documentation, my
initial thought was to retrieve the constraints for each conditional
branch statement (e.g. if, for, while, switch) along the execution
path. However, there are a few challenges.

1) Does the static analyzer keep track of constraints for each
variables? If so, how to retrieve the constraints for a specific
variable? The documentation mentions a generic data map associated
with a program state, but it seems that it is the responsibility of
the developer of a checker to maintain it.

The constrains are being kept, but as others have mentioned, they are mainly used to rule out unreachable paths and querying if certain conditions hold. We only keep constrains that can be expressed as integer values over a single symbol. Furthermore, we aggressively prune the nodes as well as constrains and you would need to experiment with turning off that optimization.

2) How to retrieve information on the source location of a statement
without generating a bug report? We need to know how a statement is
control dependent on a conditional branch statement. For example,
whether a statement belongs to the then block or else block of a if
statement. We find that the IfStmt class has information on the range
of source locations of its then block and else block, which can be
used to answer this question. But the source locations seems to be
available only through a SourceManager class that can be used only
when generating a bug report?

Each node should have a ProgramPoint, which could be wrapping a statement or an expression, which in turn have the location information. You can look at the code that generates the bug reports given error nodes.

3) How to get information on a conditional branch statement before its
execution? The static analyzer doesn't allow pre-processing call back
to any control flow transfer statement, but provides a
checkConditionalBranch call back for conditional branch statement.
However, it only provides the condition used in the branch statement
instead of the branch statement itself.

One way of approaching this would be to reach the node for which you’d like to generate the information and walk the path of the ExploadedGraph up, collecting the constraints along the way.