Try-Throw-Catch handling in CFG

So, if I have a simple program like:

void foo()
{
   try { throw( 5 ); }
   catch( int x ) {}
   catch( ... ) {}
}

This generates a CFG that looks like this (excuse the ASCII graphics):

        [throw]
           |
           v
         [try]
         /    \
        v      v
 [catch(int)]  [catch(...)]

where these are each blocks. How does the try block know the type of the throw argument? Should that be stored in the state on exit from the throw block or is there a better mechanism for passing that information?

Thanks!

  - jim

Hi Jim,

I'm not certain I understand your question. What are you trying to accomplish? The CFG certainly doesn't retain any type information; that's up to the AST. If you are talking about the static analyzer, it's up to the static analyzer's value flow analysis to track the thrown value, including its type, so that the path analysis matches with the right 'catch' block.

Ted

Hi Ted,

Yes, that's exactly what I was asking about. So the value and type of the thrown object have to be passed from the throw block to the try block, which must in turn pass them to the appropriate catch handler block, where the value is bound to the argument of the catch. I guess the question is: how is the thrown value designated in the state so that the try block can reference it? That is, there is no declaration of the value that the try block can use to look it up, so how is it designated? Or is there something I'm missing for tracking that value/type? I'm assuming I have to implement something, but wanted to check before I started.

  - jim

Hi Ted,

In the AST, the body of the TRY and any CATCHes are sub-expressions of the TRY statement. In the CFG, this is rearranged such that the body is independent of the TRY block and only linked to the TRY block if there is a THROW expression. Three approaches to connecting the value of the THROW expressions with the TRYs come to mind:

1) One approach would be to walk up the AST from the THROW expression looking for a TRY statement and use that statement's pointer as the key for storing the value of the THROW expression. When the TRY block is entered, it would check the state for a THROW value using its statement address as the key, remove it from the state, and transfer to the appropriate CATCH block after binding the value to the CATCH block's argument.

The only caveat would be with nested TRY statements and a THROW occurring within an inner CATCH. In that case, while walking back up the AST, if a CATCH is encountered, its parent TRY is skipped and the walk continues until another TRY is encountered.
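The walk described in approach 1, including the nested-CATCH caveat, can be sketched with a toy parent-linked node structure. `Kind`, `Node`, and `findEnclosingTry` are made-up names for illustration; real code would walk actual AST nodes (e.g. via clang's ParentMap) rather than this simplified mirror:

```cpp
#include <cassert>

// Toy stand-in for AST nodes: just a kind and a parent link.
enum class Kind { Try, Catch, Throw, Other };

struct Node {
    Kind kind;
    Node *parent;
};

// Walk up from a THROW looking for the enclosing TRY statement. If a
// CATCH is passed on the way up, skip that CATCH's own TRY (the caveat:
// a throw inside an inner catch handler belongs to the *outer* try).
const Node *findEnclosingTry(const Node *throwNode) {
    bool skipNextTry = false;
    for (const Node *n = throwNode->parent; n; n = n->parent) {
        if (n->kind == Kind::Catch) {
            skipNextTry = true;
        } else if (n->kind == Kind::Try) {
            if (skipNextTry)
                skipNextTry = false;  // this TRY owns the CATCH we left
            else
                return n;             // key for storing the thrown value
        }
    }
    return nullptr;                   // throw escapes the function
}
```

The returned node pointer would then serve as the state key for the thrown value, as described above.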

2) Another approach would be to tag each THROW expression with the appropriate TRY statement while building the CFG; perhaps add something to the build context that points to the enclosing TRY. This would more closely simulate what generated code does, since it usually keeps a stack of the nested TRY execution states.

3) Which brings me to the third approach: change the way the CFG is built so that a TRY start block actually precedes the body. This block would push its context onto a TRY stack. There would have to be a TRY end block as well to pop the context from the TRY stack. (This could also be used for the Obj-C FINALLY.) There would be a generic CATCH block that transfers to the appropriate CATCH block based on the value of the THROW; this would pop the TRY stack as well. THROW expressions would use the topmost value on the TRY stack to save the value/type of their expression in the state for use by the generic CATCH block. This would even more closely simulate actual behavior, and it would work with IPA, which could use the TRY stack to connect to the proper CATCH statement.

A minor alternative to number 3 would be to have the TRY end block have the CATCH blocks as successors, as well as the code after the CATCH blocks (call it the continuation code). If no THROW expression is stored in the state, control passes to the continuation code; if a THROW expression is found, control passes to the appropriate CATCH block after binding the value to that CATCH block's argument.
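Approach 3 and its variant could be simulated roughly as follows. All names here (`ThrownValue`, `EHState`, `dispatch`) are invented for illustration, and the thrown value's type is reduced to a string tag; the real analyzer would track an SVal and a QualType:

```cpp
#include <cassert>
#include <optional>
#include <stack>
#include <string>
#include <vector>

// Simplified in-flight exception: a type tag plus a payload.
struct ThrownValue {
    std::string type;   // e.g. "int"
    int value;
};

// Each active TRY gets a context; a THROW records its value in the
// innermost one, and the generic CATCH block reads it back.
struct TryContext {
    std::optional<ThrownValue> pending;
};

struct EHState {
    std::stack<TryContext> tryStack;

    void enterTry() { tryStack.push({}); }   // TRY start block

    // THROW expression: stash value/type in the topmost TRY context.
    bool throwValue(const ThrownValue &v) {
        if (tryStack.empty())
            return false;                    // no enclosing TRY: escapes
        tryStack.top().pending = v;
        return true;
    }

    // TRY end / generic CATCH block: pop the context and hand back any
    // pending thrown value for binding to the matching handler.
    std::optional<ThrownValue> leaveTry() {
        auto v = tryStack.top().pending;
        tryStack.pop();
        return v;
    }
};

// The variant's successor choice: index of the first CATCH whose type
// matches ("..." is a catch-all); -1 means fall through to the
// continuation code (or propagate outward if a throw is still pending).
int dispatch(const std::optional<ThrownValue> &pending,
             const std::vector<std::string> &handlerTypes) {
    if (!pending)
        return -1;
    for (size_t i = 0; i < handlerTypes.size(); ++i)
        if (handlerTypes[i] == pending->type || handlerTypes[i] == "...")
            return static_cast<int>(i);
    return -1;
}
```

For the example function at the top of the thread, `throw 5` would record `{"int", 5}` against the innermost TRY, and `dispatch` would select the `catch(int)` handler ahead of `catch(...)`.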

I'd be interested in hearing what you or anyone else thinks. Thanks!

  - jim

> 1) One approach would be to walk up the AST from the THROW expression looking for a TRY statement and use that statement's pointer as the key for storing the value of the THROW expression. When the TRY block is entered, it would check the state for a THROW value using its statement address as the key, remove it from the state, and transfer to the appropriate CATCH block after binding the value to the CATCH block's argument.
>
> The only caveat would be with nested TRY statements and a THROW occurring within an inner CATCH. In that case, while walking back up the AST, if a CATCH is encountered, its parent TRY is skipped and the walk continues until another TRY is encountered.

This seems a bit kludgy, especially once we have inter-procedural analysis. Conceptually, we’d like the ability to model an exception even if we don’t literally see the “throw”. That’s why I suggested introducing a new concept, say a new MemRegion, to represent the thrown value. I’ll admit that I haven’t thought too much about the details.

> 2) Another approach would be to tag each THROW expression with the appropriate TRY statement while building the CFG; perhaps add something to the build context that points to the enclosing TRY. This would more closely simulate what generated code does, since it usually keeps a stack of the nested TRY execution states.

I don’t think this would work as well. The “throw” might occur in code that doesn’t have a try statement. When doing IPA, we would want our EH handling logic to work regardless.

Out of curiosity, are you building the CFG with all the EH edges enabled? That’s not the default option when building the CFG. It sounds like you are looking at the limited CFG where we omit many of the EH edges for brevity. If you really want to do analysis on EH, we’d need all the EH edges enabled in the CFG.

> 3) Which brings me to the third approach: change the way the CFG is built so that a TRY start block actually precedes the body. This block would push its context onto a TRY stack. There would have to be a TRY end block as well to pop the context from the TRY stack. (This could also be used for the Obj-C FINALLY.) There would be a generic CATCH block that transfers to the appropriate CATCH block based on the value of the THROW; this would pop the TRY stack as well. THROW expressions would use the topmost value on the TRY stack to save the value/type of their expression in the state for use by the generic CATCH block. This would even more closely simulate actual behavior, and it would work with IPA, which could use the TRY stack to connect to the proper CATCH statement.
>
> A minor alternative to number 3 would be to have the TRY end block have the CATCH blocks as successors, as well as the code after the CATCH blocks (call it the continuation code). If no THROW expression is stored in the state, control passes to the continuation code; if a THROW expression is found, control passes to the appropriate CATCH block after binding the value to that CATCH block's argument.

Do we need this? I haven't looked in a while at the CFG that results when the EH edges are constructed, but I think a CFG built with that option already includes all the edges we need between the throw statements and the catch blocks, etc.

To summarize, the CFG currently can be built with or without verbose EH edges. The default is not to include them, because it is very pessimistic and blows up the size of the CFG in many cases. That said, if you really care about doing static analysis on EH logic, then building the CFG with exception handling edges is the way to go. Have you looked at building the CFG with that option enabled? It may actually include everything you are looking for (as far as the CFG is concerned).
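For what it's worth, a minimal, untested sketch of enabling those edges, assuming the `CFG::BuildOptions` interface in clang's `CFG.h` (`buildEHCFG` is a hypothetical helper name; `FD` is a `FunctionDecl` obtained elsewhere):

```cpp
#include "clang/Analysis/CFG.h"
using namespace clang;

// Build a CFG for a function with the pessimistic exception-handling
// edges included (AddEHEdges defaults to false).
std::unique_ptr<CFG> buildEHCFG(const FunctionDecl *FD, ASTContext &Ctx) {
    CFG::BuildOptions Opts;
    Opts.AddEHEdges = true;
    return CFG::buildCFG(FD, FD->getBody(), &Ctx, Opts);
}
```

Dumping the resulting CFG for the `foo()` example at the top of the thread would show whether the throw-to-catch edges Jim wants are already present.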