[Gsoc 22] Draft Proposal, Review needed

Hey all , I’m interested in the Project Implement support for C++17 structured bindings in the Clang Static Analyzer ,
After long discussion and suggestions from @NoQ and other mentors , here is my draft proposal Gsoc22 Proposal(phyBracktets) - Google Docs . If anyone wants to point out something please do so this will help me to come up with better and concrete proposal.

Thanks

I like how you made examples for all three possible interpretations of the structured binding syntax. If you have time to add more stuff, I’d recommend focusing more on static analyzer side of things: abstract state, details of transfer functions. I also suspect that the user-defined/tuple-like interpretation requires a lot more investigation, given that function calls are involved.

For by-value structured bindings the behavior should be analogous to the behavior to separate variable declarations

Like I mentioned before, this is not correct. You cannot ignore DecompositionDecl entirely, you get different semantics this way.

Memorize that name ‘a’ refers to the value computed on the previous step

I’m curious how deeply you investigated whether it is actually necessary to put this information in the abstract state. It might be that field name and Store binding for the decomposition are sufficient to compute ‘a’ immediately at any point within its scope.

Thanks for the suggestion, I tried to adding the details for the transfer function(Updated the doc above) ,

Yeah, that’s my understanding as well .

Yeah I understand, but I’m confuse that how we gonna memorize that name ‘a’ refers to value computed on the previous step.

they’re simply names assigned to fields of the otherwise anonymous decomposition variable.

is there any way , we gonna memorize that name ? Or simply with the help of DecompDecl ?

BindingDecl <col:14> col:14 a 'const float'

Interested to know about the corresponding cfg for this ?

As we know , that BindingDecls have nothing to do with VarDecls; they’re a completely
different kind of thing with completely different semantic as per the standard, can we just mimic BindingDecls as VarDecls for the Static Analyzer which will help the analyzer and it doesn’t seems uninitialized anymore to the Analyzer. Otherwise it’s quite hard for understanding to me that how we gonna memorize the symbolic name for the variable . @NoQ , If you could explain this that would be great for me.

Thanks

Hi phyBrackets!

I suggest you to write a simple struct that has all of its operations explicitly defined (ctor, copy ctor, move ctor, dtor) and emits a log message from each of those operations. You should experiment with some code snippets, e.g., creating structured bindings by value to the fields of this struct. And have another struct where this logged type is a member and create structured bindings for that one.

Do you have a good intuition what messages will you see when you run code snippets like that? The results may surprise you :slight_smile:

Whatever you see on the console output, you should understand why that happens and plan how would you want to simulate that in the static analyzer.

So to summarize, you should explore how the compiled program actually behaves (do not even look at the analyzer for that part). Once you have a deep understanding how the compiled program works, and why, you will have easier time to model that behavior in the analyzer.

Hi @phyBrackets, nice to see you working on an interesting problem. I wish you all the very best for your GSoC!
I have a cheat-sheet with the flags I used for “quickly” compiling clang and llvm, as well as commands to run the static analyzer.

Doing a bit more scouting on different structured bindings (like the 3 you already included) could be very beneficial on the actual project as well. From my perspective, playing around with the code is also my way of making myself understand the problem better, and come up with the proper solution.

During the proposal phase, having a number of these cases can guide you towards a more thought out plan. Later, each of these cases could be turned into an llvm-lit test case, and as you develop, you can see which test cases will start to act as intended, and observe that some break due to a change having unexpected consequences.

By no means am I suggesting you should start to write hundreds of LOC in lit tests during the proposal phase, just that this approach could have a beneficial side effect later down the road.

Have another crack at Xu, Zhongxing & Kremenek, Ted & Zhang, Jian. (2010). A Memory Model for Static Analysis of C Programs., as included in the checker developer manual. Any ideas where a representation for DecompositionDecl might fit into this model?

Hey @Xazax-hun @RedDocMD @Szelethus , Thanks for the suggestions.

I was reading the paper, I’m not sure but AFAIT currently it doesn’t fits in Region Based Ternary Model , currently analyzer doesn’t models this well . For fitting into this Models , we need to add the proper support in ExprEngine::VisitCommonDeclRefExpr() for BindingDecl .

I’d also updated the doc as per I understood , how structured bindings binds the alias and how not . Any other suggestion regarding Proposal will appreciated .

Thanks

I see you added a section about how by-value bindings are evaluated. Would that change how you want the expected cfg to look?

Yeah surely , that’s not a VarDecl . To be honest I’m bit confused here that how we gonna replace this

4: const float a = obj.x;

we are not gonna declare it like

const float

but some aliastype ? I’m not sure about this that how this gonna look . If you could just explain a bit about this , that would be great .
Thanks

Even before talking about the BindigDecls, do you think your proposed cfg is representing all the operations that would happen at runtime?

I’m not sure but I think i missed the Use of the alias name as an rvalue? And I’m pretty aware of that I might be missing the some important edge points . But I think we can figure that out once we start working on it more closely , although I’ll update regularly if I find something interesting and relevant that will help us solving this problem . The current expected CFG is more like a idea that how it should look like for correct interpretation and will help the static analyzer , there are kind of changes that one can expect but yeah something surely needs to be change in the expected one like the representing BindingDecls as VarDecls .

You have a code snippet in the How it actually binds ? section. Can you make that snippet compile? If so, you can look at the Cfg generated for that snippet and compare it to your proposed solution.

Ah yes, we could expect something like this Compiler Explorer ig, I’ll change the expected cfg as I’m finding something more concrete.

Thanks

If you could just explain a bit about this , that would be great

You might have noticed that we’re holding back on direct answers. This is because we already know that you can echo what we say back to us, but we aren’t sure whether you actually understand what has already been said. In fact, you’re still hard-set on a solution that I explicitly asked you not to do, twice, because it’s incorrect. That’s why we’re looking for an effort from your side to solve a few smaller, simpler sub-problems yourself, or at least try to approach them. That could convince us that we can communicate effectively with you, which is absolutely essential for the success of the project. It does take two to communicate. If we explain everything in every detail, we won’t even be able to pass the first evaluation. If we are to commit to this project, we need to rely on your abstract problem-solving skill, and on our ability to communicate in high-level terms, to take care of the details.

1 Like

Hey @NoQ , I understand what you said. I really appreciate and thankful to you and other mentors for explaining the Project details very well till date, also I just update the expected CFG in the doc , still if you guys have any suggestions and any review for the Proposal would love to listen. Although I think I understands the problem very well and really excited to implement the solution and going to learn alot from the project and mentors.

Thanks

I was hanging around the abstract state for deep understanding of the store ,
let’s take an example

struct AA {
  int a;
  char b;
  float c;
  double d;
  AA(int a, char b, float c, double d) : a(a), b(b), c(c), d(d) {};
};
//AA get();
void clang_analyzer_eval(bool);

 
 void foo(){
  AA obj(0, '0',0.1,0.11);
  auto &[a, b, c, d] = obj; 
  float h = c;
  clang_analyzer_eval(a==obj.a);
 }

The abstract for this look like this

ProgramState:BindingDecl 0xd8dde50 <clang/test/Analysis/structured_bindings.cpp:15:10> col:10 referenced a 'int'
`-MemberExpr 0xd8de2a0 <col:10> 'int' lvalue .a 0xd8c0380
  `-DeclRefExpr 0xd8de280 <col:10> 'AA':'AA' lvalue Decomposition 0xd8ddf70 '' 'AA &'
"program_state": {
  "store": { "pointer": "0xd8f9a10", "items": [
    { "cluster": "obj", "pointer": "0xd8ebda8", "items": [
      { "kind": "Direct", "offset": 0, "value": "0 S32b" },
      { "kind": "Direct", "offset": 32, "value": "48 S8b" },
      { "kind": "Direct", "offset": 64, "value": "Unknown" },
      { "kind": "Direct", "offset": 128, "value": "Unknown" }
    ]},
    { "cluster": "NonParamVarRegion{D1176}", "pointer": "0xd8f9988", "items": [
      { "kind": "Direct", "offset": 0, "value": "&obj" }
    ]},
    { "cluster": "h", "pointer": "0xd8f9cc0", "items": [
      { "kind": "Direct", "offset": 0, "value": "Unknown" }
    ]}
  ]},
  "environment": { "pointer": "0xd8eb400", "items": [
    { "lctx_id": 1, "location_context": "#0 Call", "calling": "foo", "location": null, "items": [
      { "stmt_id": 1376, "pretty": "clang_analyzer_eval", "value": "&code{clang_analyzer_eval}" },
      { "stmt_id": 1385, "pretty": "clang_analyzer_eval", "value": "&code{clang_analyzer_eval}" }
    ]}
  ]},
  "constraints": null,
  "equivalence_classes": null,
  "disequality_info": null,
  "dynamic_types": null,
  "dynamic_casts": null,
  "constructing_objects": null,
  "checker_messages": null

Okay , so what I noticed here the value after the offset 32 is UnknownVal and that’s the reason when I bind h to the value c the value is Unkown here too .

In the other case if i bind int h = a then the cluster looks fine and the value it binds to is **{ “cluster”: “h”, “pointer”: “0xdad7eb8”, “items”: [
{ “kind”: “Direct”, “offset”: 0, “value”: “conj_$0{int, LC1, S1337, #1}” }
]}
**
. I’m really curious why it is happening that up to 32 offset we actually know about the value but after that we really don;t know about it?

Edit: Ahh I see, It actually depending on the data type we are using for the variable, I think it allocates some memory blocks for BindingDecl to models it , and if we try to access the memory location that is not in our control then it binds to Unknown.

Although, by analyzing the abstract state, I guess we really know a lot more about the BindingDecls symbolic name in several cases. And it seems like we can able to compute the Binding decl symbolic name at any point immediately .

for array like cases , when we don’t initialize the array with values then it binds something like this

program_state": {
  "store": { "pointer": "0xeee6910", "items": [
    { "cluster": "NonParamVarRegion{D714}", "pointer": "0xeee6818", "items": [
      { "kind": "Direct", "offset": 0, "value": "&arr" }
    ]}
  ]},
  "environment": { "pointer": "0xeee5de0", "items": [
    { "lctx_id": 1, "location_context": "#0 Call", "calling": "foo", "location": null, "items": [
      { "stmt_id": 731, "pretty": "arr", "value": "&arr" }
    ]}
  ]}

and that means we can get the values of binding immediately at any point within it’s scope. And here is again one interesting case when we are initializing it just like we were doing with structure in the previous case it actually binds the value correctly at different offsets (As i said before that totally depends on how much memory blocks we are querying)

program_state": {
  "store": { "pointer": "0xdd4c4e0", "items": [
    { "cluster": "arr", "pointer": "0xdd47970", "items": [
      { "kind": "Direct", "offset": 0, "value": "1 S32b" },
      { "kind": "Direct", "offset": 32, "value": "2 S32b" },
      { "kind": "Direct", "offset": 64, "value": "3 S32b" },
      { "kind": "Direct", "offset": 96, "value": "4 S32b" }
    ]},
    { "cluster": "NonParamVarRegion{D754}", "pointer": "0xdd4c440", "items": [
      { "kind": "Direct", "offset": 0, "value": "&arr" }
    ]},
    { "cluster": "h", "pointer": "0xdd4c7d8", "items": [
      { "kind": "Direct", "offset": 0, "value": "conj_$0{int, LC1, S932, #1}" }
    ]}
  ]},
  "environment": { "pointer": "0xdd46df0", "items": [
    { "lctx_id": 1, "location_context": "#0 Call", "calling": "foo", "location": null, "items": [
      { "stmt_id": 976, "pretty": "clang_analyzer_eval", "value": "&code{clang_analyzer_eval}" },
      { "stmt_id": 985, "pretty": "clang_analyzer_eval", "value": "&code{clang_analyzer_eval}" }
    ]}
  ]}

Also, I think the Unknown will bind to value in the case of by value, the state which I showed above for array that is for the by reference .

Also, I’ve been looking into the getBinding and getHoldingVar. I noticed that for tuple like types std::tuple or std::pair – the binding declarations have expressions that denote auxiliary variables (BindingDecl::getHoldingVar()) which are initialized up-front,
So for correctly model BindingDecl, I think when we will be building the CFG for tuple like types , we’ll need to walk over the bindings of the decomposition declaration querying that holding variables and build the CFG for that too .

Now let’s talk about the other types,I think what better for us is to rewrite the DeclRefExpr denoting a BindingDecl into the binding expression (BindingDecl::getBinding()) of that binding.

And again I would say the whole problem is with BindingDecl , CFG is mismodeling BindingDecls. If it expanded them to their expression, I don’t think we need any special handling to solve different false positives.