Implement support for C++17 structured bindings in the Clang Static Analyzer

Even though a lot of new C++ features are supported by the static analyzer automatically by the virtue of clang AST doing all the work under the hood, the C++17 “structured binding” syntax

auto [x, y] = ...;

requires some extra work on the Static Analyzer side. The analyzer’s transfer functions need to be taught about the new AST nodes, BindingDecl and DecompositionDecl, to work correctly in all three interpretations described by the Standard.

Incomplete support for structured bindings is a common source of false positives in the uninitialized variable checker on modern C++ code, such as #42387.

It is likely that the Clang CFG also needs to be updated. Such changes in the CFG may improve quality of clang warnings outside of the Static Analyzer.

Expected results: The Static Analyzer correctly models structured binding and decomposition declarations. In particular, binding variables no longer appear uninitialized to the Static Analyzer’s uninitialized variable checker.

Confirmed Mentor: Artem Dergachev, Rashmi Mudduluru

Desirable skills: Intermediate knowledge of C++. Some familiarity with Clang AST and/or some static analysis background.

Project size: 350 hr

Difficulty: Medium/Hard

3 Likes

Hi NoW,

I am interested in this project. I have some questions.

I take a look into the issue that you have mentioned. Changed QT templates with STD templates. Tried to produce the warnings with the analyzer, but as mentioned here, this fixes the warnings.

#include <utility>
#include <memory>

std::pair<int, std::shared_ptr<int>> foo() {
return {42, nullptr};
}

int main() {
auto [x, p] = foo();
auto p2 = p;
}

It is a little confusing. Could you elaborate more on why this happens in QT’s shared_ptr but not in std’s shared_ptr.

I do want to learn more about compiler infrastructure and llvm. But I have no prior experience. How important to have familiarity with Clang AST and/or some static analysis background. I took a compiler design class at university. Does this count as an experience on your side?

Do you have a preferred communication channel? I might have more questions.

Thanks

Reading the source code should be enough. It will take a lot of time. This is the primary forum discussing ideas, but you can post questions on the corresponding Discord channel.
I’ll get back to this question in couple of days to write a more detailed guide.

Hi folks,

I’m a Computer Science student. I’m very interested by this project and to become a LLVM contributor. I believe this project and GSoc are a good entry point to LLVM.
I have experience in compilers as I had courses and projects on them. Note I’ve also build my own as a fun experiment.

Do you want us to forward you a resume ?

By the way, @NoQ, I’ve also watched your talks on YouTube about static analysis. Do you, folks, have any other resources you would recommend for the project ?

@palo:

Do you have a preferred communication channel?

This is the preferred channel! This thread is read by all mentors.

Could you elaborate more on why this happens in QT’s shared_ptr but not in std’s shared_ptr.

That’s a great question, this could actually be a great exercise if you’re interested =) Theoretically you don’t need either of those to reproduce the issue; the static analyzer isn’t doing anything special for those classes, it simply simulates every statement step by step. It is possible that a combination of factors is necessary to trigger it (eg., not only unimplemented transfer functions, but also poor representation of structured bindings in symbolic memory, triggered by the specific layout of the structure). You can try to walk through that Qt example step by step to see what precisely triggers this specific false positive. Note that you don’t need to install Qt to do this; there’s a “preprocessed” file attached to that ticket that already has all the includes included in it.

One way or another, we’ll hopefully get to the bottom of it by the end of the summer. What I’m trying to say is, all bug reports that we compiler developers receive usually look exactly like this. It is everyday routine work for us to “reduce” them to small easy-to-understand examples.

What I recommend even more is, writing down a few more examples by hand, maybe even without smart pointers, just with pairs of integers. You might want to test the static analyzer more directly via magic debug functions such as clang_analyzer_eval() (3.1. Debug Checks — Clang 15.0.0git documentation) to see if it’s at all possible to put a value into a structured binding and read it back.

I took a compiler design class at university. Does this count as an experience on your side?

Totally. This probably gives you some ideas of what’s involved in a feature like this. You have a rough idea what’s an AST and what’s a CFG and what are they used for, that’s already a good start!

@cjordan7:

Do you want us to forward you a resume ?

Nope! GSoC is decided through “project proposals” instead (Writing a proposal | Google Summer of Code Guides). Basically you participate in discussions here, then in April you submit a document describing your understanding of the problem and your plan how to solve it, and we choose the student based on the proposal as well as on our interactions so far. You’re encouraged to submit draft proposals here so that mentors could comment on it early. Even though GSoC is competitive, open-source software is highly cooperative, so you can think of your proposal work as a “competition in cooperativeness”. Don’t worry about others “stealing” your ideas, it’s easy for us to keep track of who came up with what.

Do you, folks, have any other resources you would recommend for the project ?

There are some more references at the bottom of Checker Developer Manual.

1 Like

We’ve got two more confirmed co-mentors: Gábor Horváth and Kristóf Umann!

Note that GSoC still requires at most one student to be selected for every project, so this doesn’t mean we’ll accept more students. But it does mean that the selected student will have more people to talk to =)

Hey , I’m really interested in this proposal, So I was doing my research on this proposal, And also I talked to @NoQ about different questions arises during the time ,via mail.
So basically , It seems like problem is with we are not able to read the BindingDecl data correctly .
Let’s take an example

void clang_analyzer_eval(bool);

struct AA{int a, b; };
void foobar(AA e) {
  int x = e.b;
  auto [a,b] = e;
  clang_analyzer_eval(x == b); // should be TRUE
}

We are failed to assert that “x==b” which is definitely not the case , we should be able to assert it correctly. I talked with @NoQ about this and he told that our inability to assert `x==b’ could be for two reason either we don’t write the data correctly , or we can’t read it correctly. ‘ExprEngine::VisitDeclStmt()’ would be responsible for writing and for reading ‘ExprEngine::VisitCommonDeclRefExpr()’ .
So I checked both things and it seems like we are not able to read it correctly , the file ExprEngine.cpp have the incomplete support for the BindingDecls . Here is the code snippet with the fixme too which is in the ExprEngine.cpp (ExprEngine::VisitCommonDeclRefExpr()).

if(isa<BindingDecl>(D)) {
//Fixme: Proper support for bound declarations. 
//For now, let's just prevent crashing.
return;
}

though I’ll go for some more assertions to make sure that this code path hits. @NoQ suggested to move this discussion to public thread so yeah.
If any of you have any suggestions or anything which you guys wants to point , please do so it’ll help me to come up with more concrete idea about the proposal.
Thanks.

Great, now we’re all here together!

Yup that’s one example, great job coming up with that. There are probably more examples for the two other possible interpretations of the structured binding syntax (array bindings, tuple-like bindings).

message.txt (11.9 KB)
so, I’m able to hit that code point where we were sus , I put an assert inside the if(isa<BindingDecls>(D)) , So the test files which contains the tests related to it and failing is Analysis/live-bindings-test.cpp and Analysis/structured_bindings.cpp .
I think the test file Analysis/structured_bindings-test.cpp shows there is lot of work which is needed to do around structured bindings static checking, it only contains a single test and that’s have an fixme .

Yeah, I was also able to hit that point. And I have also moved around the code and seen they are several other places with FIXMEs about BindingDecl and a few FIXMEs in the Analysis folder.

Like you, @phyBrackets, I’ve also noticed the lack of tests in Analysis, which to be honest I’m not so surprised as @NoQ said there are some extra work to be done on the Static Analyzer side.

In my humble opinion, it probably won’t be enough to add a “few” lines in the if to resolve the problem. Those lines were probably added to avoid crashes :thinking:

If any of you have any suggestions or anything which you guys wants to point , please do so it’ll help me to come up with more concrete idea about the proposal.

I think understanding the way Clang handles static analysis is a good starting point (transfer functions, CFG, CallGraph), and probably also checking what Clang does with the BindingDecl and DecompositionDecl. I mean we probably will have to work closely with the AST.

Last but not least… and certainly the most important is to have a full understanding of the structured bindings by checking what the standard says.

Yup, that’s my understanding as well. Given that there’s three cases in the Standard that seem to require completely different handling, and also given that it’s C++ so we’ll probably have to deal with the bureaucracy around constructors and destructors, the amount of work can get pretty large.

Note that we don’t have to 100% finish all this work during GSoC in order to claim success / pass evaluations. Static analysis is somewhat research-intensive and research always has a gamble aspect. You often don’t know whether something is true or false (the whole point of research is to find this out) but the practical results depend on that dramatically. So while we’re trying our best to plan ahead and foresee design problems, it’s not always possible, and we often discover that additional yak shaving is required. We usually stay flexible and grab any improvements we can reach.

I agree with what @cjordan7’s reply. The static analyzer is a simulator, it reads instructions printed in the AST, in the order described by the CFG, and executes them by updating the symbolic program state accordingly. The key to success is to have the updated state carry just enough information to do the right thing no matter what instruction comes next. Most of the time there are no limits whatsoever on what instruction comes next. So whenever any future behavior may depend on the current instruction’s results, the state has to reflect that.

In particular, every time something goes wrong, there could be two reasons: either we’re not reading the state right, or we failed to write it correctly before. Then, our failure to write it correctly could also have the same two culprits, ad infinitum.

So the problem-solving process for simulation problems is roughly the same every time, almost mechanical. Write a simple code snippet that demonstrates the problem. Dump the Exploded Graph. Find a transition in the Exploded Graph that doesn’t look right: it either doesn’t put enough information in the state for future use, or it doesn’t read the existing information correctly. Fix the problem, repeat until the problem is resolved. Then proceed to the next snippet until you run out of known bugs =)

Hey, is there any guide to how to deal with preprocessed file ?

It’s just source code. You feed it to the compiler. There’s nothing special about it.

oh thanks.

Hey @NoQ and other Mentors,
I was looking into the CFG and transfer function and found this patch ⚙ D120495 [clang][dataflow] Add transfer functions for structured bindings , this patch trying to add the transfer function for structured bindings but with some flaws like it doesn’t seems to add support for ArraySubscriptExpr , ImplicitCastExpr, and the most important thing it consider the CFG builder support for structure bindings but there is no proper support in CFG for it . So if I’m correct , first we need to add the support for structure bindings in CFG right ? and after then we need to proceed towards the transfer function ? CFG support for it , seems the very first step to me to proceed and working on different false positive regarding BindingDecls.

Hey @NoQ and other respected mentors,
I have a doubt regarding the AST of structured bindings , let’s take this example Compiler Explorer , this produces ast for the first symbolic name and for others like this

| |-BindingDecl col:14 col:14 a ‘const float’
| | `-MemberExpr col:14 ‘const float’ lvalue .x 0x556b77eb0f30

   |   |   `-DeclRefExpr <col:14> 'const AA':'const AA' lvalue Decomposition 0x556b77eb14f8 '' 'const AA & 

why the DeclRefExpr is considered it as lvalue Decomposition? why not something like “lvalue Binding … ‘a’ const float” ? I have a very basic idea about this but wants to know something more concrete. I hope there is not much problem with AST too for structured bindings :no_mouth: .

I don’t think something like “lvalue Binding … ‘a’ const float” is correct. According to the 3rd case of the reference structured binding, which I believe is your case, we have that x is an lvalue same as your example.

Edit: I may be wrong but I think the lvalue decomposition is supposed to represent that.

Edit2: Though, maybe, it could be interesting to add one like this to help the analyzer. But we probably have to explore that idea a little more.

Hey @Xazax-hun and @Szelethus , would love to know view/explanation from you guys :slightly_smiling_face:. It seems like @NoQ is busy with some work I guess .

That patch is part of a separate project. It has little to do with the Clang static analyzer. You can look at it for inspiration, but it will not affect the summer of code project.

Yes, that would be the desired approach. I believe the author of the patch above only wanted to add that code as a temporary workaround until we have proper support in the CFG.

If you look at the whole tree, you do have a BindingDecl for a. The first step is to make sure you understand which parts of the tree will correspond to which parts of the source code (and the invisible constructs that are not written in the code). I think you should also read the documentation for each of the node participating in that tree: clang: clang::BindingDecl Class Reference (llvm.org)

Those docstring often have useful examples that can help understand what the official name of each of the constructs is:

int n[3]; auto &[a, b, c] = n;

a, b, and c are BindingDecls, whose bindings are the expressions x[0], x[1], and x[2] respectively, where x is the implicit DecompositionDecl of type ‘int (&)[3]’.

Yeah cool, I get the idea . But I think it would be easier for supporting structured bindings if we directly reference to getBinding() for the BindingDecls Expression to which the declaration is bound rather than using decomposition declaration that the binding represents a decomposition of . I guess it’ll produce much simpler AST which will be pretty helpful for the Analysis .