GSoC 2018

Hi all,

My name is Mikhail R. Gadelha, I’m a PhD student from the University of Southampton, UK. I’m interested in working on the project “Integrate with Z3 SMT solver to reduce false positives”, during the summer by joining GSoC.

This would be my first participation in the GSoC; however, I have already contributed to some open source projects (including clang[0][1] and Z3[2]).

I’ve been working with clang libTooling for a couple of years now, as a frontend for my verification tool, ESBMC [3]. When developing the frontend, I also presented a WIP version in FOSDEM’17 [4], in the LLVM Toolchain devroom. Regarding SMT solvers, I have experience with Z3, Boolector, MathSAT, CVC4 and Yices, as we support them all in ESBMC. Currently I’m developing a solver-independent SMT library for floating-points, as part of my research.

I found the project very interesting and it’s highly related to my research; witness/counter-example validation is an important topic in the field (and a planned feature for our tool). It would be excellent to implement such feature in clang, and check the results in real-world applications.

I contacted George Karpenkov (the mentor of the project) with a question about why focusing on Z3 and the reasons are: (1) it’s already integrated in the build system, (2) license issues with other solvers (e.g., Yices is GPL3) and (3) time constraints.

I also have a question about the proposal. I understand that ideas about the project will be discussed in the mailing list. However, once that’s settled and I write my first draft proposal, should I send it to the mailing list for discussion again or should I send it only to the mentor?

Also any feedback is welcome.

Thank you,

Hey, welcome!

Thanks for picking this up, it's indeed a high-impact item - our current ultra-simplified solver helped us launch and is often sufficient, but a significant portion of our false positives is caused by math being just slightly more complicated than it can digest.

As for communications, the only thing lists aren't good enough for is a quick chat. It's up to you and George to decide how would you handle most of the communication, but i'll recommend keeping the community informed about any interesting topics that show up - either technical questions or your own developments. In particular, you might potentially receive feedback from other active community members on your upcoming proposal.

Please make sure to keep email discussions on the mailing list rather than just personal email. This is a topic that members of the community will be interested in and will have valuable feedback on.

Devin

Hi all,

I’ve been playing with the static analyzer in the last couple of weeks and I came up with some questions about the project:

  1. What’s the preferable way of implementing the option to double check the results? I had some ideas about that:

1.1 To add a new option to the analyzer, so the user can specify when to double check the results. This option gives the user more freedom but the checker’s developers won’t have any control about it.

1.2 To change emitReport and add a default parameter to allow/force double check; this will likely override the user’s desire to double check the reports. One scenario is if the Z3 double checker doesn’t support something in the bug path but the developer wants to show the result anyway.

1.3 Some combination of the two options?

3. This is a list of the TODOs in Z3ConstraintManager, from more
important to less important, in my opinion. I just want to know if the
analyzer's developers (and the project mentor) agree with this list,
as it might go into my proposal:

I don't think you need to worry about any of these. There are a few
bugfixes for Z3ConstraintManager that have been pending in D28955 and
D35450, but I don't recall if the bugs are even currently reachable.

3.1. Don't assume nearest ties to even rounding mode: currently, only
rounding to even is supported, even if the code changes the rounding
mode using fesetround.

This is specific to floating-point support, which requires D28954.

3.2. Don't add all the constraints, only the relevant ones: adding
unnecessary constraints will slowdown the solver.
3.3. Refactor doTypeConversion to use built-in conversion functions
(Refactor to Sema::FindCompositePointerType(), and
Sema::CheckCompareOperands(); Refine behavior for invalid type casts)
3.4. Refactor doIntTypeConversion to use Sema::handleIntegerConversion()
3.5. Refactor doFloatTypeConversion to use Sema::handleFloatConversion()

I bundled this together because, although the conversion seems
incomplete (based on the comments), it's about removing duplicated code.

3.6. Refactor getAPSIntType(const llvm::APSInt &Int) const to put
elsewhere.

These are mostly large refactors that would involve other parts of Clang.

Dominic

Hi Mikhail,

I’m assuming Dominic have answered your questions regarding the point (3).

On point (1) I have recently sent an email on the list answering, I believe, to essentially the same question:
http://lists.llvm.org/pipermail/cfe-dev/2018-March/057064.html

(yes, unfortunately we do not have better archives, so messages might be often hard to track)

  1. I still don’t quite understand how dynamic memory track works in the analyzer, is the double checker expected to work for pointers and dynamic memory as well? I’m assuming yes here and that Z3ConstraintManager might need to be extended somehow (a plan will be added to the proposal).

I think here we should get the extra precision for free by adding a bug reporter visitor, as described in the email thread I have linked to.

Please feel free to ask any further questions, bug reporter visitors are quite messy in the analyzer.

Regards,
George

Hi all,

I’ve written a first draft of my proposal:

https://docs.google.com/document/d/1-zNSv0l4WyoxYpJUAw8LFnQq_TY4AGjIpPu1VPkmO-g/edit?usp=sharing

I’ve added a few comments in places I think need improvement.

May I ask the community to have a look and give some feedback?

Thank you,

Hi Mikhail,

I’ve added some feedback.
Overall, I think we should be aiming for something more low-level and concrete:
adding examples with explanations would be a great improvement.

Regards,
George

Hi all,

Thank you for the feedback, George and Dominic.

I updated my proposal with an example, showing the encoded SMT formula for the program and a brief explanation of the verification process. I used a simplified program from a bug report in Bugzilla.

May I ask for some feedback in this section?

Hi Mikhail,

That’s a good improvement!

I think an awesome next step would be to see whether the analyzer already has the formula required to solve your motivational example.
This would be a preliminary feasibility study: if the formula is there, it’s just a matter of converting it and giving it to Z3, and otherwise,
the exercise is much harder and might require substantial changes.

Perhaps an easiest way to see what formulas the analyzer has is to launch it with an extra flag
-Xclang -analyzer-viz-egraph-graphviz which would dump a graph in a GraphViz format containing all the information analyzer has along all the states.

This is important for judging feasibility, as it might be the case that analyzer at some point decides to get rid of the “complex” constraint.
While it would be possible to change that, that would be a second step of the project,
and for preliminary evaluation a simpler example would be needed.

Also, the information above could be helpful for structuring the project: a first stage would be checking most trivial examples, a second stage would be seeing how far
can we get with only minimal modifications to the core.

Regards,
George

Hi,

I made some small changes to the Z3ConstraintManager, so it could print the SMT formula and the only thing I got was:

(declare-fun $0 () (_ BitVec 32))
(assert (not (= $0 #x00000000)))

Constraints:
reg_$0 : (not (= $0 #x00000000))

which comes from the core.DivideZero checker.

Maybe it’s dropping the “complex” constraint as you said? Both paths should have the (i % width == 0) constraint.

Thank you,

Hi,

Maybe it’s dropping the “complex” constraint as you said? Both paths should have the (i % width == 0) constraint.

Yes, that is quite likely, unfortunately.

We would have to find an easier example first, where the core modification are not necessary.
For the easier example: I think it would have to be simple arithmetics over integers, even negation would work.
The current solver can not handle any relational constraints.

BTW instead of looking into the Z3ConstraintManager I think it would be easier to look at the exploded graph (using the option I have previously described)
and see what formulas are mentioned there.

Hi,

Try making assumptions over 2 * x, these should work if i recall correctly.

Indeed the constraints start to appear when there are multiplications/divisions/remainders, but I noticed that it only prints the constraints when it’s able to solve them.

Maybe in a higher level it’s dropping constraints without querying if the solver can handle it?

I’ll keep looking into it.

Hi,

Just a quick update: I still haven’t found a suitable case to add to the report but I tried to address every other comment there.

Any feedback is welcome!

Thank you,

Hi Mikhail,

Indeed, it seems that the constraint manager is more eager to throw out complex constraints than I’ve anticipated initially.
It might be partly a good thing: now we need to know that evaluating disabling this heuristic and measuring a change in footprint
on a large number of project needs to be a part of the schedule.
An example where a change in this optimization is required would be also beneficial for proposal.

Now onto the example.

  1. Unlike SV-COMP, the analyzer treats “assert” as “assume”. So you would need to e.g. perform null pointer dereference to see whether there’s an error.
    (or use a debug checker and a special function, consult tests for those)
  2. The following function seems to give the desired behavior:

int foo(int x) {
int *z = 0;
if ((x & 1) && ((x & 1) ^ 1))
return *z;
return 0;
}

In a sense that the analyzer reports a false positive (if x & 1 is true, last bit is one, but then (x & 1) is the last bit and (x & 1) ^ 1 should be zero),
it is non-obvious, and just using Z3 with constraints present in the graph should be sufficient.

Regards,
George

Hi all,

I expanded the report quite a bit over the weekend, adding the suggested example, writing a bit more about the current state of the analyzer and implementing the suggestions (thank you George and Artem).

Any feedback is welcome! I plan to submit the report tomorrow, mid-afternoon.

Thank you,

Great! Don’t forget to submit the PDF through the official website!

Hi all,

I just submitted my proposal.

Thank you very much everyone for the feedback.