Next step

[Ongoing discussion on thread safety analysis, moved to list in case
anyone cares. :-)]

* change getMutexIDs so that it creates an LExpr instead of a set of
SExprs. Instead of returning a MutexIDList, it should return the LExpr
created.

Yes.

* addLock should be changed to take an LExpr instead of a MutexIDList
(same for removeLock)

Yes.

* FactSet needs to be updated to handle LExprs for findLock, addLock
and removeLock

Yes, but there's more to it. (See below.)

* FactManager needs to create a FactID from an LExpr for newLock

Not quite. (See below.)

* FactEntry needs to be updated to hold (a union including SExpr and?) an LExpr

Not quite. (See below.)

A few notes on implementation: MutexIDList is just a short list of
SExprs, and it's used when extracting SExprs from attributes, because
an attribute (e.g. exclusive_locks_required) can hold a list of
SExprs, rather than a single one.

FactSet holds a set of SExprs that are known to be held. It's not
just a list for two reasons. (1) A FactSet keeps track of some
additional information about each SExpr, e.g. where the SExprs were
acquired, whether they are exclusive or shared, and the weirdness
associated with scoped_lockable. (2) FactSets are optimized for small
size and merging.

You will need to change FactSet so that it holds a single LExpr.
Instead of the set {A, B, C}, you will store the LExpr (A & B & C).
What this means is that the operations to add locks, remove locks, and
merge locksets will now apply to LExprs.
For example:

   Current context: (A & B & !C). Call C.lock(). New context:
(A & B & C).
   Current context: (A & B & C). Call B.unlock(). New context:
(A & C & !B).
   Context1: (A & B & C). Context2: (A & B & D). Merge on join
point: (A & B & (C | D))

[Joint points]. At every join point in the CFG, the analysis will
have to merge the two LExprs. Merging is conceptually simple: if we
can arrive at a given basic block from either block 1, with context
E1, or from block 2, with context E2, then the context for that block
is (E1 | E2). However, you will get an exponential blowup in
expression size unless you *simplify* (E1 | E2) afterward, as shown
above.

[Adding and removing locks.] In order to add or remove a mutex A from
an LExpr E, you must first remove all prior occurrences of A from E.
(We're chan do not know ging what we know about A.) Unfortunately,
that's not just a simple matter of traversing E, and deleting the
references. You need to derive a new expression E' which does not
mention A, where E is satisfiable only if E' is satisfiable. More
formally, you must factor out A, by refactoring E into the form (A &
E1) | (!A & E2), where E1 and E2 do not refer to A; E' is then (E1 |
E2). There are algorithms to do this for BDDs; the operation is known
as boolean existential quantification. See:
http://gauss.ececs.uc.edu/Courses/c626/lectures/BDD/bdd-desc.pdf

Unfortunately, I do not know how to do this for the simple LExprs that
you have constructed. One option is to require the LExpr to be in the
format (E1 & E2 ... En & A) to start with (so no refactoring
necessary), and simply fail otherwise. That will work for locks,
because it matches the way FactSet is implemented now, and it will
thus allow you to make forward progress. But it largely defeats the
point of implementing LExprs, since analysis will fail if it
encounters anything more complicated than what we already have. The
other option is to ditch LExprs, and implement BDDs. Talk to Dean; he
knows more about this than I do.

My task list, in order of priority, would be:

(1) Add the ability to merge (logical or) and simplify LExprs.
(convert to conjuctive normal form?)
(2) Add the existential quantification operation to LExprs (or
possibly implement BDDs?).
(3) Augment LExprs so that they keep track of exclusive vs. shared,
where locks were acquired, and scoped_lockable.

That's quite a lot of work already, and it will take you some time.
Once it's done, *then* we can do a single patch to swap FactSet over
to the new LExpr stuff. The patch sequence is the same as my strategy
for the TIL/SExpr stuff; build the new functionality in parallel with
the old, make sure the new functionality works, and then flip the
switch from old to new. Notice that I haven't flipped the switch for
SExpr yet. :slight_smile:

We can set up a video chat if you want to talk about this further.

  -DeLesley

[Ongoing discussion on thread safety analysis, moved to list in case
anyone cares. :-)]

* change getMutexIDs so that it creates an LExpr instead of a set of
SExprs. Instead of returning a MutexIDList, it should return the LExpr
created.

Yes.

* addLock should be changed to take an LExpr instead of a MutexIDList
(same for removeLock)

Yes.

* FactSet needs to be updated to handle LExprs for findLock, addLock
and removeLock

Yes, but there's more to it. (See below.)

* FactManager needs to create a FactID from an LExpr for newLock

Not quite. (See below.)

* FactEntry needs to be updated to hold (a union including SExpr and?) an LExpr

Not quite. (See below.)

A few notes on implementation: MutexIDList is just a short list of
SExprs, and it's used when extracting SExprs from attributes, because
an attribute (e.g. exclusive_locks_required) can hold a list of
SExprs, rather than a single one.

Good, that matches my understanding.

FactSet holds a set of SExprs that are known to be held. It's not
just a list for two reasons. (1) A FactSet keeps track of some
additional information about each SExpr, e.g. where the SExprs were
acquired, whether they are exclusive or shared, and the weirdness
associated with scoped_lockable. (2) FactSets are optimized for small
size and merging.

This also matches my understanding.

You will need to change FactSet so that it holds a single LExpr.
Instead of the set {A, B, C}, you will store the LExpr (A & B & C).
What this means is that the operations to add locks, remove locks, and
merge locksets will now apply to LExprs.

Okay, so the FactSet basically becomes our computed environment for
capabilities, and is simply a single LExpr that we constantly update?

For example:

   Current context: (A & B & !C). Call C.lock(). New context:
(A & B & C).
   Current context: (A & B & C). Call B.unlock(). New context:
(A & C & !B).
   Context1: (A & B & C). Context2: (A & B & D). Merge on join
point: (A & B & (C | D))

[Joint points]. At every join point in the CFG, the analysis will
have to merge the two LExprs.

So this would mean modifying ThreadSafetyAnalyzer::intersectAndWarn?
Or are there other places requiring modification?

Merging is conceptually simple: if we
can arrive at a given basic block from either block 1, with context
E1, or from block 2, with context E2, then the context for that block
is (E1 | E2). However, you will get an exponential blowup in
expression size unless you *simplify* (E1 | E2) afterward, as shown
above.

Do you have a particular algorithm in mind for performing the simplification?

[Adding and removing locks.] In order to add or remove a mutex A from
an LExpr E, you must first remove all prior occurrences of A from E.
(We're chan do not know ging what we know about A.) Unfortunately,
that's not just a simple matter of traversing E, and deleting the
references. You need to derive a new expression E' which does not
mention A, where E is satisfiable only if E' is satisfiable. More
formally, you must factor out A, by refactoring E into the form (A &
E1) | (!A & E2), where E1 and E2 do not refer to A; E' is then (E1 |
E2). There are algorithms to do this for BDDs; the operation is known
as boolean existential quantification. See:
http://gauss.ececs.uc.edu/Courses/c626/lectures/BDD/bdd-desc.pdf

Thank you for the link!

Unfortunately, I do not know how to do this for the simple LExprs that
you have constructed. One option is to require the LExpr to be in the
format (E1 & E2 ... En & A) to start with (so no refactoring
necessary), and simply fail otherwise. That will work for locks,
because it matches the way FactSet is implemented now, and it will
thus allow you to make forward progress. But it largely defeats the
point of implementing LExprs, since analysis will fail if it
encounters anything more complicated than what we already have. The
other option is to ditch LExprs, and implement BDDs. Talk to Dean; he
knows more about this than I do.

It sounds a lot more like we really want a BDD than an LExpr. I will
talk to Dean, but do you have any feelings on whether a BDD is a
better approach than the LExpr we have? If you think BDDs are the
correct approach long-term, do you know of any BDD libraries that
would be compatible with our licensing requirements, or whether there
would be issues with bringing in a 3rd party library for this?

My task list, in order of priority, would be:

(1) Add the ability to merge (logical or) and simplify LExprs.
(convert to conjuctive normal form?)
(2) Add the existential quantification operation to LExprs (or
possibly implement BDDs?).
(3) Augment LExprs so that they keep track of exclusive vs. shared,
where locks were acquired, and scoped_lockable.

I think the highest priority would be to determine whether we are
going to use LExrs, BDDs, or a combination of the two. The work is
likely to be roughly the same in terms of hooking things up, but the
tools are different.

That's quite a lot of work already, and it will take you some time.
Once it's done, *then* we can do a single patch to swap FactSet over
to the new LExpr stuff. The patch sequence is the same as my strategy
for the TIL/SExpr stuff; build the new functionality in parallel with
the old, make sure the new functionality works, and then flip the
switch from old to new. Notice that I haven't flipped the switch for
SExpr yet. :slight_smile:

We can set up a video chat if you want to talk about this further.

I think a video chat would be a great idea. We can discuss offlist.

Thanks!

~Aaron

I think BDDs are definitely the best long-term solution, and possible
the best short-term solution. The LExpr stuff is useful for rapid
prototyping, and it's also fast and efficient to build an LExpr from
the requires(...) attribute for the purpose of checking. However, you
can't write efficient algorithms for LExpr. BDDs are basically the
algorithm you want here. Dunno about 3rd-party libraries; my concern
would be finding something very small, so that you don't wind up
increasing the clang memory footprint with a feature that isn't used
all that often. Rolling your own BDD library is definitely an option;
we don't need that much functionality, so it might be best to just
implement what you need and nothing more.

  -DeLesley

FWIW, we have been using buddy (http://sourceforge.net/projects/buddy/), an implementation of BDDs.
This library is pretty small and performs well. It has a BSD-like license. It compiles at least with GCC, clang, and MSVC.
The project was a bit dead for a long time, but it restarted recently. We have been contributing our local patches as well.

(don't know if using BDDs isn't overkill for your needs, but I didn't follow the thread closely).

Nuno

BDDs are what we really want, but we didn't want to link a giant
3rd-party library into clang, so we've been talking about small
light-weight alternatives. If buddy is small, then it's definitely
worth looking into. Thanks!

  -DeLesley