Managing Clang and LLVM dependencies systematically

Hello,

I am essentially an outsider here but thought I would put down some
ideas about how to "fix" (so to speak) LLDB's dependency on LLVM and Clang.
The aim here is to shield LLDB from build breakages introduced by
API changes in LLVM and Clang. This is not to say that LLDB should not
take in the latest and greatest Clang/LLVM, only that such updates
should be managed more systematically (so that LLDB builds are not
broken without notice).

[The assumption I am making here is that managing the dependency on
LLVM and Clang is the responsibility of the LLDB project. Correct me
if I am wrong; if it is in fact the responsibility of LLVM/Clang, then
just ignore the rest of this mail. Also, it's very likely that an
equivalent approach has already been discussed. In that case, kindly point
me to it.]

The approach I have in mind is outlined as a bunch of steps:

1. Let the official way to build LLDB be via a script called
build_lldb(.py) living in lldb/scripts/.

2. This script should have something like this:
    LAST_GOOD_LLVM_SHA = <git sha>
    LAST_GOOD_CLANG_SHA = <git sha>
    LAST_GOOD_LLVM_REV = <svn rev>
    LAST_GOOD_CLANG_REV = <svn rev>
    [To keep it simple, I will only outline the Git-related process henceforth.]

3. build_lldb is really simple. Before actually building, it syncs
LLVM and Clang to LAST_GOOD_LLVM_SHA and LAST_GOOD_CLANG_SHA
respectively. These are the shas which are known to work with LLDB ToT
(see the sketch after this list).

4. When and why should LAST_GOOD_LLVM_SHA and LAST_GOOD_CLANG_SHA be updated:
i) To bring in the latest and greatest of either or both of them.
    a) A newer version of Clang/LLVM could be providing a new feature which
    LLDB would like to use. In such a case, the last good shas should be updated
    in the same patch that uses the new feature in LLDB.
    b) A newer version of Clang/LLVM could just be doing some API adjustments.
    In such a case, the last good shas should be updated in the same patch that
    adjusts for the new API in LLDB.
ii) When adding a feature to LLVM/Clang to be used in LLDB. This is essentially
    similar to point (i)(a) above. For such cases, the feature should be added
    to LLVM/Clang first. Then, the corresponding shas should be updated in the
    same patch that uses the new feature in LLDB.
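
To make steps 2 and 3 concrete, here is a minimal sketch of what
build_lldb(.py) could look like. This is only an illustration: the
checkout layout (clang living under llvm/tools/clang) and the final
build invocation are assumptions, and the shas are placeholders.

    #!/usr/bin/env python
    # Hypothetical sketch of lldb/scripts/build_lldb.py. The layout
    # (clang under llvm/tools/clang) and the placeholder shas are
    # assumptions, not prescriptions.
    import subprocess

    LAST_GOOD_LLVM_SHA = "<git sha>"
    LAST_GOOD_CLANG_SHA = "<git sha>"

    def sync(repo_dir, sha):
        # Fetch and pin the checkout to the last-known-good revision.
        subprocess.check_call(["git", "fetch"], cwd=repo_dir)
        subprocess.check_call(["git", "checkout", sha], cwd=repo_dir)

    def main():
        # Step 3: pin LLVM and Clang before building LLDB ToT.
        sync("llvm", LAST_GOOD_LLVM_SHA)
        sync("llvm/tools/clang", LAST_GOOD_CLANG_SHA)
        # ...then configure and build as usual (e.g. cmake + ninja).

    if __name__ == "__main__":
        main()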

The benefits the above approach brings are:
1. Developers working only on LLDB don't get hit by breaking Clang/LLVM changes.
2. Bisecting will be a lot easier, as it is all just one repo from LLDB's point of view.

There are downsides, of course, to this approach:
1. It introduces a process (hence, a burden) with respect to managing the dependencies.
2. It will require regular updates to the last good shas (even if LLDB
has nothing to gain from them).

In my view though, the benefits outweigh the burdens.

Thanks,
Siva Chandra

I actually think it is good that incompatible changes to llvm break the lldb build bots right away. Then they get fixed in lldb right after the change was made, while what just went on is still clear in people's minds. So I wouldn't want to add any of this sort of machinery to lldb's build w.r.t. the build bots. Now that the build bots are running regularly, the clang folks can also see the breakage right away and just fix it, which they often do (thanks for that, BTW...). So if there were a "GOOD_LLVM", it should not be used for the build bots.

I'm also leery of pinning to "stable good versions of llvm/clang" for use in lldb for any length of time, since any subtle bugs that get introduced on the llvm/clang side will mix with other subtle bugs and incidental changes, and become harder to untangle when we get around to pulling in the llvm/clang changes. So again, I would urge folks really working on lldb not to use the "last good clang" for very long, since those are the people who will see such bugs early.

I can see the advantage of this for "casual lldb developers". Are there enough of them to warrant putting this in?

Jim

If it is not made official, then there is no onus to maintain it, and
hence it would eventually fall out of use and bitrot.

I take it that you prefer Clang/LLVM changes leading to LLDB breakages
be seen immediately.

I agree; I think the way to improve the Clang/LLVM/LLDB compatibility
story is to

* Ensure the test suite is reliable and has no intermittent tests, so
that a new failure represents a certain regression.

* Make sure all platforms of interest have buildbots that work and are
usually green. The FreeBSD bot has been running successfully for a
while, although the tests are failing more often than not of late.
I'll fix this up so that the failures become xfails at least.

* Help bring LLVM and Clang developers into the fray, so that they're
regularly building LLDB and have a fix ready when they make an
API-breaking change to Clang or LLVM.

I take it that you prefer Clang/LLVM changes leading to LLDB breakages
be seen immediately.

Yes.

The change in LLDB r164563 is also relevant here, particularly build-llvm.pl.
http://llvm.org/viewvc/llvm-project?view=revision&revision=164563

The intent is that from now on, top-of-tree will always build against LLVM/Clang top-of-tree, and that problems building will be resolved as they occur. Stable release branches of LLDB can be constructed as needed and linked to specific release branches of LLVM/Clang.

-Ed

Just curious, is there a document or a script which does this? For
example, if I check out an LLDB release branch (via some script, maybe),
does it automagically check out the linked release branches of
LLVM and Clang?

+1, LLDB breakages need to be more visible to Clang/LLVM developers.
Currently they are not very visible, mostly for no good reason.

Stabilizing the LLDB test suite would help, but the bots could probably be
more aggressive about sending IRC or email pings when the build (not tests)
fails, as this is the primary way that LLVM and Clang changes break LLDB.

I’ve made a point of prioritizing getting our tests to run cleanly here, so it would be a good time for the community to do likewise for other platforms. Among other benefits, improving the signal/noise ratio for test failures will make the message to LLVM a lot clearer.

Kate Stone k8stone@apple.com
Xcode Runtime Analysis Tools

Also +1. I hope to have this going on Windows sometime before the end of the year. Getting ProcessWindows working at least with minimal functionality is one of the last major hurdles.

We completely agree that there should be a continuous build with top-of-everything.

We’re looking to add a continuous lldb build with the curated versions of llvm/clang. I think this will help lldb developers with the signal-to-noise ratio (separating their breaks from clang/llvm breaks). I can probably get some CPU time for this new build.

Chromium does a similar thing with their multitude of open source dependencies. Siva was explaining to me that there is an engineer “gardener” who updates the working-versions file daily (we could do weekly) by looking for successful “top-of-everything” builds. The “gardener” responsibility is handed off in rotation.
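
Concretely, a weekly “gardening” pass might look something like the sketch below. The last_green_revisions() lookup is hypothetical (it would really query the “top-of-everything” builder for its most recent green build), and the pin-file path is likewise an assumption.

    #!/usr/bin/env python
    # Hypothetical gardener helper: roll the pinned shas forward to the
    # newest revisions that produced a green "top-of-everything" build.
    import re

    PIN_FILE = "lldb/scripts/build_lldb.py"  # where LAST_GOOD_* lives (assumed)

    def last_green_revisions():
        # Placeholder: really this would query the buildbot master for
        # the most recent green build and return its llvm/clang shas.
        raise NotImplementedError

    def roll_pins(llvm_sha, clang_sha):
        # Rewrite the LAST_GOOD_* assignments in place.
        with open(PIN_FILE) as f:
            text = f.read()
        text = re.sub(r'LAST_GOOD_LLVM_SHA = .*',
                      'LAST_GOOD_LLVM_SHA = "%s"' % llvm_sha, text)
        text = re.sub(r'LAST_GOOD_CLANG_SHA = .*',
                      'LAST_GOOD_CLANG_SHA = "%s"' % clang_sha, text)
        with open(PIN_FILE, "w") as f:
            f.write(text)

    if __name__ == "__main__":
        roll_pins(*last_green_revisions())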

Why don’t we do this as a trial? We will set up the hardware (linux at least) and take responsibility for gardening. We would like our build slave to be triggered by the llvm buildbot master.

If there is value in this to the community, we’ll expand the gardening responsibilities. We can also update the public lldb build instructions to use the curated build script.

Thanks,

Vince

One of my posts didn’t go through earlier. It looks like I responded privately to Siva instead of to the list.

My main issue is that I don’t want the canonical way of getting a new developer started hacking on LLDB to veer either sideways or backwards with respect to how one gets started on LLVM. It should either stay the same or veer towards LLVM, which means CMake and Ninja.

As a result, I wouldn’t want this functionality to be hidden behind a script unless it were also accessible through CMake (similar to how you can use dotest.py to run the test suite manually, or just build the check-lldb ninja target after generating with CMake).

One way to do this in CMake might be to have an LLDB_USE_LKGR_LIBS CMake variable which defaults to false. If it’s true, CMake runs this script to set up the build environment in whatever way is necessary to get these revisions, and then ninja works the way it always does: just build the lldb target and it will automatically use the LKGR libraries.

I don’t think Chromium’s dependency rolling model is a good fit for the way that LLDB should consume Clang/LLVM.

I would say that Chromium is to Blink as LLDB is to Clang. Both are run under the same parent umbrella project. However, I’ve been led to understand that Blink rolls are a huge pain, and Chromium is actively moving away from this model by attempting to merge the repositories.

I’m not proposing merging the LLDB and Clang repos, but I would say that we should consider them part of the same project. If Clang changes break LLDB, then there is a burden on the Clang developer to fix LLDB promptly, or to find someone with more LLDB knowledge if the fix isn’t trivial. This is the relationship that Clang already has with LLVM. It’s OK for LLVM to break Clang, and it should be OK for Clang to break LLDB, so long as it’s fixed promptly. If the fix isn’t prompt, it’s OK to start reverting to get back to green.

In short, what I really think we need is:

  • More stable LLDB tests with more signal and less noise
  • More visibility into LLDB build and test failures for Clang and LLVM developers

Rather than spending time managing blessed revisions, I would rather spend resources watching the bots we already have (http://lab.llvm.org:8011/builders/lldb-x86_64-debian-clang, http://lab.llvm.org:8011/builders/lldb-x86_64-freebsd) and pinging developers on email and IRC to fix regressions. In other words, take a harder stance on breakage.

Does that seem reasonable?

I think one of the use cases they were trying to solve was being able to bisect and find which change caused a particular break. It might be worth mentioning https://github.com/chapuni/llvm-project, a git repo which exists exactly for this purpose and which, AFAIK, is automatically maintained to track ToT with no limitations.

Does that seem reasonable?

+1

Jim

Reid mentioned watching the bots we already have. Is there any way to get a Mac bot on there? Do you guys already have one that I’m not aware of that maybe just needs to be hooked up?

I think Sean & Adrian are working on this. Sean's out today, but I'll ask him tomorrow.

Jim

I’m on this. We have most of the infrastructure up; it’s just a matter of tying up some loose ends before we get regular reports.
I’ll let you folks know more about the bot once it is doing something.

Sean