PSA: debuginfo-tests workflow changing slightly

Greetings,

If you don’t care about running debuginfo-tests, and don’t maintain a bot that runs debuginfo-tests, you can stop reading.

I’ve uploaded a patch [https://reviews.llvm.org/D39605] that changes the way you run debuginfo-tests.

Prior to this patch, the way to run them was to clone an external git repository into clang/test; debuginfo-tests would then run transparently when you ran “ninja check-clang”.

After this patch, there will be two workflows, depending on whether you use the multi-repo or the mono-repo.

multi-repo: You will need to clone debuginfo-tests into llvm/projects, then run “ninja check-debuginfo”.

mono-repo: Pass -DLLVM_ENABLE_PROJECTS="debuginfo-tests" to CMake, then run “ninja check-debuginfo”.
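As a rough illustration of the two workflows, here is a minimal sketch that builds the command lines each one implies. It assumes the multi-repo checkout lives at llvm/projects/debuginfo-tests relative to the llvm source directory, and the helper names are hypothetical. The clone step is left out because no repository URL is given here. LLVM_ENABLE_PROJECTS takes a semicolon-separated list, so in practice clang (and later lld) would presumably be listed alongside debuginfo-tests; that choice is left to the caller.

```python
import shlex
from pathlib import Path


def multi_repo_layout(llvm_src: Path) -> Path:
    """Where debuginfo-tests is expected in a multi-repo checkout
    (assumption: llvm/projects/debuginfo-tests, per the workflow above)."""
    return llvm_src / "projects" / "debuginfo-tests"


def mono_repo_configure(llvm_src: Path, projects: list[str]) -> list[str]:
    """argv for a CMake configure that enables debuginfo-tests in a mono-repo.

    LLVM_ENABLE_PROJECTS is a semicolon-separated list; which other
    projects to enable (e.g. clang) is up to the caller.
    """
    enabled = ";".join(projects)
    return ["cmake", "-G", "Ninja", f"-DLLVM_ENABLE_PROJECTS={enabled}", str(llvm_src)]


# Both workflows end with the same check target.
CHECK_DEBUGINFO = ["ninja", "check-debuginfo"]

if __name__ == "__main__":
    src = Path("llvm")  # hypothetical path to the llvm source directory
    print(multi_repo_layout(src))
    print(shlex.join(mono_repo_configure(src, ["clang", "debuginfo-tests"])))
    print(shlex.join(CHECK_DEBUGINFO))
```

shlex.join quotes the -D argument because of the semicolon, which is also what you would need when pasting it into a shell.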

The motivation for this change is that planned additions to debuginfo-tests require us to be able to make use of lld, and as a result the tests need to live somewhere that can access both clang and lld, not just clang. Furthermore, giving them their own target “check-debuginfo” as opposed to being transparently added to check-clang makes more sense from a usability perspective. Finally, this new approach is mono-repo friendly whereas the previous one was not.

I’m hoping this won’t be too disruptive a change, but please leave comments and issues on this thread or on the code review.

Thanks!

The motivation for this change is that planned additions to debuginfo-tests require us to be able to make use of lld, and as a result the tests need to live somewhere that can access both clang and lld, not just clang.

I’m not at all opposed to this effort, but I do wonder why this is part of the motivation. Tests in clang/test should be able to use any binary in /bin, right? E.g., we use /bin/llvm-profdata for the tests in clang/test/Profile.

Furthermore, giving them their own target “check-debuginfo” as opposed to being transparently added to check-clang makes more sense from a usability perspective. Finally, this new approach is mono-repo friendly whereas the previous one was not.

Yep.

I’m hoping this won’t be too disruptive a change, but please leave comments and issues on this thread or on the code review.

We have several bots which clone debuginfo-tests to tools/clang/test, but it shouldn’t be too much of a hassle to migrate them. I’ve CC’d Mike and Chris as a heads-up (or in case they have anything to add :).

thanks,
vedant

llvm-profdata is part of llvm though. It’s perfectly fine for something in clang to depend on something in llvm. However, clang and lld are two independent llvm subprojects, and neither can depend on the other.

Generally speaking, from a layering perspective, if A depends on B and C, but B and C are independent, that should be reflected in the structure.

For example, in CMake we will need to find out if lld is being built, since it is optional. We would not be able to do this from inside of the clang tree without requiring the parent CMake (e.g. llvm) to make sure that we traversed into lld’s CMake first. This is a clear layering violation though. Instead, the proper way to do it is to have llvm include both, and then run the debuginfo-tests CMake configuration.
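To make the layering argument concrete, here is a small Python model (not LLVM’s actual CMake) of why the decision belongs at the top level: only the top-level configure sees every enabled project, so only it can order clang and lld before the project that needs both. The dependency map is an illustrative assumption, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Illustrative dependency map (assumption, not taken from LLVM's CMake):
# clang and lld each depend only on llvm; debuginfo-tests needs both.
DEPS = {
    "llvm": set(),
    "clang": {"llvm"},
    "lld": {"llvm"},
    "debuginfo-tests": {"clang", "lld"},
}


def configure_order(enabled: set[str]) -> list[str]:
    """Order in which a top-level configure could visit the enabled projects.

    Dependencies that are not enabled (lld is optional) are dropped, so
    debuginfo-tests still gets configured after whatever it can use.
    """
    graph = {p: DEPS.get(p, set()) & enabled for p in enabled}
    return list(TopologicalSorter(graph).static_order())


def lld_tests_enabled(enabled: set[str]) -> bool:
    """Whether lld-dependent tests could be turned on for this build."""
    return "lld" in enabled
```

Nested under clang/test, debuginfo-tests would only see clang’s view of the world, which is exactly the sibling dependency the thread calls a layering violation.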

Got it, thanks.

vedant

From the CI side, moving this stuff around is a huge undertaking. We include this repo in every build; they will all need to be fixed and verified. It is a lot of work on our side. Is there a plan for both systems to work side by side as we migrate jobs? Talking to Mike today, we estimated a week of work to migrate and verify, plus residual failures for the next month.

Regarding your motivation for this change, could that test be added in a different suite?

I propose we drop these tests from all but one of our OSX bots. I don’t see them fail often, and they have a large maintenance burden.

By adding in a different suite, you mean the lld part? I mean, theoretically, but that would be pretty awkward, because the idea behind the lld requirement is that we want to make debuginfo-tests work with clang-cl and CodeView debug info, and for this lld is a hard requirement. In that sense, there really isn’t a meaningful way to have debug info tests on Windows without lld. So by putting it in another repo, we’d have a “windows debug info tests” repo and a “non windows debug info tests” repo.

Although the format of the tests will probably look a little different, and the debuggers being run to verify the tests will definitely be different, conceptually they’re really the same thing.

Even ignoring the LLD aspect, I think this layout just makes more sense, and when I spoke to several people about it at the dev meeting, I think pretty much everyone was in agreement. Case in point: test-suite is conceptually very similar to debuginfo-tests, so it’s awkward when they use completely different layouts in the source tree and different methods of running the suite.

So I actually think this organization is more idiomatic with the way LLVM normally does things, independently of the desire to depend on LLD.

In fact, when I started looking into test-suite I got to thinking that maybe debuginfo-tests should actually be part of test-suite. To be clear: I personally have no intention of doing this now or in the future, but the point is that they are similar enough that we should really treat them the same.

I admit I’m not familiar with the CI side of things, but from the perspective of someone doing this locally, the transition procedure is:

  1. Clone the same git repo as before, but into a different location on disk.

  2. Add an additional check step that runs “ninja check-debuginfo”.
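The two-step transition above can be sketched for a bot as follows. This is a minimal sketch, assuming a multi-repo layout: the old location is the tools/clang/test checkout Vedant’s bots use, the new one is llvm/projects, and the step representation (a list of argv lists) and helper names are hypothetical.

```python
from pathlib import Path


def old_location(llvm_src: Path) -> Path:
    """Where bots check out debuginfo-tests today (inside clang's test dir)."""
    return llvm_src / "tools" / "clang" / "test" / "debuginfo-tests"


def new_location(llvm_src: Path) -> Path:
    """Step 1: same repo, cloned into llvm/projects instead."""
    return llvm_src / "projects" / "debuginfo-tests"


def migrate_steps(steps: list[list[str]]) -> list[list[str]]:
    """Step 2: append `ninja check-debuginfo` once, keeping existing steps
    (e.g. `ninja check-clang`) unchanged."""
    extra = ["ninja", "check-debuginfo"]
    return steps if extra in steps else steps + [extra]
```

Making migrate_steps idempotent means a bot configuration can be run through it more than once without growing duplicate check steps.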

Nothing else should need to change. I trust you when you say this is a huge undertaking since you know this stuff better than me, but I want to understand where the extra effort comes from.

I will need to check what happens if you just do nothing and leave bots running the way they are today. I guess the way to check this is to clone the repo into the “old” location, apply my patch to that location, and then run ninja check-clang. It may continue to work, but I haven’t tested it. Note that it’s the weekend, so I can’t check this until Monday.

I propose we drop these tests from all but one of our OSX bots. I don’t see them fail often, and they have a large maintenance burden.

Since they are an end-to-end test that the clang (and the lld) we just built works with the system debugger, there is no benefit to running them in more than one clang build configuration. From that point of view, that would be OK.

-- adrian

By adding in a different suite, you mean the lld part? I mean theoretically, but that would be pretty awkward, because the idea behind the lld requirement is that we want to make debuginfo-tests work with clang-cl and CodeView debug info, and for this lld is a hard requirement.

Out of curiosity, why is lld a requirement for this? The system linker is sufficient for DWARF testing and I’d have thought the Windows system linker would be sufficient for testing clang-cl with CodeView output.

I assume lld is a requirement if you want to test lld’s ability to link CodeView, but not a requirement for testing clang-cl’s CodeView output?

If we want to test clang-cl’s CodeView output, we don’t need debuginfo-tests in the first place. We can just use llvm-readobj or something similar to dump the CodeView from the object files. If we want to test that it works with a real debugger, though, then we need to be testing PDBs.
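For the object-file-only check described above, a minimal sketch of the two commands might look like this. It assumes clang-cl and llvm-readobj are on PATH; /c compiles without linking, /Z7 keeps CodeView in the object file, and llvm-readobj’s --codeview option prints those records. The helper names are hypothetical, and what to assert about the dump is left open.

```python
import subprocess
from pathlib import Path


def codeview_commands(source: Path) -> list[list[str]]:
    """Compile with clang-cl and dump the object file's CodeView records."""
    obj = source.with_suffix(".obj")
    return [
        # /c: compile only; /Z7: CodeView debug info in the .obj; /Fo: output name
        ["clang-cl", "/c", "/Z7", f"/Fo{obj}", str(source)],
        ["llvm-readobj", "--codeview", str(obj)],
    ]


def run(commands: list[list[str]]) -> str:
    """Run each command, raising on a non-zero exit; return the last stdout."""
    out = ""
    for argv in commands:
        out = subprocess.run(argv, check=True, capture_output=True, text=True).stdout
    return out
```

No linker is involved here, which is why this kind of check says nothing about the PDB.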

One option is to just have clang-cl emit object files and use Microsoft’s linker to link them and produce a PDB. We will indeed probably do this, as there is some value in it. But at the end of the day, that does nothing to verify that the PDBs we generate are kosher, and the PDBs that we generate come from LLD. In fact, testing our PDBs is the most important testing scenario (as opposed to testing our CodeView using MS’s linker to create the PDB), because CodeView is pretty well understood. I have high confidence that if our llvm-readobj CodeView tests against clang-cl-generated object files pass, the PDB generated by Microsoft’s linker will be just fine. But PDB is still very opaque, and there’s a ton of stuff that goes in there that is not CodeView and which is entirely synthesized by the linker. The only way to test that is to have LLD generate the PDB.
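The distinction between the two PDB-producing paths can be sketched as two link commands over the same object file. This is a hedged sketch: Microsoft’s link and LLD’s MSVC-compatible driver lld-link both accept /DEBUG, /OUT: and /PDB:, but a real link would also need the CRT and system libraries available (e.g. via the LIB environment variable), which is not modeled here, and the helper name is hypothetical.

```python
from pathlib import Path


def link_with_pdb(linker: str, obj: Path) -> list[str]:
    """argv that links `obj` into an executable plus a PDB.

    `link` (Microsoft) and `lld-link` (LLD) share these options, so the
    same object file can be linked both ways and the PDBs compared.
    """
    if linker not in ("link", "lld-link"):
        raise ValueError(f"unsupported linker: {linker}")
    exe = obj.with_suffix(".exe")
    pdb = obj.with_suffix(".pdb")
    return [linker, "/DEBUG", f"/OUT:{exe}", f"/PDB:{pdb}", str(obj)]
```

Only the lld-link variant exercises the linker-synthesized parts of the PDB that the paragraph above is concerned with; the link variant checks the object files’ CodeView against a known-good PDB writer.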

IIUC you are mainly wanting to test LLD’s PDB generation. Obviously a test suite plugged in under clang/test is not a good fit for that. It could arguably fit into the LLD project, but separating it out as a more end-to-end integration project a-la test-suite seems like a much better idea.

Moving debuginfo-tests seems like a way to get a project in place with the right layering and maybe some lit infrastructure to make writing tests simpler. This tactic appears to have a broader impact than you thought, or even I thought, if it’s going to take somebody with MikeE’s skills a week to get it running, mostly, in Apple’s environment.

How about this alternative: Set up a new project (preferably with a name that won’t cause confusion with llvm/test/DebugInfo) that copies all the fiddly bits you need from debuginfo-tests, and which lands in the right layering place. Over time you can move individual tests from debuginfo-tests to the new place. Eventually debuginfo-tests will be empty, and we nuke it. Is there a serious downside to working it that way?

This puts more of the burden on you, to conjure up a whole new project, but you’re the one who wants it, so that seems fair. :) Then the people with more complicated CI setups, like Apple and Sony, can add the new thing at their leisure without worrying about the kind of disruption that ChrisM anticipates. It’s not like the SCM history of debuginfo-tests is all that important; it’s a really small project.

–paulr

I’m honestly not opposed to this idea. It just seems a shame to do this for purely logistical reasons if most people agree that the “right” place for debuginfo-tests is outside of the clang tree.

That said, I’d still like to hear from ChrisM and MikeE about why it will take so long, because on the surface it seems like a low-impact move.

Nothing about the change is complex; it is just far-reaching. It looks like we have 69 builds using the repo internally, and 26 on Green Dragon. We would have to convert them in bulk (with a Jenkins shutdown), then each will have to be verified. To further complicate things, the debuginfo-tests repo is not branched with the compiler, so we would have to backport the CMake changes to all previous branches we still run.

I’m honestly not opposed to this idea. It just seems a shame to do this for purely logistical reasons if most people agree that the “right” place for debuginfo-tests is outside of the clang tree.

I totally understand what you are saying here and will just add that sometimes being part of a larger community means being willing to do things not exactly the “right” way, due to logistical reasons. I am not opposed to what you would like to do; I’m just furrowing my brow at the timeframe in which to do it.

That said, I’d still like to hear from ChrisM and MikeE about why it will take so long, because on the surface it seems like a low-impact move.

Past experience has taught me that anything I think is going to be simple and quick to fix rarely turns out that way. While there will be a significant amount of work to change the way our bots work here at Apple, the work is not impossible to accomplish. Given the choice, I would of course prefer an approach such as the one Paulr has suggested. The ability to run things in parallel for a time makes for a much lower-impact change for the entire community. I think this approach may also give us some time to decide where debuginfo-tests should fit in the new mono-repo. It would be a bummer to do the work necessary to make this change, only to discover we would have to do it differently in the not-too-distant future to accommodate the new mono-repo.

Zach, I do not want to be a blocker here. I just want to make sure we have explored all of the options to make sure we are not missing a lower impact approach. I also want to make sure we are not doing something that could wait until we migrate to the mono-repo next year.

Thanks,
Mike

I’m going to spend a little time seeing if I can make the change invisible to the bots so they will continue to work as they do today. I’ll report back after I’ve explored that a bit.

Thank you Zach.

I tested this out, and AFAICT nothing will change. It will continue to just work if you have it checked out under clang/test. It’s a bit hard to construct this configuration locally, since it requires moving some files around and applying half of a CL here and half of a CL there. But, AFAICT, it works.

I’m happy to send you some patches if you want to try them locally and confirm.

I’d like to print out a CMake warning if it detects the tree under clang/test and just mention that the workflow is deprecated. Any objections?
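The proposed deprecation warning amounts to a single existence check at configure time. Here it is modeled in Python rather than CMake, as a minimal sketch: the legacy path is the multi-repo tools/clang/test location the bots use, and the function name and message text are hypothetical.

```python
import warnings
from pathlib import Path


def check_legacy_layout(llvm_src: Path) -> bool:
    """Warn (and return True) if debuginfo-tests is still under clang/test.

    The old layout keeps working; this only nudges people toward
    llvm/projects plus `ninja check-debuginfo`.
    """
    legacy = llvm_src / "tools" / "clang" / "test" / "debuginfo-tests"
    if legacy.is_dir():
        warnings.warn(
            f"{legacy}: running debuginfo-tests from clang/test is deprecated; "
            "clone it into llvm/projects and run `ninja check-debuginfo` instead."
        )
        return True
    return False
```

Because it only warns, existing bots keep running unchanged, which matches the goal of making the change invisible to them.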

Hi all, I think I’ve addressed all the concerns here, and I believe there should be no immediate impact to the current workflow. With that said, I plan to commit this either later today or early tomorrow if there are no other concerns.

Hi Zach,
Thanks for doing this extra work to make this lower impact for the rest of us. Let’s give it a try and see what happens.

-Mike

Since it’s towards the end of the day already, I’ll put this in tomorrow morning around 9 or 10, to make sure I’m around to fix anything that arises (or revert).

This is in as of r317925. I’m keeping an eye out for failure notifications. I may or may not need help diagnosing if something does go wrong (although I’m keeping my fingers crossed).