RFC: Removing the DIA PDB plugin from LLDB

LLDB currently has 2 plugins for PDB parsing. One uses Microsoft’s DIA SDK and the other uses purely LLVM code. This is causing problems for anyone wanting to improve PDB support because they have to decide whether to support and test both or just one.

These people, and their time, are rare and the LLDB project needs to make the most of their contributions.

I think everyone is in favour of removing the DIA plugin, this RFC is to make sure that’s the case. There is a tracking issue for doing this - Tracking issue for removing PDB plugin in LLDB · Issue #114906 · llvm/llvm-project · GitHub and a previous discussion The two PDB plugins in LLDB - #4 by mstorsjo.

In the most optimistic timeline:

  • We stop adding anything new to the DIA plugin (in practice I think we already have).
  • The native plugin reaches parity (as measured by in-tree tests) by the time of LLDB 22.
  • The defaults are all switched to native, with some option to switch back to DIA.
  • LLDB 22 ships in this form.
  • All PDB bugs filed as a result will only be addressed in the native plugin.
  • Meanwhile on main aka LLDB 23, we remove all traces of the DIA plugin.

(might be later than 22/23 depending on how fast parity is achieved)

In that space between now and 22, ideally we would test both plugins so that neither regresses. The ways to do this are awkward because the other debug info formats only have one plugin. Which has lead to a big discussion on [LLDB] Run API tests with PDB too by Nerixyz · Pull Request #149305 · llvm/llvm-project · GitHub .

I propose that we acknowledge that this situation is weird, but will be fixed, and put up with a strange implementation of testing in the period between now and when the 22.x branch is taken.

I am invoking that most dangerous idea, “temporary code”, and that is a risk but I think one we have to take.

On the one hand, our current Windows contributors may run out of steam and this work never happens, on the other, making them juggle 2 plugins forever is a great way to put them off forever.

Questions:

  • Does anyone have a great reason to keep DIA for longer?
  • Does anyone object to the short term awkwardness in testing?
4 Likes

CC @Nerixyz , @ZequanWu

Speaking for Linaro who have significant investment in the success of Windows on Arm, we would like tools to be really good, however that is achieved. The native plugin is easier to update, can be used on any platform, it’s the obvious choice.

No, and we are willing to run those extra tests on our existing Windows on Arm bot.

We would like to avoid this being a separate bot because more hardware is more funding to apply for.

We could use one plugin on the Windows on Arm bot and one on the Windows x86_64 bot, but I think the results will be more comprehensible if they’re on the same builder and we won’t miss platform specific issues, as rare as those probably are.

To be clear, anyone endorsing this, or working on PDB right now, is not signing up for this timeline by doing so. It’s purely an example. I’m not anyone’s project manager.

1 Like

Thanks David for brining this into attention, this issue has been on the radar since the start of LLVM WoA efforts but no one has pushed it enough to make it into a reality.

One thing I am curious about, given that Windows itself is closed-source and DIA is the official Microsoft’s PDB API.. whats the strongest reason not to just standardize on the DIA SDK for PDB parsing and avoid maintaining our own in-tree PDB reader? The end goal is only to support PDB on Windows wouldnt DIA save us the complexity of maintaining our own reader.

The ability to open / inspect PDB files on other platforms than windows is pretty important to us. It’s more about the clang dia support and the llvm-pdbutil etc. But it probably is the same for people using lldb to analyze crashdumps non-natively.

2 Likes

That would be a stark departure from how the rest of the LLVM code works - where we have cross platform native implementations of all code, allowing it to be used anywhere. Being able to interpret PDB files only on Windows, if built with the Microsoft SDK, would be quite limiting.

Additionally, as far as I know, we already have essentially most of the code necessary for having it usable with our own code, it mostly requires bug hunting for reaching feature parity. Throwing that away, including the ability to work identically across platforms, would be a huge waste.

In other words, I’m totally in favour of some short time awkwardness if it helps pushing this forward to get rid of the DIA SDK.

1 Like

Thanks @mstorsjo and @tobiashieta All the points in favor of native PDB reader, well understood. Only thing we need in addition to testing both PDB readers is testing with both clang-cl and cl. We would like to see that LLDB’s native PDB reader is fully compaitable with PDBs generated by MS cl. This testing probably needs to go beyond the timeline of transition and continue indefinitely in future.

Hadn’t thought of that part. This certainly makes more sense as a separate test run.

In the short term, we should do a one-off run with MS cl to see where native is at.

I’d say this is realistic.

The bigger challenge is getting actual testing from users (especially since for users, the existence of the native plugin is only mentioned in issues). A good way to test user experience are the API tests. There, the goal should be to provide a similar experience to DWARF.

To add to the previous replies: From a developer’s POV, DIA is like a black box. Debugging issues with it is much harder because of that.

The shell tests already support this (the build script can select either clang or cl). I’m not sure how easy it would be to add cl support to API tests, because cl’s options are so different to clang’s.

1 Like

API tests should work too but will require adding support for cl in the top level makefile that we use for API tests and may also need a bit of tweaking for some of the tests which add custom options through their own makefiles.

FWIW, I think this also is easier to cover properly once we can settle down to one single PDB plugin - otherwise we’d have a testing matrix of (2 PDB producers) x (2 PDB consumers) to cover. If this can be reduced to only 2 configurations, that’s easier to cover than 4 different cases.

1 Like

True, I think the best we can do is get to testing parity and then hope for feedback from the lldb 22 release. Someone out there probably daily drives lldb from main, but I doubt they do it on Windows.

Yes this is the next thing to aim for.

It’ll happen in bits out of order in reality, but we don’t need to match the DWARF experience to make native PDB the default.

Yes, testing with MS cl is worthwhile but not required before we do any of the plugin work.

We haven’t made any claims about MS cl compatibility yet. We’ve said that DIA might be more compatible, it’s made by Microsoft after all, but I’ve not seen evidence to support that.

Thanks for putting together this RFC and thank you everyone that’s been helping us work towards this. We’ve had discussions spread across various GitHub issues and PRs, so it’s helpful to have a central place to bring everything together and focus on the high-level direction. I’m supportive of the plan and the proposed timeline.

If the goal is to make the switch within the 22 release timeframe, we could be even more aggressive and switch the default now. If we’re not happy with the state of the native parser by the time we branch, we can easily switch back the default on the release/22.x branch.

  • It signals our intentions of making the native PDB parser the default in the next release.
  • We get living-on from ToT users, which hopefully motivates them to contribute (maybe I’m being optimistic).
  • We still get DIA coverage on the Windows bots which will test both.
  • We can start unifying the tests and have the XFAILs be the source of truth.

The main drawback that I can think of is that it causes more friction (on ToT). That’s by design, to be a forcing factor and a way to get early signal, but I understand if that’s something we want to avoid. I don’t feel strongly about it, I’m throwing it out there as an idea.

We care about LLDB on Windows because of Swift. DWARF is the preferred debug info format for Swift, but the compiler has some PDB support which can help when debugging things that interface with the Windows API. So while it’s not necessarily on the critical path, we still care about the general health and quality of LLDB on Windows. @charles-zablit is driving the Swift on Windows support for LLDB.

No, getting rid of the DIA plugin addresses the concern I expressed in the patch.

MSFT dev, here. I do a lot of work on PDB-related stuff.

I think dropping DIA is the right path. You’ll miss out on a slow trickle of future issues, but my intuition is that it is worth the hassle to move off of a closed-source dependency.

I’ve made a few modest contributions to PDB quality in LLVM code already, and plan on continuing to do so. I would be willing to help with some degree of catching up with DIA, but only for those records that are known to the public to begin with.

Also, DIA is not a good API to use as a model. It is inherently inefficient in many ways. Not just the implementation, but the interface itself. I would encourage LLVM developers to avoid basing anything on DIA’s interface.

5 Likes

Good idea, I figured that user base is small but why waste the opportunity even so.

For this we need to make a judgement of how close to parity it needs to be (how much of Tracking issue for removing PDB plugin in LLDB · Issue #114906 · llvm/llvm-project · GitHub is done) before changing things on main.

I think @Nerixyz is most informed on that. There’s probably some corner cases we could afford to expose to users of the main version. Also, if it was me, I’d want to make sure any large bits of work were done first just to reduce possible distractions.

Great!

I think at first we’re dealing with anything that clang already produces, which is public. For future testing against MS cl, we can direct our questions to Microsoft via. official channels. Linaro has some experience doing this.

Doing it early is definitely better. If we’re going to have pitchforks, it’s better to face them sooner, while there’s still time to do something about it, rather than around release-time crunch-time.

That said, if we do switch to native-pdb ~now, what’s the use case for testing the old plugin? I mean, testing is good, but unless I’m mistaken, until a very short while ago, we had zero PDB test coverage in our API test suite. So it’s not like we would be losing anything compared to the state from last year. One could say that not testing the old pdb plugin is in line with the “stop adding anything new to the DIA plugin” premise this thread started out with…

I’m not familiar with the details, anyone know what the chances are that work on the native plugin makes the DIA plugin worse?

Assuming the lldb → plugin API stays the same, maybe DIA wouldn’t need to change at all.

My suggestion to test both is based on a default assumption that all code is entangled with all other code unless told otherwise. If they are isolated and we retain a method to run the tests manually with the old plugin, that could be acceptable.

Apart from the Initialize function, the two plugins are completely independent. Of course, it’s possible that the in the course of working on one plugin we break the other one, but that’s also true for changes to the DWARF plugin (which sees much more churn). Probably the greatest risk comes from data formatters, which are downstream of the symbol file plugin, and can become dependent on the way a plugin generates type information. However, the data formatters are also fairly new, so I would hope we don’t need to worry too much about their compatibility with the “legacy” plugin.

In short, I’m not going to stand in the way of testing both plugins, but it seems to me like it would be sufficient to test just the new one.

Ok so I think we agree that we should:

  • Make the native (LLVM code) PDB plugin the default in the next release if possible.
  • Remove the DIA plugin the release after that.

In addition, if we can switch the default now, we maximise the feedback we will get.

The simplest way to time that is when we can run all existing tests with the native plugin without failures. Tracking issue for removing PDB plugin in LLDB · Issue #114906 · llvm/llvm-project · GitHub reported in November 2024 that:

Currently, there are 12 failed tests and 1 unexpected passed tests.

I presume this has improved since, @Nerixyz could you provide an updated number? (post it here and on the issue please)

You are probably on x86_64, so I will get results from Windows on Arm.

Sure, there are still 12 failures (x86, but a few different ones):

  • Settings/TestFrameFormatFunctionReturnObjC.test (new) and SymbolFile/DWARF/dwo-static-data-member-access.test (known)
    breakpoint set -n <name> doesn’t work with the native plugin(?). Might be because AddSymbols isn’t implemented yet ([LLDB][NativePDB] Implement `AddSymbols` by Nerixyz · Pull Request #154121 · llvm/llvm-project · GitHub). Though for the second test, it’s unexpected that this runs with PDB on Windows in the first place.
  • SymbolFile/PDB/expressions.test (known)
    Can’t find static variable in namespace.
  • SymbolFile/PDB/func-symbols.test (known)
    Argument and return types are not created by the native plugin.
  • SymbolFile/PDB/function-nested-block.test (known)
    In the DIA plugin, the main function is found as ‘name = “main”, mangled = “main”’ but in the native plugin, there’s no mangled name.
  • SymbolFile/PDB/native-setting.cpp (new)
    This fails because there’s no DIA plugin. So the native plugin is always used. Maybe there should be an additional REQUIRES for DIA support?
  • SymbolFile/PDB/pointers.test (known)
    Fails because function types in the native plugin don’t have a name. The DIA plugin created the function types with a name. For example, int foo(int) would create a Type{...} name="foo" compiler_type = ... int (int). The native plugin creates Type{...} compiler_type = ... int (int).
    Furthermore, the native plugin doesn’t include any blocks and their variables.
  • SymbolFile/PDB/type-quals.test (known)
    Same reason as SymbolFile/PDB/func-symbols.test
  • SymbolFile/PDB/typedefs.test (known)
    Fails because the native plugin creates typedefs with the typedef’d type. So typedef char MyChar would show as Type{..} , name = "MyChar", size = 1, compiler_type = char whereas DIA would show ... compiler_type = typedef MyChar. Seems like a simple fix.
  • SymbolFile/PDB/udt-layout.test (known)
    This hits an assertion failure, because it tries to print a static whose type doesn’t have a layout through target variable. It’s a type from the CRT that is forward declared without a complete definition.
  • SymbolFile/PDB/variables.test (known)
    Fails because functions don’t have a mangled name yet (might be more failures after that).
  • Unwind/windows-unaligned-x86_64.test (known)
    The symbols for the assembly functions are not found.

SymbolFile/PDB/class-layout.test and SymbolFile/PDB/enums-layout.test are no longer failing.