LLDB currently has 2 plugins for PDB parsing. One uses Microsoft’s DIA SDK and the other uses purely LLVM code. This is causing problems for anyone wanting to improve PDB support because they have to decide whether to support and test both or just one.
These people, and their time, are rare and the LLDB project needs to make the most of their contributions.
I propose that we acknowledge that this situation is weird, but will be fixed, and put up with a strange implementation of testing in the period between now and when the 22.x branch is taken.
I am invoking that most dangerous idea, “temporary code”, and that is a risk but I think one we have to take.
On the one hand, our current Windows contributors may run out of steam and this work never happens, on the other, making them juggle 2 plugins forever is a great way to put them off forever.
Questions:
Does anyone have a great reason to keep DIA for longer?
Does anyone object to the short term awkwardness in testing?
Speaking for Linaro who have significant investment in the success of Windows on Arm, we would like tools to be really good, however that is achieved. The native plugin is easier to update, can be used on any platform, it’s the obvious choice.
No, and we are willing to run those extra tests on our existing Windows on Arm bot.
We would like to avoid this being a separate bot because more hardware is more funding to apply for.
We could use one plugin on the Windows on Arm bot and one on the Windows x86_64 bot, but I think the results will be more comprehensible if they’re on the same builder and we won’t miss platform specific issues, as rare as those probably are.
To be clear, anyone endorsing this, or working on PDB right now, is not signing up for this timeline by doing so. It’s purely an example. I’m not anyone’s project manager.
Thanks David for brining this into attention, this issue has been on the radar since the start of LLVM WoA efforts but no one has pushed it enough to make it into a reality.
One thing I am curious about, given that Windows itself is closed-source and DIA is the official Microsoft’s PDB API.. whats the strongest reason not to just standardize on the DIA SDK for PDB parsing and avoid maintaining our own in-tree PDB reader? The end goal is only to support PDB on Windows wouldnt DIA save us the complexity of maintaining our own reader.
The ability to open / inspect PDB files on other platforms than windows is pretty important to us. It’s more about the clang dia support and the llvm-pdbutil etc. But it probably is the same for people using lldb to analyze crashdumps non-natively.
That would be a stark departure from how the rest of the LLVM code works - where we have cross platform native implementations of all code, allowing it to be used anywhere. Being able to interpret PDB files only on Windows, if built with the Microsoft SDK, would be quite limiting.
Additionally, as far as I know, we already have essentially most of the code necessary for having it usable with our own code, it mostly requires bug hunting for reaching feature parity. Throwing that away, including the ability to work identically across platforms, would be a huge waste.
In other words, I’m totally in favour of some short time awkwardness if it helps pushing this forward to get rid of the DIA SDK.
Thanks @mstorsjo and @tobiashieta All the points in favor of native PDB reader, well understood. Only thing we need in addition to testing both PDB readers is testing with both clang-cl and cl. We would like to see that LLDB’s native PDB reader is fully compaitable with PDBs generated by MS cl. This testing probably needs to go beyond the timeline of transition and continue indefinitely in future.
The bigger challenge is getting actual testing from users (especially since for users, the existence of the native plugin is only mentioned in issues). A good way to test user experience are the API tests. There, the goal should be to provide a similar experience to DWARF.
To add to the previous replies: From a developer’s POV, DIA is like a black box. Debugging issues with it is much harder because of that.
The shell tests already support this (the build script can select either clang or cl). I’m not sure how easy it would be to add cl support to API tests, because cl’s options are so different to clang’s.
API tests should work too but will require adding support for cl in the top level makefile that we use for API tests and may also need a bit of tweaking for some of the tests which add custom options through their own makefiles.
FWIW, I think this also is easier to cover properly once we can settle down to one single PDB plugin - otherwise we’d have a testing matrix of (2 PDB producers) x (2 PDB consumers) to cover. If this can be reduced to only 2 configurations, that’s easier to cover than 4 different cases.
True, I think the best we can do is get to testing parity and then hope for feedback from the lldb 22 release. Someone out there probably daily drives lldb from main, but I doubt they do it on Windows.
Yes this is the next thing to aim for.
It’ll happen in bits out of order in reality, but we don’t need to match the DWARF experience to make native PDB the default.
Yes, testing with MS cl is worthwhile but not required before we do any of the plugin work.
We haven’t made any claims about MS cl compatibility yet. We’ve said that DIA might be more compatible, it’s made by Microsoft after all, but I’ve not seen evidence to support that.
Thanks for putting together this RFC and thank you everyone that’s been helping us work towards this. We’ve had discussions spread across various GitHub issues and PRs, so it’s helpful to have a central place to bring everything together and focus on the high-level direction. I’m supportive of the plan and the proposed timeline.
If the goal is to make the switch within the 22 release timeframe, we could be even more aggressive and switch the default now. If we’re not happy with the state of the native parser by the time we branch, we can easily switch back the default on the release/22.x branch.
It signals our intentions of making the native PDB parser the default in the next release.
We get living-on from ToT users, which hopefully motivates them to contribute (maybe I’m being optimistic).
We still get DIA coverage on the Windows bots which will test both.
We can start unifying the tests and have the XFAILs be the source of truth.
The main drawback that I can think of is that it causes more friction (on ToT). That’s by design, to be a forcing factor and a way to get early signal, but I understand if that’s something we want to avoid. I don’t feel strongly about it, I’m throwing it out there as an idea.
We care about LLDB on Windows because of Swift. DWARF is the preferred debug info format for Swift, but the compiler has some PDB support which can help when debugging things that interface with the Windows API. So while it’s not necessarily on the critical path, we still care about the general health and quality of LLDB on Windows. @charles-zablit is driving the Swift on Windows support for LLDB.
No, getting rid of the DIA plugin addresses the concern I expressed in the patch.
MSFT dev, here. I do a lot of work on PDB-related stuff.
I think dropping DIA is the right path. You’ll miss out on a slow trickle of future issues, but my intuition is that it is worth the hassle to move off of a closed-source dependency.
I’ve made a few modest contributions to PDB quality in LLVM code already, and plan on continuing to do so. I would be willing to help with some degree of catching up with DIA, but only for those records that are known to the public to begin with.
Also, DIA is not a good API to use as a model. It is inherently inefficient in many ways. Not just the implementation, but the interface itself. I would encourage LLVM developers to avoid basing anything on DIA’s interface.
I think @Nerixyz is most informed on that. There’s probably some corner cases we could afford to expose to users of the main version. Also, if it was me, I’d want to make sure any large bits of work were done first just to reduce possible distractions.
Great!
I think at first we’re dealing with anything that clang already produces, which is public. For future testing against MS cl, we can direct our questions to Microsoft via. official channels. Linaro has some experience doing this.
Doing it early is definitely better. If we’re going to have pitchforks, it’s better to face them sooner, while there’s still time to do something about it, rather than around release-time crunch-time.
That said, if we do switch to native-pdb ~now, what’s the use case for testing the old plugin? I mean, testing is good, but unless I’m mistaken, until a very short while ago, we had zero PDB test coverage in our API test suite. So it’s not like we would be losing anything compared to the state from last year. One could say that not testing the old pdb plugin is in line with the “stop adding anything new to the DIA plugin” premise this thread started out with…
I’m not familiar with the details, anyone know what the chances are that work on the native plugin makes the DIA plugin worse?
Assuming the lldb → plugin API stays the same, maybe DIA wouldn’t need to change at all.
My suggestion to test both is based on a default assumption that all code is entangled with all other code unless told otherwise. If they are isolated and we retain a method to run the tests manually with the old plugin, that could be acceptable.
Apart from the Initialize function, the two plugins are completely independent. Of course, it’s possible that the in the course of working on one plugin we break the other one, but that’s also true for changes to the DWARF plugin (which sees much more churn). Probably the greatest risk comes from data formatters, which are downstream of the symbol file plugin, and can become dependent on the way a plugin generates type information. However, the data formatters are also fairly new, so I would hope we don’t need to worry too much about their compatibility with the “legacy” plugin.
In short, I’m not going to stand in the way of testing both plugins, but it seems to me like it would be sufficient to test just the new one.
Sure, there are still 12 failures (x86, but a few different ones):
Settings/TestFrameFormatFunctionReturnObjC.test (new) and SymbolFile/DWARF/dwo-static-data-member-access.test (known) breakpoint set -n <name> doesn’t work with the native plugin(?). Might be because AddSymbols isn’t implemented yet ([LLDB][NativePDB] Implement `AddSymbols` by Nerixyz · Pull Request #154121 · llvm/llvm-project · GitHub). Though for the second test, it’s unexpected that this runs with PDB on Windows in the first place.
SymbolFile/PDB/expressions.test (known)
Can’t find static variable in namespace.
SymbolFile/PDB/func-symbols.test (known)
Argument and return types are not created by the native plugin.
SymbolFile/PDB/function-nested-block.test (known)
In the DIA plugin, the main function is found as ‘name = “main”, mangled = “main”’ but in the native plugin, there’s no mangled name.
SymbolFile/PDB/native-setting.cpp (new)
This fails because there’s no DIA plugin. So the native plugin is always used. Maybe there should be an additional REQUIRES for DIA support?
SymbolFile/PDB/pointers.test (known)
Fails because function types in the native plugin don’t have a name. The DIA plugin created the function types with a name. For example, int foo(int) would create a Type{...} name="foo" compiler_type = ... int (int). The native plugin creates Type{...} compiler_type = ... int (int).
Furthermore, the native plugin doesn’t include any blocks and their variables.
SymbolFile/PDB/type-quals.test (known)
Same reason as SymbolFile/PDB/func-symbols.test
SymbolFile/PDB/typedefs.test (known)
Fails because the native plugin creates typedefs with the typedef’d type. So typedef char MyChar would show as Type{..} , name = "MyChar", size = 1, compiler_type = char whereas DIA would show ... compiler_type = typedef MyChar. Seems like a simple fix.
SymbolFile/PDB/udt-layout.test (known)
This hits an assertion failure, because it tries to print a static whose type doesn’t have a layout through target variable. It’s a type from the CRT that is forward declared without a complete definition.
SymbolFile/PDB/variables.test (known)
Fails because functions don’t have a mangled name yet (might be more failures after that).
Unwind/windows-unaligned-x86_64.test (known)
The symbols for the assembly functions are not found.
SymbolFile/PDB/class-layout.test and SymbolFile/PDB/enums-layout.test are no longer failing.