Gauging interest in generating PDBs from LLVM-backed languages

Hi all,

I’ve recently been doing some work on a novel language that uses LLVM for optimization and machine codegen. The language is self-hosted, I’m building my third iteration of its garbage collector, and also writing a thin IDE to stretch the language a bit. Needless to say, debugging is a major concern for me.

My primary experience (and primary development focus) is Windows-centric, so my go-to debuggers are Visual Studio and WinDbg. I know of the lldb/VS integration efforts, and of course with appropriate setup I could also leverage gdb (IIRC). But these aren’t quite what I was looking for, personally.

Long story short, I set out to build a PDB emitter that could generate debug information for my language based on the existing CodeView emission support as of LLVM 3.8. I’m happy to report success as of this afternoon. More details about the effort and its status can be found at [0].

The high-level overview of my strategy is to crack the CodeView blob from LLVM (.debug$S COFF section) and reassemble it, plus some augmentation, then feed that to the API exposed by MSPDB140.dll (Visual Studio 2015’s version). This works, and I can debug programs in both VS and WinDbg, assuming the front-end supplies sane metadata to the LLVM layer.
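For readers who haven't looked inside that blob: a .debug$S section is a 32-bit signature followed by a sequence of subsections, each tagged with a kind and a byte length. The sketch below shows only that first splitting step. It assumes the raw section bytes have already been pulled out of the COFF object (not shown), takes the subsection names from Microsoft's cvinfo.h, and is an illustration rather than the code used in the Epoch project.

```python
import struct

# A .debug$S section starts with this 32-bit signature (CV_SIGNATURE_C13).
CV_SIGNATURE_C13 = 4

# A few well-known subsection kinds, as named in Microsoft's cvinfo.h.
SUBSECTION_NAMES = {
    0xF1: "DEBUG_S_SYMBOLS",
    0xF2: "DEBUG_S_LINES",
    0xF3: "DEBUG_S_STRINGTABLE",
    0xF4: "DEBUG_S_FILECHKSMS",
}


def split_debug_s(section: bytes) -> list[tuple[int, bytes]]:
    """Split raw .debug$S bytes into (kind, payload) pairs."""
    if len(section) < 4:
        raise ValueError("section too small for a CodeView signature")
    (signature,) = struct.unpack_from("<I", section, 0)
    if signature != CV_SIGNATURE_C13:
        raise ValueError(f"unexpected CodeView signature {signature}")

    offset = 4
    subsections = []
    while offset + 8 <= len(section):
        # Each subsection: uint32 kind, uint32 payload length, payload bytes.
        kind, length = struct.unpack_from("<II", section, offset)
        offset += 8
        payload = section[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated subsection")
        subsections.append((kind, payload))
        # Payloads are padded so the next subsection starts 4-byte aligned.
        offset += (length + 3) & ~3
    return subsections


if __name__ == "__main__":
    # Hand-built section holding one 5-byte string table (plus 3 pad bytes).
    demo = struct.pack("<I", CV_SIGNATURE_C13)
    demo += struct.pack("<II", 0xF3, 5) + b"\0foo\0" + b"\0" * 3
    for kind, payload in split_debug_s(demo):
        print(SUBSECTION_NAMES.get(kind, hex(kind)), payload)
```

Reassembling for a PDB would then mean regrouping these payloads by kind (symbols, lines, file checksums, strings) rather than copying the section verbatim.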

My question to the list - is this work valuable for anyone else? Would there be general interest in documentation or even example code that assembles what I’ve learned throughout this effort?

Thanks,

- Mike

[0] - https://github.com/apoch/epoch-language/wiki/Knowledge-Dump---Debugging-Epoch-Programs

Michael Lewis via llvm-dev <llvm-dev@lists.llvm.org> writes:

> My question to the list - is this work valuable for anyone else? Would
> there be general interest in documentation or even example code that
> assembles what I've learned throughout this effort?

I imagine Julia, Rust, and others would appreciate that kind of
resource (and improving the PDB support that's available upstream in
LLVM) for better integration with PDB-based Windows debuggers, so yes!

-Tony

We’ve been pursuing the direction of writing PDBs from scratch in llvm/lib/DebugInfo/PDB/Raw and related directories. It might be interesting to have code that talks to MSPDB140.dll in LLVM, but we really want LLD to be able to produce its output on any platform.

I completely agree about independence from arbitrary platform-specific
binaries. (After all, escaping LINK.EXE just to rely on MSPDB140.dll isn't
that huge of a win by itself.)

I forgot to mention it in my original mail, but my plan is to transition
away from using MSPDB140.dll and fabricate "whole cloth" PDB/MSF files in
the near future. Based on the microsoft-pdb GitHub repo and the code I've
seen in LLVM, I think this is a totally reachable goal.
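To give a rough picture of where "whole cloth" generation starts: every MSF file opens with a superblock in block 0. The sketch below serializes one, using the field names from the SuperBlock struct in LLVM's MSF code. The values a real writer would pass (block count, directory size, block map location) depend on the rest of the file and are left to the caller. This is an illustrative assumption-laden sketch, not a complete writer.

```python
import struct

# 32-byte magic at offset 0 of every MSF 7.00 container (the PDB file format).
MSF_MAGIC = b"Microsoft C/C++ MSF 7.00\r\n\x1aDS\x00\x00\x00"
assert len(MSF_MAGIC) == 32


def pack_superblock(block_size: int, free_block_map_block: int,
                    num_blocks: int, num_directory_bytes: int,
                    block_map_addr: int) -> bytes:
    """Serialize an MSF superblock and pad it out to fill block 0."""
    if free_block_map_block not in (1, 2):
        # LLVM's MSF reader rejects free block maps outside blocks 1 and 2.
        raise ValueError("free block map must be in block 1 or 2")
    header = MSF_MAGIC + struct.pack(
        "<6I",
        block_size,            # BlockSize
        free_block_map_block,  # FreeBlockMapBlock
        num_blocks,            # NumBlocks: total blocks in the file
        num_directory_bytes,   # NumDirectoryBytes: size of the stream directory
        0,                     # Unknown
        block_map_addr,        # BlockMapAddr: block listing the directory's blocks
    )
    return header.ljust(block_size, b"\0")
```

Padding the superblock to a full block keeps the file a whole number of blocks, which matters because every later structure is addressed by block index.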

Most of what I've found has little to do with the mechanics of the MSF
format (yet!) and more to do with the semantic content of PDBs and how to
coax the debuggers in question to yield the desired results. I don't know
how much of that stuff is mysterious to the larger community and what was
just mysterious to me when I set out on this little venture :)

But in any case I'd be glad to compare notes with anyone who's working in
this space.

- Mike

I wrote most of the PDB code in LLVM so far. As Reid suggested, if you look in DebugInfo/PDB/Raw there is a significant amount of code dealing with MSF files and raw PDB streams. If you build the llvm-pdbdump tool, you can run it with the “raw” subcommand to dump lots of low-level info from the file.

It’s pretty complete for reading PDB files, and I’m actively working on expanding write support.

Feel free to ask questions, or even submit a patch if you see something you think could be better.
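To make "raw PDB streams" a bit more concrete: once the stream directory's blocks have been located (via BlockMapAddr in the superblock) and concatenated, the directory is a stream count, then one size per stream, then each stream's block indices in order. A minimal parsing sketch, assuming the caller has already reassembled the directory bytes:

```python
import struct

# A stream size of 0xFFFFFFFF marks an unused stream slot; LLVM's reader
# treats such a stream as empty, and so does this sketch.
NIL_STREAM_SIZE = 0xFFFFFFFF


def parse_stream_directory(directory: bytes,
                           block_size: int) -> list[tuple[int, list[int]]]:
    """Return (size in bytes, block indices) for every stream in the directory."""
    (num_streams,) = struct.unpack_from("<I", directory, 0)
    sizes = struct.unpack_from(f"<{num_streams}I", directory, 4)
    offset = 4 + 4 * num_streams
    streams = []
    for size in sizes:
        if size == NIL_STREAM_SIZE:
            size = 0
        # A stream occupies ceil(size / block_size) blocks.
        count = -(-size // block_size)
        blocks = list(struct.unpack_from(f"<{count}I", directory, offset))
        offset += 4 * count
        streams.append((size, blocks))
    return streams
```

Reading stream N is then a matter of concatenating its listed blocks and truncating to its size.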

I’ll check into that again. I ran across llvm-pdbdump earlier but couldn’t get it to build on a vanilla 3.8 install (CMake is convinced I don’t have the DIA SDK and I haven’t found a way to change its mind). I stopped short of reading the code, though, so I wasn’t aware of how much is actually there!

Anyways, I’ll pore over what’s in trunk and see if there’s anything I can contribute. Thanks for the pointer.

- Mike

Even if you don’t have the DIA SDK, llvm-pdbdump will still work with the “raw” subcommand, just not the “pretty” subcommand. That said, I would be interested in finding out why it thinks you don’t have DIA installed. You could diagnose it by adding some message() print statements to the CMake scripts to see whether the directory it’s looking for exists, or what else the problem might be.

Hi Mike,

> I’ll check into that again. I ran across llvm-pdbdump earlier but couldn’t get it to build on a vanilla 3.8 install (CMake is convinced I don’t have the DIA SDK and I haven’t found a way to change its mind).

I have experienced something similar with the DIA SDK.

Most of the time, it is a path problem: https://support.microsoft.com/en-us/kb/3035999

If you open the file cmake/config-ix.cmake, you will find the place where CMake checks whether the DIA SDK is present (there is also a note saying that sometimes this is a Windows bug).

You can replace the line

    set(MSVC_DIA_SDK_DIR "$ENV{VSINSTALLDIR}DIA SDK")

with

    set(MSVC_DIA_SDK_DIR "C:\\path\\to\\DIA SDK")

In my case, that worked.

Hope this helps.

Greetings,

Johan

> Most of the time, it is a path problem:
> https://support.microsoft.com/en-us/kb/3035999
>
> If you open the file cmake/config-ix.cmake, you will find the place where
> CMake checks whether the DIA SDK is present (there is also a note saying
> that sometimes this is a Windows bug).
>
> You can replace the line
>
>     set(MSVC_DIA_SDK_DIR "$ENV{VSINSTALLDIR}DIA SDK")
>
> with
>
>     set(MSVC_DIA_SDK_DIR "C:\\path\\to\\DIA SDK")

Awesome, that did the trick - thanks!

It turns out I was running CMake without the VS Environment vars set, so
%VSINSTALLDIR% was empty. Running with the environment set up correctly
fixed the problem.
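The same failure can be checked outside CMake. The hypothetical helpers below (names made up for illustration) mirror the path that config-ix.cmake builds from "$ENV{VSINSTALLDIR}DIA SDK". That is a plain string concatenation, so VSINSTALLDIR is assumed to end in a path separator, as it does when the Visual Studio environment scripts set it. The helpers report which condition fails.

```python
import os
from typing import Mapping, Optional


def expected_dia_sdk_dir(env: Mapping[str, str]) -> str:
    """Same concatenation as "$ENV{VSINSTALLDIR}DIA SDK" in config-ix.cmake."""
    return env.get("VSINSTALLDIR", "") + "DIA SDK"


def diagnose_dia_sdk(env: Optional[Mapping[str, str]] = None) -> str:
    """Explain why CMake would or would not find the DIA SDK."""
    env = os.environ if env is None else env
    if not env.get("VSINSTALLDIR"):
        # An empty variable yields the relative path "DIA SDK", which
        # normally does not exist; load the VS environment first.
        return "VSINSTALLDIR is not set; run CMake with the VS environment loaded"
    path = expected_dia_sdk_dir(env)
    if not os.path.isdir(path):
        return f"no directory at {path!r}"
    return f"DIA SDK found at {path!r}"


if __name__ == "__main__":
    print(diagnose_dia_sdk())
```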

> Even if you don't have DIA SDK, llvm-pdbdump will still work with the
> "raw" subcommand, just not the "pretty" subcommand.

I was running a 3.8 install where this functionality had apparently not yet
landed. I pulled a fresh install from trunk earlier and I'm now playing
around with "raw". So far this looks immensely promising and probably well
beyond my own scope of familiarity with the PDB/MSF formats. I didn't
realize that this much progress had been made since 3.8; serves me right I
guess for not looking at trunk sooner!

I will certainly be willing to speak up if I find anything I know that
isn't represented here already.

- Mike

After continuing to poke around in llvm-pdbdump for a weekend, I feel like
it captures more of the file format than I know personally; but in any
case, I'm (slowly) writing up more notes at [0] to describe how I'm using
the information gleaned to build a usable PDB.

I'm getting sane callstacks and source-level debugging (line numbers) using
the MSPDB140.dll approach, and I have basic high-level raw data emitted as
well. I'll start working on raw-emitting symbols and line numbers over the
next week or so. Once I have feature parity between my raw emitter and the
MSPDB140.dll method, I'll tackle type data generation and see if I can get
locals/function params to be debuggable as well.
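For the line-number part of that plan, the data involved is a DEBUG_S_LINES subsection. The sketch below encodes one for a single function and a single source file, following the header, file-block and line-entry layouts in Microsoft's cvinfo.h (CV_DebugSLinesHeader_t, CV_DebugSLinesFileBlockHeader_t, CV_Line_t). It rests on several assumptions: there is no column information, every row is marked as a statement, and the caller already knows the function's offset and section plus the file's offset into the DEBUG_S_FILECHKSMS subsection. In an object file, the offset and section fields are normally filled in by relocations instead.

```python
import struct

DEBUG_S_LINES = 0xF2


def encode_line(code_offset: int, line: int) -> bytes:
    """CV_Line_t: code offset, then linenumStart(24) | deltaLineEnd(7) | fStatement(1)."""
    if not 0 <= line < (1 << 24):
        raise ValueError("line number does not fit in 24 bits")
    flags = line | (1 << 31)  # deltaLineEnd left at 0, fStatement set
    return struct.pack("<II", code_offset, flags)


def build_lines_subsection(func_offset: int, func_section: int, func_size: int,
                           file_checksum_offset: int,
                           rows: list[tuple[int, int]]) -> bytes:
    """Encode one DEBUG_S_LINES subsection from (code offset, line) rows."""
    lines = b"".join(encode_line(off, ln) for off, ln in rows)
    # File block header: offset into DEBUG_S_FILECHKSMS, line count, block size.
    file_block = struct.pack("<III", file_checksum_offset, len(rows),
                             12 + len(lines)) + lines
    # Lines header: offCon, segCon, flags (0 = no columns), cbCon.
    header = struct.pack("<IHHI", func_offset, func_section, 0, func_size)
    payload = header + file_block
    return struct.pack("<II", DEBUG_S_LINES, len(payload)) + payload
```

Every piece here is a multiple of 4 bytes, so no trailing padding is needed before the next subsection.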

This all works in VS2015 and WinDbg, btw. Binaries are 64-bit. Stack unwind
data is generated fine by other means in LLVM; this just enables a debugger
or a library like DbgHelp.dll to attach function names to the stack frames.
Presumably with accurate type data I'll be able to see function params. A
little bit of investigation should net locals as well, although mapping
stack slots and registers back to source-level identifiers might be tricky;
I don't know yet.

To slightly change the tack of my original question - it seems like this
might be more useful to the community as a front-end developer's HOWTO
rather than a back-end file-construction kit. Since LLVM trunk already has
considerable support for reading and writing PDBs, maybe the best approach
is to start collecting wisdom on how to actually populate them?

- Mike

[0] - https://github.com/apoch/epoch-language/wiki/Knowledge-Dump---Debugging-Epoch-Programs

From: "Michael Lewis via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Johan Wehrli" <johan.wehrli@strong.codes>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Monday, August 1, 2016 4:01:57 PM
Subject: Re: [llvm-dev] Gauging interest in generating PDBs from
LLVM-backed languages

> I will certainly be willing to speak up if I find anything I know
> that isn't represented here already.

After continuing to poke around in llvm-pdbdump for a weekend, I feel
like it captures more of the file format than I know personally; but in
any case, I'm (slowly) writing up more notes at [0] to describe how I'm
using the information gleaned to build a usable PDB.

You might also find this useful: https://github.com/Microsoft/microsoft-pdb

-Hal

This afternoon I finished a first pass at "whole cloth" PDB generation. I
can get my entry point function to show up in the debugger with a proper
name in both VS2015 and a couple versions of WinDbg. It isn't much but it's
a start, and now that I have a reasonable handle on how the PDB/MSF formats
are laid out, I can move on to adding in things like source line mappings
and type metadata.
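For anyone attempting the same first milestone: a PDB can carry a function's name in more than one place, including S_GPROC32 records in a module's symbol stream and S_PUB32 public symbols in the symbol record stream. The sketch below encodes only an S_PUB32 record, following the PUBSYM32 layout in cvinfo.h. The constant names, the flag choice, and zero-byte padding to a 4-byte boundary are assumptions. A complete PDB also indexes such records from the public symbol hash stream, which is not shown.

```python
import struct

S_PUB32 = 0x110E   # record kind for a 32-bit public symbol
FLAG_CODE = 0x1    # PUBSYMFLAGS bit fCode (constant name chosen here)
FLAG_FUNCTION = 0x2  # PUBSYMFLAGS bit fFunction (constant name chosen here)


def encode_pub32(name: str, offset: int, segment: int,
                 flags: int = FLAG_CODE | FLAG_FUNCTION) -> bytes:
    """Encode an S_PUB32 record: reclen, rectyp, flags, off, seg, name."""
    body = struct.pack("<HIIH", S_PUB32, flags, offset, segment)
    body += name.encode("utf-8") + b"\0"
    # Pad so the full record (2-byte length prefix + body) is 4-byte aligned.
    body += b"\0" * (-(len(body) + 2) % 4)
    # The length prefix counts everything after itself.
    return struct.pack("<H", len(body)) + body
```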

My implementation is entirely in the Epoch language, since my ultimate goal
is to enable debugging of binaries emitted by the self-hosting compiler. I
plan to document the code much better in the near future, but the existing
pass at generating a usable PDB lives here:

The project in that directory is basically a standalone test bed for
emitting a fabricated PDB. Now that the simple parts are in place and
working, I plan to clean it up and generalize a bunch of the hard-coded
magic in there, plus of course add a ton of comments :)

After that the plan is to merge the standalone code back into the compiler
itself so that the linking portion of the build tools can emit PDBs for
arbitrary programs.

As before, the living wiki page contains my latest thoughts and knowledge
on the subject:

https://github.com/apoch/epoch-language/wiki/Knowledge-Dump---Debugging-Epoch-Programs

I'm curious now if there are any specific challenges facing other LLVM
languages in this area. I'd be glad to compare notes with anyone else
working on front-end integration of the ability to emit this debug format.