llvm-pdbdump proposal

I came across a PDB that caused llvm-pdbdump to crash. I tracked it down to a missing overload for a data type in the variable printer. When I tried to provide an implementation, I discovered that there were some bugs in the existing overloads and it was easy to come up with combinations of types that weren’t being dumped right. I created some test cases and tried to fix these problems but realized that there’s a limitation to how printing the type information is structured.

The cause of the difficulty is the conflict between these two facts:

  1. C and C++ variable declarations are described inside-out
  2. Printing with color requires printing left-to-right

Simple types generally work, because all the type information is on the left side, so VariableDumper can just work left-to-right. To handle more complex types (like arrays and function signatures), VariableDumper attempts to special case the complex types. Unfortunately, the special cases can’t be combined generally (e.g., making an array of pointers to functions), so it’s always possible to find a new bug by adding an extra layer of type information.

I’m proposing a restructure of VariableDumper (PrettyVariableDumper.cpp) that I think should allow for pretty-printing of all types and actually simplify away most of the special-case code.

The existing approach is to emit the type information for a symbol and then the identifier, using recursion as necessary to emit the type information of subtypes. For example, if foo is a pointer to an int, it’s dumped by first recursively dumping the pointee type (int), then the ‘*’, and then the name (foo). But if bar is a pointer to an array of ints, this breaks down.

My proposal is to extend this slightly by having dumpers for the “left-side” and the “right-side” of the type information. The pattern for printing a variable them becomes:

call dumpLeft with the symbol type
emit the symbol name
call dumpRight with the symbol type

Most of the dumpLeft implementations would be identical to the current dump implementations. The default for a dumpRight implementation would be to emit nothing, but for arrays and function signatures, dumpRight would print the bracketed array size and argument lists. This depth-first order of the recursion should produce the necessary output in order, which means there should be no change to how we stream the output in color.

Any questions, comments, objections?

This is exactly how clang’s type printer works, so I’d say this is probably the right implementation strategy. :slight_smile:

Check out clang/lib/AST/TypePrinter.cpp:

class TypePrinter {

void printBefore(const Type *ty, Qualifiers qs, raw_ostream &OS);
void printBefore(QualType T, raw_ostream &OS);
void printAfter(const Type *ty, Qualifiers qs, raw_ostream &OS);
void printAfter(QualType T, raw_ostream &OS);

#define ABSTRACT_TYPE(CLASS, PARENT)
#define TYPE(CLASS, PARENT)
void print##CLASS##Before(const CLASS##Type *T, raw_ostream &OS);
void print##CLASS##After(const CLASS##Type *T, raw_ostream &OS);
#include “clang/AST/TypeNodes.def”

Since Phabricator isn’t sending email: https://reviews.llvm.org/D31832