Formatters and downcasts

Endill · November 7, 2023, 7:05pm

I’m writing formatters for Clang and have run into a systematic issue that LLDB doesn’t seem to offer a good solution for. Consider the following simplified example of real Clang code:

struct Type {
  enum TypeClass { Enum };
  TypeClass TC;
  int TypeData; 
};

struct EnumType : Type {
  int EnumData;
};

struct QualType {
  llvm::PointerIntPair<Type *, unsigned, 3> Value;
};

void foo(QualType QT, EnumType *ET) {
  // QT stores a pointer to EnumType
  // breakpoint here
}

If you write some straightforward[1] formatters that don’t perform any downcasts, v -T inside foo would print something along the lines of:

(QualType) QT = {
  (llvm::PointerIntPair<Type *, unsigned, 3>) Value = {
    (Type *) SynthesizedPointer = {
      (TypeClass) TC = Enum
      (int) TypeData = 0
    }
    (unsigned) SynthesizedInt = 0
  }
}
(EnumType *) ET = {
  (Type) Type = {
    (TypeClass) TC = Enum
    (int) TypeData = 0
  }
  (int) EnumData = 0
}

I find it unfortunate that less data is extracted from QualType (EnumData is missing). Since the code itself makes perfect sense, I need to address it on the formatter side:

Option 1: add SynthesizedDerived child to Type formatter. This immediately leads to recursion: Type has an EnumType child, which has an implicit base class child Type.

Option 2: option 1 + leveraging the SyntheticChildrenGenerated flag in SBValue. The issue is that while this flag is not set for a local Type variable, it is set (or at least should be) for SBValues created out of thin air by the PointerIntPair formatter, including the Type pointer. So there is no way to write a robust formatter on top of this.

Option 3: make PointerIntPair produce a downcasted pointer instead of the pointer to base as written in the source. This is doable, but requires special-casing every base. This means (a) coupling between the PointerIntPair formatter and the formatters for the types passed as its first argument, a coupling that is not present in the source; (b) one global type search by name (FindFirstType) so that we have an SBType to cast to, which in my experience is 1 or 2 orders of magnitude more expensive than the usual stuff formatters do. None of this sounds good from a responsiveness and maintainability perspective.

Option 4: add a new downcast(SBValue) -> SBValue function to formatters that LLDB calls before settling on the type of the value and picking the formatter based on that. A no-op by default, this would be the place for “let’s take a look at TC and downcast to a proper type if possible” logic. This option is free from concerns raised for option 3.
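To make the “look at TC and downcast” logic concrete, here is a minimal sketch of what the body of such a hook might look like. The hook name downcast, the DERIVED_BY_TYPE_CLASS table, and the idea that LLDB would call it are assumptions taken from option 4, not existing LLDB behavior. The methods called on the value (GetChildMemberWithName, GetValue, GetTarget, FindFirstType, GetPointerType, Cast) are ordinary SBValue, SBTarget and SBType methods, and the code is duck-typed so it does not import lldb itself.

```python
# Hypothetical mapping from the printed TypeClass enumerator to the name of
# the derived class. Only the simplified example from this post is covered.
DERIVED_BY_TYPE_CLASS = {"Enum": "EnumType"}


def derived_type_name(type_class):
    """Return the derived class name for a TypeClass enumerator, or None."""
    return DERIVED_BY_TYPE_CLASS.get(type_class)


def downcast(value):
    """Return `value` cast to a pointer to its derived class, or `value` itself.

    `value` is expected to behave like an lldb.SBValue holding a `Type *`.
    """
    tc = value.GetChildMemberWithName("TC")
    name = derived_type_name(tc.GetValue()) if tc.IsValid() else None
    if name is None:
        return value  # unknown tag: stay a no-op
    # Global lookup by name; this is the expensive step mentioned in option 3.
    derived = value.GetTarget().FindFirstType(name)
    if not derived.IsValid():
        return value
    return value.Cast(derived.GetPointerType())
```

The same body could live inside the PointerIntPair formatter (option 3); the difference in option 4 is only who calls it and when.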

I don’t see a good solution that is available in LLDB today, and don’t see any obvious flaws in option 4. It would be nice to get feedback on the problem presented, option 4, or other options that haven’t occurred to me. If there is a way to skip implementing option 4 without compromising maintainability, responsiveness, or other important aspects, I’m all ears.

CC @jingham

[1] PointerIntPair is a bit-packed pointer, so calling a formatter that unpacks its pointer and integer “straightforward” is a stretch.
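For readers unfamiliar with bit-packed pointers, the unpacking step can be sketched roughly as follows. This is an illustration only: the function name is made up, and it assumes the integer occupies the lowest int_bits bits of the word. The real llvm::PointerIntPair layout depends on the pointer type's alignment traits, so treat the bit positions as an assumption.

```python
def unpack_pointer_int_pair(raw, int_bits=3):
    """Split a packed word into (pointer, integer).

    Assumes the integer sits in the lowest `int_bits` bits of `raw`.
    """
    mask = (1 << int_bits) - 1
    return raw & ~mask, raw & mask


# Example: an 8-byte-aligned address with the value 5 packed into its low bits.
pointer, integer = unpack_pointer_int_pair(0x7FFF1238 | 0x5)
```

Here pointer comes back as 0x7FFF1238 and integer as 5.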

Michael137 · November 8, 2023, 12:18pm

To clarify, what you would like to see is the following?

(QualType) QT = {
  (llvm::PointerIntPair<Type *, unsigned, 3>) Value = {
    (EnumType *) SynthesizedPointer = {
      (Type) Type = {
        (TypeClass) TC = Enum
        (int) TypeData = 0
      }
      (int) EnumData = 0
    }
    (unsigned) SynthesizedInt = 0
  }
}

LLDB is already capable of down-casting whenever we print a base-class pointer. Though I’m not sure how exactly that’s implemented and whether you can make use of that in the formatters.

Can you share the implementation of your formatter? Without seeing exactly how your current output is produced, it’s hard to assess what the best way forward is.

Endill · November 8, 2023, 6:48pm

That’s correct.

However LLDB does this, it has no way of knowing that TC holds type information, or how TC values are mapped to types.

PointerIntPair is there to highlight that there is a real case where a pointer that requires a custom downcast might itself be synthesized out of thin air from LLDB’s perspective. So I hope we can avoid sidetracking this discussion into the PointerIntPair formatter implementation. Here is the highest layer of it, which should be sufficient to understand how the SBValue for the pointer is created:

https://github.com/Endilll/llvm-project/blob/c6b7cf7ffad2b6c394a9d7f3db15a9688cd94b1b/clang/utils/ClangDataFormat.py#L555-L561

          self.raw_pointer_value, self.raw_int_value = self.extract_raw_values(self.value)
          target: lldb.SBTarget = self.value.target
          
          address = SBAddress(self.raw_pointer_value, target)
          pointer_type: SBType = self.value.type.template_args[0]
          self.pointer_value: SBValue = target.CreateValueFromAddress("Pointer", address, pointer_type)
          self.pointer_value.SetSyntheticChildrenGenerated(True)

jingham · November 10, 2023, 12:09am

I’m a little confused that you say you want some downcasting to go on before picking the synthetic child provider? I don’t see how that would help here.

Here’s what should happen in your formatter, as I understand it. You register the synthetic child provider on PointerIntPair<.*>, then when your formatter is called, you figure out where the pointer is and what its static type is (in this case Type *).

Once you’ve figured out the address and type of the pointer from your PointerIntPair, you make a ValueObject from the address and the static type. Then when you turn around and ask questions of the VO, lldb will try to compute the “Dynamic Value” for that ValueObject - which is the value downcast to its most specific class - and route all questions to that instead of the “Static Value”. That happens automatically provided you haven’t turned off the dynamic typing feature when you make the ValueObject. If that worked in your case, the dynamic value for your pointer VO is exactly what you would want to see, right?

For C++ the only way that lldb knows to calculate “dynamic values” is by looking at the vtable pointer, so we can only do this for virtual types, and the problem here is that your C++ structs are not virtual so we can’t figure out the dynamic type. I bet if you added some virtual functions then we’d print the type you want to see. If not, then you are defeating the dynamic typing in your formatter, which should be easy to fix.
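From the Python side, the dynamic value described above can also be requested explicitly. The helper below is a small sketch with a made-up name; GetDynamicValue and IsValid are existing SBValue methods, and use_dynamic is expected to be one of the lldb.DynamicValueType constants such as lldb.eDynamicDontRunTarget. The code is duck-typed and does not import lldb itself.

```python
def most_specific(value, use_dynamic):
    """Ask LLDB for the dynamic view of `value`.

    Falls back to `value` itself if the result is not valid. `value` is
    expected to behave like an lldb.SBValue.
    """
    dynamic = value.GetDynamicValue(use_dynamic)
    return dynamic if dynamic.IsValid() else value
```

Inside a synthetic provider this would be called on the pointer value, for example most_specific(self.pointer_value, lldb.eDynamicDontRunTarget). For the non-virtual structs in this thread it should not be expected to produce EnumType *, since there is no vtable pointer to inspect.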

So, since the dynamic value feature has gotten you almost there, and will “just work” for all virtual classes where lldb can compute the dynamic value, it sounds like you want some kind of pluggable “dynamic type” callback that kicks in when lldb can’t figure out the dynamic value of some ValueObject.

I don’t think you want to do this in the formatters specifically. I don’t understand how doing this before formatter matching would help. But moreover, if we did it only in formatters, users could get in the situation where they saw the dynamic value of a pointer if it was held in a structure, but a stand-alone variable with the same pointer value would only show the static type, which would be quite confusing. Extending the dynamic value mechanism directly will solve your problem transparently, and should avoid these sorts of oddities.

So IIUC, you want something like:

(lldb) type dynamic-type-recognizer add -F myModule.MyRecognizer SomeType

Then when lldb goes to fetch the dynamic VO for some value object of type SomeType (or SomeType *, SomeType &) it would pass the VO to the python function myModule.MyRecognizer, and that would return a ValueObject of the dynamic value if it could figure it out, or just return itself if it couldn’t. Then lldb would use whatever that returns.
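The dispatch just described can be modelled in a few lines of plain Python. Everything here is a toy: the registry, the function names, and the dict-based values are hypothetical stand-ins, and neither the type dynamic-type-recognizer command nor any matching SB API exists in LLDB today. The point is only to show the intended contract: look up by base type name, call the callback, use what it returns.

```python
RECOGNIZERS = {}  # base type name -> callback; stand-in for LLDB's registry


def add_recognizer(type_name, callback):
    """Register `callback` for `type_name` and its pointer/reference forms."""
    RECOGNIZERS[type_name] = callback


def base_type_name(type_name):
    """Strip one trailing '*' or '&' so T, T * and T & share an entry."""
    name = type_name.strip()
    if name.endswith(("*", "&")):
        name = name[:-1].rstrip()
    return name


def dynamic_value(value, type_name):
    """Run the recognizer registered for `type_name`, if any."""
    callback = RECOGNIZERS.get(base_type_name(type_name))
    if callback is None:
        return value  # no recognizer registered: keep the static value
    return callback(value)


def my_recognizer(value):
    # Toy values are dicts; a real callback would receive a value object.
    if value.get("TC") == "Enum":
        return dict(value, type="EnumType *")
    return value  # could not figure it out: return the value itself


add_recognizer("Type", my_recognizer)
```

Because the lookup is keyed on the type name, values of unrelated types never reach the callback.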

With that in place, your formatter should then just “do the right thing” wherever it printed values of SomeType, and you wouldn’t have to do anything special in the formatter.

That seems fine to me. I would suggest registering the callbacks based on variable type. If there were another generic method of determining dynamic types that lldb’s missing, then we should add it to lldb’s vtable based algorithm. Otherwise it’s going to be dependent on details of the type, so there’s no point running all the types it won’t be able to understand past it.

Endill · November 10, 2023, 1:22pm

Yes, it’s going to address my use case.

jingham · November 10, 2023, 5:48pm

If you intend to work on this, you should be able to use the same templated “type searcher” code that the summary and synthetic child providers use. For consistency the command & SB API interfaces should also mirror the other type based providers.
