Representing X86 long double in Debug Info

I’m in the process of teaching the Verifier to validate that the size of a debug info variable matches the size of the described value (this has already caught a couple of bugs, both in my frontend and in LLVM itself). However, I’ve run into the following:

size of passed value (80) does not match size of declared variable (128)
call void @llvm.dbg.declare(metadata x86_fp80* %x, metadata !11, metadata !13), !dbg !14
%x = alloca x86_fp80, align 16
!11 = !DILocalVariable(name: "x", scope: !4, file: !1, line: 2, type: !12)
!12 = !DIBasicType(name: "long double", size: 128, align: 128, encoding: DW_ATE_float)
!13 = !DIExpression()
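
For reference, this comes from compiling roughly the following with clang -g -O0 on x86-64 (a minimal reproducer sketch, not the exact source I was using):

  // Any local 'long double' will do; clang -g on x86-64 produces the alloca,
  // the dbg.declare, and the DIBasicType with size: 128 shown above.
  int main() {
    long double x = 0.0L;
    return (int)x;
  }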

The mismatch happens because LLVM knows that fp80s are 80 bits wide, while Clang declares them as 128 bits in the debug info. We might have to special-case this in the verifier, but before we do that, I wanted to ask about the following:

Reading the DWARF standard, it seems like the following would be a valid description of an X86 80-bit long double:

DW_TAG_base_type

DW_AT_name “long double”
DW_AT_byte_size 16
DW_AT_bit_size 80

As far as I can tell from looking through the source code, both LLDB and GDB would read this just fine. It would be a more accurate description of a long double, and if we add support for it in LLVM IR, the verifier would be able to understand what’s actually going on. Does this seem like a reasonable thing to have LLVM do, or would you prefer to just disable this check in the verifier when x86_fp80 types are around?

Looking at the code in clang, CGDebugInfo just passes through the width of the type as it is described by the TypeInfo, which in turn is defined by the Target. At the moment I do not understand why an x86_fp80 is reported to be 128 bits wide. (Since it’s a type natively supported by LLVM, http://llvm.org/docs/LangRef.html#floating-point-types, I would have expected it to be more like size=80, align=128.)

This looks like a bug to me and the debug info should describe an x86_fp80 as being 80 bits wide, but I don’t understand the mechanics well enough to decide whether the TypeInfo is wrong here or if we should work around it in CGDebugInfo.

[Summoning more people who may have a better idea]

– adrian

Using byte_size=16, bit_size=80 (and optional data_bit_offset) would be the right way to go as far as the DWARF spec is concerned.

Persuading TypeInfo to describe something that would generate this is really the question.

–paulr

I think TypeInfo usually describes the answer that sizeof() is supposed to
give. sizeof() is typically a multiple of alignment, so if alignof() is
128, sizeof must be 128. Other common alignments are 32 and 64, which makes
sizeof() 96 and 128 respectively. In practice, sizeof(long double) is never
80.
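
For instance, on an x86-64 SysV target (a sketch; the numbers differ on other ABIs, e.g. a 12-byte size with 4-byte alignment on 32-bit Linux):

  // On x86-64, long double is the x87 80-bit format, padded out to 16 bytes
  // of storage with 16-byte ABI alignment, so sizeof() never reports 80 bits.
  static_assert(sizeof(long double) == 16, "16 bytes of storage on x86-64");
  static_assert(alignof(long double) == 16, "16-byte ABI alignment on x86-64");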

Maybe CGDebugInfo should ask TargetInfo::getLongDoubleFormat() what model
is in use and generate dwarf accordingly. It's a bit of a hack, so I'm open
to better suggestions.
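
Something along these lines is what I have in mind (a sketch; the helper names are from memory and may not match the exact clang/LLVM APIs):

  #include "clang/Basic/TargetInfo.h"
  #include "llvm/ADT/APFloat.h"

  // Number of semantically meaningful bits in 'long double' for the target,
  // e.g. 80 for the x87 extended format, even though the TypeInfo width
  // (the storage size) is 96 or 128 bits.
  static unsigned getLongDoubleValueBits(const clang::TargetInfo &Target) {
    const llvm::fltSemantics &Sem = Target.getLongDoubleFormat();
    return llvm::APFloatBase::semanticsSizeInBits(Sem);
  }

CGDebugInfo could then emit DW_AT_bit_size from something like this and keep DW_AT_byte_size based on the TypeInfo width.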

I’m a bit confused by all this - sizeof(long double) is 16 bytes (128 bits). I /think/ that’s what we should be describing, even if some of that’s essentially padding?

That was essentially part of my question. The DWARF standard says:

If the value of an object of the given type does not fully occupy the storage described by a byte size attribute, the base type entry may also have a DW_AT_bit_size and a DW_AT_data_bit_offset attribute, both of whose values are integer constant values (see Section 2.19). The bit size attribute describes the actual size in bits used to represent values of the given type. The data bit offset attribute is the offset in bits from the beginning of the containing storage to the beginning of the value. Bits that are part of the offset are padding.

which made me think the representation I proposed in the original email might be correct (i.e. an 80-bit value, but always stored as 16 bytes).

As far as I see it, there are three questions here:

  1. What’s the right representation in DWARF?
  2. If we think it should be the byte_size/bit_size combination, how do we describe this in IR? Right now, even though the size in the IR is given in bits, the backend will always emit DW_AT_byte_size = (size >> 3).
  3. How would clang describe this in its TargetInfo?

Just to throw my opinion out there:

  1. Use the DW_AT_bit_size/DW_AT_byte_size combination.
  2. Add a new storage_size attribute to DIBasicType that is generally equal to size, except in cases like this, where we should have storage_size = 128, size = 80 (a sketch of the emission side follows this list).
  3. Have clang set those based on getLongDoubleSize (for the storage size) and getLongDoubleFormat (for the value size), as Reid suggested. This seemed very hacky to me at first as well, but thinking about it again, the format does encode how many semantic bits there are (because that is needed for correct constant folding etc.), which is really what we’re asking for here.
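
To make (2) concrete, here is a rough sketch of how the emission side could consume such an attribute (the storage_size field and the helper below are hypothetical, just to illustrate the proposal):

  #include <cstdint>

  // Hypothetical: DIBasicType has no storage_size field today. Given the
  // proposed (size, storage_size) pair, pick the attributes for the base
  // type DIE.
  struct BaseTypeSizeAttrs {
    uint64_t ByteSize; // DW_AT_byte_size, always emitted
    uint64_t BitSize;  // DW_AT_bit_size, 0 means "omit"
  };

  static BaseTypeSizeAttrs chooseSizeAttrs(uint64_t SizeInBits,
                                           uint64_t StorageSizeInBits) {
    BaseTypeSizeAttrs A;
    // DW_AT_byte_size always describes the storage the object occupies.
    A.ByteSize = StorageSizeInBits / 8;
    // DW_AT_bit_size is only needed when the value does not fill its
    // storage, e.g. x86_fp80: storage_size = 128, size = 80.
    A.BitSize = (SizeInBits != StorageSizeInBits) ? SizeInBits : 0;
    return A;
  }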

However, I personally don’t have strong opinions, as long as there’s something consistent I can implement in the verifier.

To come back to the original question: doesn’t LLVM know somewhere that x86_fp80 requires 128 bits of storage? (e.g. how does it lay out the call argument for an x86_fp80 if it’s not being passed in a register? How does it lay out the stack containing one of these?)

I /imagine/ somewhere in LLVM knows about the need for 128 bits of storage for these things, and we should be using that information to cross reference with the debug info.

(aside: I don’t think there’s much merit for DWARF consumers in adding this extra info - they’ve all been surviving without it for this long)

80-bit float isn't the only case where the value-size and storage-size of
a type differ. I seem to remember reading on one of these lists, not so
long ago, something from Richard Smith about how the value-size of a bool
is 1 bit while its storage size is 8 bits. I've lost the context though,
sorry…

> 1) What's the right representation in DWARF?

>
> Unfortunately impossible/hard to answer. DWARF is fairly flexible &
> doesn't dictate "right" answers, as such.

This time, actually the "right" answer is fairly clear (and in normative
text, no less) right there in section 5.1. Use DW_AT_byte_size for the
storage size and DW_AT_bit_size for the value size.

I'm not sure - it seems like a valid interpretation to believe that the
value is 128 bits - some of those bits are always zero. (& of course the
DWARF spec says "the base type entry /may/ also have", because it's all
permissive & stuff)

> it seems like a valid interpretation to believe that the value is 128 bits - some of those bits are always zero

I beg to differ. We had to add hacks in some of our test scripts to detect ‘long double’ and ignore the other unpredictable-and-certainly-not-always-zero bits.

–paulr

Looking into this more, what happens at the LLVM level is that we declare the size to be 80 bits and, to find the storage size, round that up to the ABI alignment. I could change the verifier check to do the same, which would solve the immediate issue for me. However, I would still like to figure out the DWARF question, especially because it will also be applicable to i1.
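
Concretely, the two DataLayout queries involved (a sketch, assuming the usual x86-64 data layout string, where the f80:128 entry supplies the ABI alignment):

  #include "llvm/IR/DataLayout.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Type.h"
  #include <cstdint>

  // Value size vs. storage size of x86_fp80 as DataLayout sees them.
  void showFP80Sizes() {
    llvm::LLVMContext Ctx;
    llvm::DataLayout DL("e-m:e-i64:64-f80:128-n8:16:32:64-S128");
    llvm::Type *FP80 = llvm::Type::getX86_FP80Ty(Ctx);

    uint64_t ValueBits = DL.getTypeSizeInBits(FP80);      // 80
    uint64_t AllocBits = DL.getTypeAllocSizeInBits(FP80); // 128: the store
                                                          // size rounded up
                                                          // to the ABI align
    (void)ValueBits;
    (void)AllocBits;
  }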

What do we represent with i1? (bool?)

Yes, I was thinking of bool. Clang emits it as 8 bits in the debug info, but uses i1 in the IR, so e.g. what do I do if an i1 is passed to a dbg.value?

i1 even for a function parameter, struct member, etc? Again, I'd be
confused by how that works/how LLVM decides to use a whole byte for it, etc.

The x86-64 and Darwin/i386 ABIs define the size of the 80-bit extended type in memory as 16 bytes. In all other i386 ABIs it is defined as 12 bytes. Delphi and, for compatibility reasons, the Free Pascal Compiler use 10 bytes (although FPC also has a “cextended” type that follows the official ABI for the platform). In FPC we use a [10 x i8] for all memory representations of the non-ABI 80-bit extended type.

So ideally, the bit size of the type should be specifiable separately from the ABI/TypeInfo, since there may be multiple storage sizes in play in the same code.

Jonas

I’d like to revive this thread, because the same problem has come up again. Originally I solved it by asking LLVM for the AllocSize (i.e. the size including padding), which solves the fp80 case nicely, but e.g. also gives 32 for {i8, i8, i8}. For now I think the only thing I can do is make the verifier accept either size, but I would like to discuss whether we can represent the difference in IR so that we can tighten the check.

Not sure I follow - why would the allocsize of {i8, i8, i8} be 32? An array
of them would presumably have elements that are 3 bytes, not 4?
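
A quick way to check (a sketch, using the same x86-64 data layout string as above): with the standard rules the struct’s ABI alignment is 1, so I’d expect the alloc size to come out as 24 bits (3 bytes), not 32.

  #include "llvm/IR/DataLayout.h"
  #include "llvm/IR/DerivedTypes.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Type.h"
  #include <cstdint>

  // Query the alloc size of {i8, i8, i8} directly.
  uint64_t allocBitsOfThreeI8s() {
    llvm::LLVMContext Ctx;
    llvm::DataLayout DL("e-m:e-i64:64-f80:128-n8:16:32:64-S128");
    llvm::Type *I8 = llvm::Type::getInt8Ty(Ctx);
    llvm::StructType *S = llvm::StructType::get(Ctx, {I8, I8, I8});
    return DL.getTypeAllocSizeInBits(S); // 24 with the layout above
  }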