Debug info for pointer-to-array type with dynamic bounds not working


I have locally implemented support for generating debug information for dynamic arrays in the LLVM code generator of the Free Pascal Compiler. A dynamic array in Pascal consists of a pointer to the array data. The data is preceded by a ptruint_t containing the length, and before that by a ptrint_t with the reference count (-1 if the array is a constant).
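The layout just described can be modelled in a few lines of Python. This is a minimal sketch assuming a 64-bit little-endian target; the helper names (build_block, read_header) and the values are hypothetical, and it only models the bytes, not FPC's actual runtime code. It shows that the variable holds the address of the first element and that both header words sit at negative offsets from it (-16 and -8), which is what the upper-bound expression below relies on.

```python
import struct

PTR = 8  # assumed pointer size on a 64-bit target

def build_block(refcount, header_word, elements):
    """Raw bytes of one heap block: two 8-byte header words
    followed by the 32-bit LONGINT elements."""
    header = struct.pack("<qQ", refcount, header_word)  # refcount is signed (-1 = constant)
    data = struct.pack(f"<{len(elements)}i", *elements)
    return header + data

def read_header(block, data_offset):
    """Given the offset of element 0 (what the Pascal variable points to),
    read the two header words stored just before it."""
    refcount, = struct.unpack_from("<q", block, data_offset - 2 * PTR)
    header_word, = struct.unpack_from("<Q", block, data_offset - PTR)
    return refcount, header_word

elements = [10, 20, 30]
block = build_block(refcount=1, header_word=len(elements), elements=elements)
data_offset = 2 * PTR  # the variable points just past the header
print(read_header(block, data_offset))               # (1, 3)
print(struct.unpack_from("<i", block, data_offset))  # (10,)
```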

Here is a test program:

{$mode objfpc}

var
  mydynarray: array of longint;

You can find the generated LLVM IR on pastebin (the link titled target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f3 -). In particular, here is the debug information for the array:

; The variable
!28 = distinct !DIGlobalVariable(name: "MYDYNARRAY", scope: !5, file: !7, line: 4, type: !30, isDefinition: true, isLocal: true)

; the Longint type
!10 = !DIDerivedType(tag: DW_TAG_typedef, name: "LONGINT", file: !43, line: 25, baseType: !42)
!42 = !DIBasicType(size: 32, encoding: DW_ATE_signed)

; The "array of Longint" type
!49 = distinct !DICompositeType(tag: DW_TAG_array_type, baseType: !10, elements: !46, dataLocation: !50)

; dataLocation: the array is a pointer (not using DW_OP_LLVM_implicit_pointer, because the LLVM DWARF writer fatally asserts that it cannot encode it when combined with dynamic array bounds)
!50 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)

; The dimensions and bounds
!46 = !{!47}
!47 = !DISubrange(upperBound: !48, lowerBound: 0)
!48 = !DIExpression(DW_OP_push_object_address, DW_OP_deref, DW_OP_constu, 8, DW_OP_minus, DW_OP_deref)
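To make the intended semantics of the dataLocation and upperBound expressions concrete, here is a minimal Python sketch of a stack-based evaluator for just the opcodes they use. The memory model and addresses (variable at 0x1000, element 0 at 0x2010) are made up, a 64-bit little-endian target is assumed, and this is not how gdb or LLDB implement DWARF expression evaluation; it only shows the values a consumer should arrive at.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

class Memory:
    """Toy byte-addressed memory for a 64-bit little-endian target."""
    def __init__(self):
        self.cells = {}

    def write_u64(self, addr, value):
        for i, b in enumerate(struct.pack("<Q", value & MASK64)):
            self.cells[addr + i] = b

    def read_u64(self, addr):
        raw = bytes(self.cells[addr + i] for i in range(8))
        return struct.unpack("<Q", raw)[0]

def evaluate(ops, memory, object_address):
    """Evaluate the small subset of DWARF stack operations used above."""
    stack = []
    i = 0
    while i < len(ops):
        op = ops[i]
        if op == "DW_OP_push_object_address":
            stack.append(object_address)  # address of the variable itself
        elif op == "DW_OP_deref":
            stack.append(memory.read_u64(stack.pop()))  # load a pointer-sized value
        elif op == "DW_OP_constu":
            i += 1
            stack.append(ops[i])  # the next list entry is the operand
        elif op == "DW_OP_minus":
            top = stack.pop()
            second = stack.pop()
            stack.append((second - top) & MASK64)  # second entry minus former top
        else:
            raise ValueError(f"unsupported opcode: {op}")
        i += 1
    return stack[-1]

# Made-up layout: the variable lives at 0x1000 and points to element 0 at 0x2010.
mem = Memory()
mem.write_u64(0x1000, 0x2010)  # MYDYNARRAY: pointer to the first element
mem.write_u64(0x2000, 1)       # reference count
mem.write_u64(0x2008, 4)       # header word read by the upperBound expression (arbitrary)

data_location = evaluate(["DW_OP_push_object_address", "DW_OP_deref"], mem, 0x1000)
upper_bound = evaluate(
    ["DW_OP_push_object_address", "DW_OP_deref",
     "DW_OP_constu", 8, "DW_OP_minus", "DW_OP_deref"],
    mem, 0x1000)
print(hex(data_location), upper_bound)  # 0x2010 4
```

A consumer that honours DW_AT_data_location should therefore place element 0 at a different address (0x2010 here) from the variable itself (0x1000).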

This gets compiled by LLVM 15 into this DWARF:

0x0000002f:   DW_TAG_variable
                DW_AT_name      ("MYDYNARRAY")
                DW_AT_type      (0x00000044 "")
                DW_AT_decl_file ("/ssdata/dev/fpcgit/test/tt4.pp")
                DW_AT_decl_line (4)
                DW_AT_location  (DW_OP_addr 0x888)

0x00000044:   DW_TAG_typedef
                DW_AT_type      (0x00000049 "LONGINT[]")

0x00000049:   DW_TAG_array_type
                DW_AT_data_location     (DW_OP_push_object_address, DW_OP_deref)
                DW_AT_type      (0x0000005e "LONGINT")

0x00000051:     DW_TAG_subrange_type
                  DW_AT_type    (0x0000006c "__ARRAY_SIZE_TYPE__")
                  DW_AT_lower_bound     (0)
                  DW_AT_upper_bound     (DW_OP_push_object_address, DW_OP_deref, DW_OP_lit8, DW_OP_minus, DW_OP_deref)

0x0000005d:     NULL
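Note that the DIExpression's DW_OP_constu, 8 appears as DW_OP_lit8 in the dump. Presumably this is just LLVM picking the shorter encoding for small constants (DW_OP_lit0 through DW_OP_lit31 push 0 to 31 with a single opcode byte), so both forms mean the same thing. The sketch below encodes the upper-bound expression to raw bytes using the opcode values from the DWARF specification; the helper names (uleb128, encode_constu) are hypothetical and this is not LLVM's actual code.

```python
# DWARF opcode values from the DWARF specification.
DW_OP_deref = 0x06
DW_OP_constu = 0x10
DW_OP_minus = 0x1C
DW_OP_lit0 = 0x30
DW_OP_push_object_address = 0x97

def uleb128(value):
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_constu(value):
    """Shortest form of an unsigned constant: DW_OP_litN for 0..31,
    otherwise DW_OP_constu followed by a ULEB128 operand."""
    if value < 32:
        return bytes([DW_OP_lit0 + value])
    return bytes([DW_OP_constu]) + uleb128(value)

upper_bound_expr = (
    bytes([DW_OP_push_object_address, DW_OP_deref])
    + encode_constu(8)
    + bytes([DW_OP_minus, DW_OP_deref])
)
print(upper_bound_expr.hex())  # 9706381c06
```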

The problem is that while everything works correctly in gdb (8.3.1), in LLDB (even LLDB 15.0), none of the deref operations seem to be used:

(lldb) p MYDYNARRAY
(__lldb_invalid_typedef_name) $0 = ([0] = 1810576, [1] = 1, [2] = 0, [3] = 0, [4] = 0, [5] = 0)
(lldb) p &MYDYNARRAY
(__lldb_invalid_typedef_name *) $1 = 0x0000000100064610
(lldb) p &MYDYNARRAY[0]
(LONGINT *) $2 = 0x0000000100064610
(lldb) p MYDYNARRAY[0]
(LONGINT) $3 = 1810576

As you can see, LLDB prints the same address for MYDYNARRAY and MYDYNARRAY[0], and it’s not getting the upper bound from the correct address either.
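The first two printed elements hint at what LLDB is actually reading. If those 8 bytes are reinterpreted as one 64-bit little-endian value instead of two LONGINTs, they form 0x1001ba090, which looks like a plausible pointer in the same address region as the variable itself (0x100064610). This is only an inference from the printed numbers, not something verified in the session: it suggests LLDB treats the variable's own storage as the element data, i.e. it skips the DW_OP_deref in DW_AT_data_location.

```python
import struct

# The first two elements LLDB printed for $0.
printed = (1810576, 1)

# Read the same 8 bytes as one unsigned 64-bit pointer instead of
# two signed 32-bit LONGINTs (little-endian, as on x86-64/arm64).
raw = struct.pack("<2i", *printed)
as_pointer, = struct.unpack("<Q", raw)
print(hex(as_pointer))  # 0x1001ba090
```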

Changing the generated debug information to claim it’s for a C++ program does not change anything.

Is there something wrong with my debug info that gdb is ignoring, or is this an LLDB bug?

What does !30 have in it?

I don’t know much about lldb, but I’d guess the only dynamic-array kinds of things that are well tested would be C VLAs.

What does GDB print for the example debugger session? (I’m not sure why &MYDYNARRAY and &MYDYNARRAY[0] would print different values if the length is stored before the address of the variable.)

In any case, the debug info seems about right to me. But given that LLDB is implemented fairly narrowly, mostly as a C++/ObjC debugger, it’s not surprising that it might have bugs or missing features around data types that don’t exist in those languages.

Sorry, that one is an intermediate typedef (useless, I know, but for implementation reasons I currently always generate one):

!30 = !DIDerivedType(tag: DW_TAG_typedef, baseType: !49)

You can see the full IR at the pastebin link in my original post (the link with the “target datalayout” title; I didn’t specify a title for the pastebin).

Since the dynamic array bounds features were added for flang, I thought they might also have been implemented and tested in LLDB, especially given the summary of the FOSDEM 2022 talk “Enhanced debuggability support in LLVM for various Fortran language features” (although admittedly I haven’t watched the full presentation yet).

You’re correct, I made a mistake there. I confused the behaviour of the array type at the Pascal language level (where taking the address of the array without a subscript gives you the address of the pointer to the array) with what I implemented in the debug information (where the data location of the array does indeed point to the first element).

As mentioned above, I was under the impression that it had already been implemented because of flang. But maybe it hasn’t been mainlined yet.

Thanks for the replies.