RFC: [DebugInfo] Improving Debug Information in LLVM to Recover Optimized-out Function Parameters

Hi,
Following is a proposal to improve location coverage for Function parameters in LLVM. The patches for review will be posted soon.

RFC: [DebugInfo] Improving Debug Information in LLVM to Recover Optimized-out Function Parameters

Ananthakrishna Sowda (Cisco), asowda@cisco.com
Nikola Prica (RT-RK/Cisco), nprica@rtrk.com
Djordje Todorovic (RT-RK/Cisco), djtodorovic@rtrk.com
Ivan Baev (Cisco), ibaev@cisco.com

Overview of the problem
Software release products are compiled at optimization level -O2 or higher. Such products might produce a core file in case of a failure. Support engineers usually begin debug analysis by looking at the backtrace from the core file. Unfortunately, many parameters in backtraces are reported as optimized out, for a variety of reasons. This makes triaging the issue and assigning ownership harder due to the missing information, and it is harder for the product team to understand the cause of the failure. In summary, we are describing a well-known serviceability problem for optimized production code.

Proposal for solution
Function parameters have a natural fall-back location: the parent frame. Debuggers can easily go up a frame in the call chain and evaluate any expression there. Expert developers can find what values parameters had at the function entry point by examining the disassembly of the caller's frame at that particular call. With additional call-site information produced by the compiler, the debugger can fully automate this technique. The DWARF 5 specification has new tags and attributes to describe call-site parameter information [1][2]; it has been implemented in GCC and GDB since 2011 [3]. We propose implementing this feature in LLVM to enhance the optimized-code debugging experience of LLVM users.
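
To make the technique concrete, here is a minimal C++-style sketch (illustrative only; the function names are made up):

  int expensive(int);

  // Illustrative only: under -O2 the register holding 'n' may be reused before
  // the call returns, so a backtrace taken inside expensive() can show
  // n=<optimized out> for this frame.
  int helper(int n) {
    return expensive(n + 1);
  }

  int caller(void) {
    // A call-site entry emitted here can record that argument 1 is the
    // constant 42, letting the debugger recover and print n@entry=42.
    return helper(42);
  }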

Prior mention
An initial version of our work was presented as a poster at the LLVM Developers' Meeting in San Jose, 2018. The feature is fully implemented in an internal Clang/LLVM 4.0 branch.
We presented a talk on our work at FOSDEM 2019[4].

Implementation notes in Clang and LLVM
On the callee side, the only information we need is whether a parameter is never modified in the function. If so, we can use the parameter's entry value when we lose track of its location. As a natural way of handling this, we use Clang's Sema and its constness check to embed this information in the variable's declaration, which is later used for DILocalVariable construction.
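
As an illustration (hypothetical snippet, not taken from the patches), the distinction looks like this:

  int use(int);

  int f(int a) {   // 'a' is never modified: its entry value is valid everywhere
    return use(a) + a;
  }

  int g(int b) {   // 'b' is modified: its entry value only matches until the store
    b = use(b);
    return b + 1;
  }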

For call-site information, two new DINode metadata kinds, DICallSite and DICallSiteParam, are defined and emitted by the Clang frontend. The metadata is attached to the call or invoke IR instruction. Here is an example:

%call5 = call i32 @fed_em_strncmp(i8* %arraydecay, i8* %arraydecay1, i64 5), !dbg !114, !call_site !101

!99 = !DICallSiteParam(argno: 1, variable: !91, expr: !DIExpression())
!100 = !DICallSiteParam(argno: 2, variable: !95, expr: !DIExpression())
!101 = !DICallSite(scope: !87, file: !3, parameters: !102, line: 40, calledSubprogram: !13)
!102 = !{!99, !100, !103}
!103 = !DICallSiteParam(argno: 3, expr: !DIExpression(DW_OP_lit5))
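
For reference, a source-level call of roughly this shape could give rise to the metadata above (hypothetical reconstruction; the prototype of fed_em_strncmp is assumed):

  int fed_em_strncmp(const char *, const char *, unsigned long);  // assumed prototype

  void test(void) {
    char str1[8] = "abcde", str2[8] = "abcdf";
    int res = fed_em_strncmp(str1, str2, 5);  // args 1 and 2: the variables str1, str2
    (void)res;                                // arg 3: the constant 5 -> DW_OP_lit5
  }
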
For tracking call sites and call-site parameters in the backend, two new pseudo instructions, DBG_CALLSITE and DBG_CALLSITEPARAM, are introduced. See the MIR code below:

DBG_CALLSITE 0, %noreg, <!19>; dbg:strncmp.c:40:47
          * DBG_CALLSITEPARAM %RDX, <0x727fee0> = !DIExpression(DW_OP_lit5), 5, %noreg ; dbg:strncmp.c:40:47
          * DBG_CALLSITEPARAM %RSI, "str2" <0x71a6dd0> = !DIExpression(), %RBX, %noreg ; dbg:strncmp.c:40:47
          * DBG_CALLSITEPARAM %RDI, "str1" <0x71a6dd0> = !DIExpression(), %, %RSP, 4 ; dbg:strncmp.c:40:47

Producing them during the ISel phase is a challenge. The algorithm that collects information about call-site parameters iterates over the call-sequence chained nodes returned from the target-specific call lowering interface. The goal of the algorithm is to recognize the SDNodes that represent instructions which load function arguments into the registers that forward them into the called function's frame. It is an open question whether this is best implemented as a general matching algorithm or whether it should be lowered to the target-specific level. The DBG_CALLSITE pseudo instruction will need to be revisited, since the information about whether a call is a tail call could be extracted differently, but for the sake of simplicity we chose this representation.
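
As a very rough sketch of the matching idea (hypothetical helper, heavily simplified, not the actual patch; the real algorithm must also handle arguments passed on the stack, byval arguments, etc.): the argument-setup copies are typically glued to the lowered call node, so one way to collect candidate parameters is to walk that glue chain:

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/CodeGen/SelectionDAGNodes.h"
  using namespace llvm;

  // Hypothetical helper: for each CopyToReg glued to the call, record the
  // physical register and the SDValue that is copied into it.
  static void collectCallSiteParams(
      SDNode *Call, SmallVectorImpl<std::pair<unsigned, SDValue>> &Params) {
    for (SDNode *N = Call->getGluedNode();
         N && N->getOpcode() == ISD::CopyToReg; N = N->getGluedNode()) {
      unsigned Reg = cast<RegisterSDNode>(N->getOperand(1))->getReg();
      Params.push_back({Reg, N->getOperand(2)}); // value forwarded in Reg
    }
  }
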
Most passes handle DBG_CALLSITE and DBG_CALLSITEPARAM through the target instruction info interface method isDebugInstr(), which is used to skip processing of debug pseudo instructions. Since these new debug pseudo instructions rely on virtual registers and frame objects, we need to follow their substitution through the compilation phases. Several backend passes needed special attention: Register Coalescer, Inline Spiller, Prologue Epilog Inserter, Split Kit, and Virtual Register Rewriter. Virtual Register Rewriter required a target-specific salvaging interface for “call site parameter identities”: the situation, following identity copy instructions, in which the parameter-forwarding register location overlaps the location that is loaded into that register.
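
For illustration, this is the usual pattern by which passes stay oblivious to debug pseudo instructions, which the new pseudos piggy-back on (sketch only; the function name is made up):

  #include "llvm/CodeGen/MachineBasicBlock.h"
  #include "llvm/CodeGen/MachineInstr.h"
  using namespace llvm;

  static void processBlock(MachineBasicBlock &MBB) {
    for (MachineInstr &MI : MBB) {
      // DBG_VALUE, DBG_CALLSITE and DBG_CALLSITEPARAM are all reported as
      // debug instructions, so ordinary passes simply step over them.
      if (MI.isDebugInstr())
        continue;
      // ... normal processing ...
    }
  }
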
The last challenge is extending the LiveDebugValues pass to generate additional DBG_VALUE instructions with a new kind of debug expression (containing ‘DW_OP_entry_value’) for parameters that meet the requirements described in [1].
Finally, emitting call-site debug information can be controlled by the ‘-debugger-tune’ and ‘-dwarf-version’ LLVM code-generation flags, since not all debuggers used in the community consume this DWARF information.

Location coverage improvement
The important criterion in debugging optimized code is whether the compiler has location information for variables and parameters. We use the ‘locstats’ utility from the elfutils [5] package to guide us in improving overall location coverage in the final executable. For each non-artificial variable or formal parameter (Debugging Information Entry in DWARF), ‘locstats’ computes for what percentage of the code-section bytes in the variable's scope the variable has a non-empty location description. For example, a parameter whose scope spans 100 bytes of code and whose location list covers 40 of those bytes has 40% coverage. 100% coverage is not expected for non-global variables and function parameters, since a value may not be ‘live’ through its entire scope. On the other end, 0% coverage for variables that are used in the code indicates that the compiler lost track of their values.
The second column in Table 1 shows the ‘locstats’ report for gdb-7.11 compiled for x86-64 with "-g -O2" by Clang 4.0. For example, there are 29476 parameters whose coverage is in the 91..100% range. The third column shows the ‘locstats’ report with "-g -O2" and the parameter-entry-value feature. There are now 37671 parameters whose coverage is in the 91..100% range, a 28% improvement.
Because our implementation adds an additional location-list entry for parameters whenever possible, and DW_OP_entry_value is valid through the entire scope of the parameter, the numbers in the 91..100 row are the relevant indication of the improvement from parameter-entry-value.

Coverage    Parameters    Parameters with emit-param-entry-values
(% range)   (number/%)    (number/%)
0..10       22682/30%     21342/28%
11..20       3498/4%       2337/3%
21..30       3083/4%       1986/2%
31..40       3050/4%       1862/2%
41..50       2534/3%       1574/2%
51..60       2349/3%       1571/2%
61..70       2184/2%       1649/2%
71..80       2620/3%       2069/2%
81..90       3432/4%       2847/3%
91..100     29476/39%     37671/50%
Table 1: Location coverage statistics for function parameters

Improved backtrace for optimized code in debugger
Figure 1 below shows the improved backtrace for optimized code when compiled with the parameter entry-value tracking feature. Note the new @entry values reported for parameters in the backtrace; these parameters would otherwise be reported as <optimized out>.

(gdb) bt
#0 get_next_move_from_list (list=list@entry=0x7fffffffbf88,
color=color@entry=1, moves=moves@entry=0x7fffffffbfb0,
cutoff=cutoff@entry=100) at engine/owl.c:3032
#1 0x000000000042a957 in do_owl_attack (str=<optimized out>,
move=<optimized out>, move@entry=0x7fffffffc334, wormid=<optimized out>, wormid@entry=0x7fffffffc33c, owl=<optimized out>, owl@entry=0x0,
komaster=komaster@entry=0, kom_pos=kom_pos@entry=0,escape=<optimized out>)
at engine/owl.c:1306
#2 0x000000000042a0d0 in owl_attack (target=target@entry=148,
attack_point=attack_point@entry=0x7fffffffc580, certain=<optimized out>, certain@entry=0xb63048 <dragon+11288>, kworm=kworm@entry=0x7fffffffc3c4) at engine/owl.c:1144
#3 0x0000000000412c71 in make_dragons (color=<optimized out>, color@entry=1, stop_before_owl=<optimized out>, stop_before_owl@entry=0, save_verbose=<optimized out>, save_verbose@entry=0) at engine/dragon.c:346
#4 0x0000000000417fdc in examine_position (color=color@entry=1, how_much=how_much@entry=99) at engine/genmove.c:152
#5 0x00000000004183c6 in do_genmove (move=move@entry=0x7fffffffd344, color=1, color@entry=3, pure_threat_value=<optimized out>,
allowed_moves=<optimized out>, allowed_moves@entry=0x0)
at engine/genmove.c:334
#6 0x000000000041926d in genmove_conservative (i=i@entry=0x7fffffffd36c, j=j@entry=0x7fffffffd368, color=3) at engine/genmove.c:255
#7 0x00000000004618ae in gtp_gg_genmove (s=<optimized out>) at interface/play_gtp.c:2163
#8 0x000000000045b0f8 in gtp_main_loop (commands=<optimized out>, gtp_input=0xb8b100) at interface/gtp.c:126
Figure 1: Backtrace with @entry value parameters

Cost in disk image size increase and compile-time
The parameter-entry-value feature is enabled with -g compilation. Due to the new DebugInfo metadata, which adds entries to DWARF sections such as .debug_info and .debug_loc, there is an expected increase in the on-disk size of executables built with "-g -O". For the SPEC CPU 2006 benchmarks, the average size increase is 15%. However, there is no change in the sections loaded at runtime, such as .text, .data, and .bss; hence there is no runtime size increase.
The compile-time increase is 1-3% for SPEC CPU 2006.

Community up-streaming
Since we implemented this for LLVM 4.0, we are currently porting the implementation to LLVM trunk. We plan to share this set of patches with the LLVM community and seek feedback on improving certain parts of our implementation.

References
[1] Jakub Jelínek and Roland McGrath. DWARF DW_OP_entry_value extension proposal. http://dwarfstd.org/ShowIssue.php?issue=100909.1
[2] Jakub Jelínek, Roland McGrath, Jan Kratochvíl, and Alexandre Oliva. DWARF DW_TAG_call_site extension proposal. http://dwarfstd.org/ShowIssue.php?issue=100909.2
[3] Jakub Jelínek. “Improving debug info for optimized away parameters”. GCC Summit 2010 (attachment jelinek.pdf, GCC Wiki).
[4] FOSDEM 2019 talk: http://bofh.nikhef.nl/events/FOSDEM/2019/K.4.201/llvm_debug.webm
[5] The elfutils project: https://sourceware.org/elfutils/

Hi,
Following is a proposal to improve location coverage for Function parameters in LLVM. The patches for review will be posted soon.

RFC: [DebugInfo] Improving Debug Information in LLVM to Recover Optimized-out Function Parameters

Ananthakrishna Sowda (Cisco), asowda@cisco.com
Nikola Prica (RT-RK/Cisco), nprica@rtrk.com
Djordje Todorovic (RT-RK/Cisco), djtodorovic@rtrk.com
Ivan Baev (Cisco), ibaev@cisco.com

Overview of the problem
Software release products are compiled at optimization level -O2 or higher. Such products might produce a core file in case of a failure. Support engineers usually begin debug analysis by looking at the backtrace from the core file. Unfortunately, many parameters in backtraces are reported as optimized out, for a variety of reasons. This makes triaging the issue and assigning ownership harder due to the missing information, and it is harder for the product team to understand the cause of the failure. In summary, we are describing a well-known serviceability problem for optimized production code.

Proposal for solution
Function parameters have a natural fall-back location: the parent frame. Debuggers can easily go up a frame in the call chain and evaluate any expression there. Expert developers can find what values parameters had at the function entry point by examining the disassembly of the caller's frame at that particular call. With additional call-site information produced by the compiler, the debugger can fully automate this technique. The DWARF 5 specification has new tags and attributes to describe call-site parameter information [1][2]; it has been implemented in GCC and GDB since 2011 [3]. We propose implementing this feature in LLVM to enhance the optimized-code debugging experience of LLVM users.

Prior mention
An initial version of our work was presented as a poster at the LLVM Developers' Meeting in San Jose, 2018. The feature is fully implemented in an internal Clang/LLVM 4.0 branch.
We presented a talk on our work at FOSDEM 2019[4].

Thank you for posting this. This looks very interesting! Since your proposal has a lot of different components, Sema support, DW_AT_call_site_parameter support, DW_OP_entry_value support, it will probably be best to split them out into separate reviews, but it's also good to discuss the proposal in its entirety first. I have a bunch of questions to make sure I fully understand what you are doing.

Implementation notes in Clang and LLVM
On the callee side, the only information we need is whether a parameter is never modified in the function. If so, we can use the parameter's entry value when we lose track of its location. As a natural way of handling this, we use Clang's Sema and its constness check to embed this information in the variable's declaration, which is later used for DILocalVariable construction.

By looking at whether an argument is modified in the function, you can identify variables that can be described with an entry value location and that entry value would be valid throughout the function. Are you using this information in the function body to identify whether to emit an entry value location, or are you using this information at the call site to identify call sites for which call site parameters would be beneficial (or both)?

Is emitting an entry value location in the function body an either-or thing or do you also emit plain old locations if you have them available in the location list together with the entry values?

In the function, I assume you don't know whether all call sites will have call site parameters. How do you decide whether to emit entry value locations?

For call-site information, two new DINode metadata kinds, DICallSite and DICallSiteParam, are defined and emitted by the Clang frontend. The metadata is attached to the call or invoke IR instruction. Here is an example:

%call5 = call i32 @fed_em_strncmp(i8* %arraydecay, i8* %arraydecay1, i64 5), !dbg !114, !call_site !101

!99 = !DICallSiteParam(argno: 1, variable: !91, expr: !DIExpression())
!100 = !DICallSiteParam(argno: 2, variable: !95, expr: !DIExpression())
!101 = !DICallSite(scope: !87, file: !3, parameters: !102, line: 40, calledSubprogram: !13)
!102 = !{!99, !100, !103}
!103 = !DICallSiteParam(argno: 3, expr: !DIExpression(DW_OP_lit5))
For tracking call sites and call-site parameters in the backend, two new pseudo instructions, DBG_CALLSITE and DBG_CALLSITEPARAM, are introduced. See the MIR code below:

DBG_CALLSITE 0, %noreg, <!19>; dbg:strncmp.c:40:47
         * DBG_CALLSITEPARAM %RDX, <0x727fee0> = !DIExpression(DW_OP_lit5), 5, %noreg ; dbg:strncmp.c:40:47
         * DBG_CALLSITEPARAM %RSI, "str2" <0x71a6dd0> = !DIExpression(), %RBX, %noreg ; dbg:strncmp.c:40:47
         * DBG_CALLSITEPARAM %RDI, "str1" <0x71a6dd0> = !DIExpression(), %, %RSP, 4 ; dbg:strncmp.c:40:47

I'll refrain from bike-shedding the actual implementation; let's save this for Phabricator. But conceptually, this makes sense to me. If I understand correctly, you are identifying, at the call site, parameters that are in locations that can be restored by unwinding the function call, such as constants, stack slots, and callee-saved registers.
Can you explain why you need to identify them at the IR level? Could you do it just in MIR, too, or is there some information missing in MIR?

What happens to your DICallSiteParam when a function call gets inlined?

Producing them during the ISel phase is a challenge. The algorithm that collects information about call-site parameters iterates over the call-sequence chained nodes returned from the target-specific call lowering interface. The goal of the algorithm is to recognize the SDNodes that represent instructions which load function arguments into the registers that forward them into the called function's frame. It is an open question whether this is best implemented as a general matching algorithm or whether it should be lowered to the target-specific level. The DBG_CALLSITE pseudo instruction will need to be revisited, since the information about whether a call is a tail call could be extracted differently, but for the sake of simplicity we chose this representation.
Most passes handle DBG_CALLSITE and DBG_CALLSITEPARAM through the target instruction info interface method isDebugInstr(), which is used to skip processing of debug pseudo instructions. Since these new debug pseudo instructions rely on virtual registers and frame objects, we need to follow their substitution through the compilation phases. Several backend passes needed special attention: Register Coalescer, Inline Spiller, Prologue Epilog Inserter, Split Kit, and Virtual Register Rewriter. Virtual Register Rewriter required a target-specific salvaging interface for “call site parameter identities”: the situation, following identity copy instructions, in which the parameter-forwarding register location overlaps the location that is loaded into that register.
The last challenge is extending the LiveDebugValues pass to generate additional DBG_VALUE instructions with a new kind of debug expression (containing ‘DW_OP_entry_value’) for parameters that meet the requirements described in [1].
Finally, emitting call-site debug information can be controlled by the ‘-debugger-tune’ and ‘-dwarf-version’ LLVM code-generation flags, since not all debuggers used in the community consume this DWARF information.

Have you considered inserting a very late MIR pass that does some backwards analysis on the machine code to yield potential call-site parameters, instead of threading it all the way through the compiler? If yes, why did you choose this implementation?

Location coverage improvement
The important criterion in debugging optimized code is whether the compiler has location information for variables and parameters. We use the ‘locstats’ utility from the elfutils [5] package to guide us in improving overall location coverage in the final executable. For each non-artificial variable or formal parameter (Debugging Information Entry in DWARF), ‘locstats’ computes for what percentage of the code-section bytes in the variable's scope the variable has a non-empty location description. For example, a parameter whose scope spans 100 bytes of code and whose location list covers 40 of those bytes has 40% coverage. 100% coverage is not expected for non-global variables and function parameters, since a value may not be ‘live’ through its entire scope. On the other end, 0% coverage for variables that are used in the code indicates that the compiler lost track of their values.
The second column in Table 1 shows the ‘locstats’ report for gdb-7.11 compiled for x86-64 with "-g -O2" by Clang 4.0. For example, there are 29476 parameters whose coverage is in the 91..100% range. The third column shows the ‘locstats’ report with "-g -O2" and the parameter-entry-value feature. There are now 37671 parameters whose coverage is in the 91..100% range, a 28% improvement.
Because our implementation adds an additional location-list entry for parameters whenever possible, and DW_OP_entry_value is valid through the entire scope of the parameter, the numbers in the 91..100 row are the relevant indication of the improvement from parameter-entry-value.

Coverage    Parameters    Parameters with emit-param-entry-values
(% range)   (number/%)    (number/%)
0..10       22682/30%     21342/28%
11..20       3498/4%       2337/3%
21..30       3083/4%       1986/2%
31..40       3050/4%       1862/2%
41..50       2534/3%       1574/2%
51..60       2349/3%       1571/2%
61..70       2184/2%       1649/2%
71..80       2620/3%       2069/2%
81..90       3432/4%       2847/3%
91..100     29476/39%     37671/50%
Table 1: Location coverage statistics for function parameters

Improved backtrace for optimized code in debugger
Figure 1 below shows the improved backtrace for optimized code when compiled with the parameter entry-value tracking feature. Note the new @entry values reported for parameters in the backtrace; these parameters would otherwise be reported as <optimized out>.

(gdb) bt
#0 get_next_move_from_list (list=list@entry=0x7fffffffbf88,
color=color@entry=1, moves=moves@entry=0x7fffffffbfb0,
cutoff=cutoff@entry=100) at engine/owl.c:3032
#1 0x000000000042a957 in do_owl_attack (str=<optimized out>,
move=<optimized out>, move@entry=0x7fffffffc334, wormid=<optimized out>, wormid@entry=0x7fffffffc33c, owl=<optimized out>, owl@entry=0x0,
komaster=komaster@entry=0, kom_pos=kom_pos@entry=0,escape=<optimized out>)
at engine/owl.c:1306
#2 0x000000000042a0d0 in owl_attack (target=target@entry=148,
attack_point=attack_point@entry=0x7fffffffc580, certain=<optimized out>, certain@entry=0xb63048 <dragon+11288>, kworm=kworm@entry=0x7fffffffc3c4) at engine/owl.c:1144
#3 0x0000000000412c71 in make_dragons (color=<optimized out>, color@entry=1, stop_before_owl=<optimized out>, stop_before_owl@entry=0, save_verbose=<optimized out>, save_verbose@entry=0) at engine/dragon.c:346
#4 0x0000000000417fdc in examine_position (color=color@entry=1, how_much=how_much@entry=99) at engine/genmove.c:152
#5 0x00000000004183c6 in do_genmove (move=move@entry=0x7fffffffd344, color=1, color@entry=3, pure_threat_value=<optimized out>,
allowed_moves=<optimized out>, allowed_moves@entry=0x0)
at engine/genmove.c:334
#6 0x000000000041926d in genmove_conservative (i=i@entry=0x7fffffffd36c, j=j@entry=0x7fffffffd368, color=3) at engine/genmove.c:255
#7 0x00000000004618ae in gtp_gg_genmove (s=<optimized out>) at interface/play_gtp.c:2163
#8 0x000000000045b0f8 in gtp_main_loop (commands=<optimized out>, gtp_input=0xb8b100) at interface/gtp.c:126
Figure 1: Backtrace with @entry value parameters

Very nice!

Cost in disk image size increase and compile-time
The parameter-entry-value feature is enabled with -g compilation. Due to the new DebugInfo metadata, which adds entries to DWARF sections such as .debug_info and .debug_loc, there is an expected increase in the on-disk size of executables built with "-g -O". For the SPEC CPU 2006 benchmarks, the average size increase is 15%. However, there is no change in the sections loaded at runtime, such as .text, .data, and .bss; hence there is no runtime size increase.
The compile-time increase is 1-3% for SPEC CPU 2006.

At this early stage I'm not yet worried by the size increase. We'll probably find some opportunities to fine-tune the heuristics that decide whether a call site parameter / entry value is profitable. We can also always provide a tuning option to turn the feature off.

Community up-streaming
Since we implemented this for LLVM 4.0, we are currently porting the implementation to LLVM trunk. We plan to share this set of patches with the LLVM community and seek feedback on improving certain parts of our implementation.

Sounds great!
thanks for sharing this,
adrian

Thank you for your interest and comments! Please see my responses inline.

    >
    > Hi,
    > Following is a proposal to improve location coverage for Function parameters in LLVM. The patches for review will be posted soon.
    >
    > RFC: [DebugInfo] Improving Debug Information in LLVM to Recover Optimized-out Function Parameters
    >
    > Ananthakrishna Sowda(Cisco), asowda@cisco.com
    > Nikola Prica (RT-RK/Cisco), nprica@rtrk.com
    > Djordje Todorovic(RT-RK/Cisco), djtodorovic@rtrk.com
    > Ivan Baev (Cisco), ibaev@cisco.com
    >
    >
    > Overview of the problem
    > Software release products are compiled with optimization level –O2 and higher. Such products might produce a core-file in case of a failure. Support engineers usually begin debug analysis by looking at the backtrace from a core-file. Unfortunately, many parameters in backtraces are reported as optimized out due to variety of reasons. This makes triaging the issue and assigning ownership harder due to missing information. It is harder for the product team to understand the cause of the failure. In summary, we are describing a well-known serviceability problem for optimized production code.
    >
    > Proposal for solution
    > Function parameters have a natural fall-back location which is parent frame. Debuggers can easily go up a frame in call-chain and evaluate any expression. Expert developers can find what values parameters had at function entry point by examining disassembly of caller frame at that particular function call. With additional call-site information produced by compiler, debugger can fully automate this technique. DWARF 5 specification has new tags and attributes to describe call-site parameter information [1][2]. it is already implemented in GCC and GDB since 2011[3]. We propose implementing this feature in LLVM to enhance the debugging of optimized code experience of LLVM users.
    >
    > Prior mention
    > An initial version of our work was presented as a poster during LLVM Developer Meeting, in San Jose, 2018. The feature is now fully implemented in internal Clang/LLVM 4.0 version.
    > We presented a talk on our work at FOSDEM 2019[4].
    >
    
    Thank you for posting this. This looks very interesting! Since your proposal has a lot of different components, Sema support, DW_AT_call_site_parameter support, DW_OP_entry_value support, it will probably be best to split them out into separate reviews, but it's also good to discuss the proposal in its entirety first. I have a bunch of questions to make sure I fully understand what you are doing.

Sure, we will post several patches, each as a logical unit.
    
    > Implementation notes in Clang and LLVM
    > On the callee side the only information that we need is whether a parameter is never modified in the function. If true then we can use parameter’s entry value when we lose track of parameter’s location. As a natural way of handling this problem we used Clang’s Sema and its constness check to embed this information in variable’s declaration which is later used for DILocalVariable construction.
    
    By looking at whether an argument is modified in the function, you can identify variables that can be described with an entry value location and that entry value would be valid throughout the function. Are you using this information in the function body to identify whether to emit an entry value location, or are you using this information at the call site to identify call sites for which call site parameters would be beneficial (or both)?
    
    Is emitting an entry value location in the function body an either-or thing or do you also emit plain old locations if you have them available in the location list together with the entry values?
    
    In the function, I assume you don't know whether all call sites will have call site parameters. How do you decide whether to emit entry value locations?

The entry-value location is added to the same conventional location list. We emit it when we see holes in the coverage, looking at the whole function scope. It is used by the debugger when there is no conventional location for a program range, and it is reported by the debugger as “<optimized out>, @entry = <value at call site>”.
Unmodified arguments are the subset for which the debugger can report that the @entry value is the same as the actual value, so they will not be reported as optimized out.
We generate an entry-value location whenever the conventional locations do not cover one hundred percent of the scope (for example, a hole after a range covered by a plain register location can instead be covered by a DW_OP_entry_value expression). We generate call-site information when the call-site parameter values can be evaluated by unwinding to the parent frame. Only the debugger can tell whether the look-up of an entry value finds a matching call site and call-site parameter.

    > For call-site information, new DINode metadata DICallSite and DICallSiteParam are defined and these are emitted by the Clang frontend. The metadata is associated to the call or invoke IR instruction. Here is an example:
    >
    > %call5 = call i32 @fed_em_strncmp(i8* %arraydecay, i8* %arraydecay1, i64 5), !dbg !114, !call_site !101
    > …
    > !99 = !DICallSiteParam(argno: 1, variable: !91, expr: !DIExpression())
    > !100 = !DICallSiteParam(argno: 2, variable: !95, expr: !DIExpression())
    > !101 = !DICallSite(scope: !87, file: !3, parameters: !102, line: 40, calledSubprogram: !13)
    > !102 = !{!99, !100, !103}
    > !103 = !DICallSiteParam(argno: 3, expr: !DIExpression(DW_OP_lit5)
    > For tracking call sites and call site parameters in backend two new pseudo instructions, DBG_CALLSITE and DBG_CALLSITEPARAM, are introduced. See the MIR code bellow:
    >
    > DBG_CALLSITE 0, %noreg, <!19>; dbg:strncmp.c:40:47
    > * DBG_CALLSITEPARAM %RDX, <0x727fee0> = !DIExpression(DW_OP_lit5), 5, %noreg ; dbg:strncmp.c:40:47
    > * DBG_CALLSITEPARAM %RSI, "str2" <0x71a6dd0> = !DIExpression(), %RBX, %noreg ; dbg:strncmp.c:40:47
    > * DBG_CALLSITEPARAM %RDI, "str1" <0x71a6dd0> = !DIExpression(), %, %RSP, 4 ; dbg:strncmp.c:40:47
    
    I'll refrain from bike-shedding the actual implementation; let's save this for phabricator, but conceptually, this makes sense to me. If I understand correctly, you are identifying at the call site parameters that are in locations that can be restored by unwinding the function call, such as constants, stack slots, and callee/r-saved registers.
    Can you explain why you need to identify them at the IR level? Could you do it just in MIR, too, or is there some information missing in MIR?
    
    What happens to your DICallSiteParam when a function call gets inlined?
    
Your understanding of the call-site parameters is right!
When a function call gets inlined, call site information is eliminated.
We kept to the LLVM guidelines of introducing DI metadata in the front end and carrying it through IR and MIR. I don’t think there is missing information in MIR, though.

    > There is a challenge in ISel phase to produce them. Algorithm that collects information about call site parameters iterates over call sequence chained nodes returned from target specific call lowering interface. Goal of the algorithm is to recognize SDNodes that represent instructions which will load function arguments into registers that transfer them into another function call frame. There is a question whether this is effectively implemented as a general matching algorithm or it should be lowered to target specific level. DBG_CALLSITE pseudo instruction will need to be revisited since information whether a call is tail call or not could be extracted differently but for sake of simplicity we chose this.
    > Most of passes handle DBG_CALLSITE and DBG_CALLSITEPARAM through target instruction info interface method isDebugInstr(). This method is used to skip processing of pseudo debug instructions. Since these new pseudo debug instructions relay on virtual registers and frame objects we need to follow up their substitution through the compilation phases. There were several backend passes that needed special attention: Register Coalesce, Inline Spiller, Prologue Epilog Inserter, Split Kit and Virtual Register Rewriter. Virtual Register Rewriter required implementation of target specific salvaging interface for “call site parameter identities” – situation following identity copy instructions that leads to overlapping of parameter transferring register location and location that is loaded into that register.
    > The last challenge is to extend LiveDebugValues pass to generate additional DBG_VALUE instructions with new kind of debug expression (with ‘DW_OP_entry_value’) for parameters that meet the requirements described in [1].
    > Finally, emitting call-site debug information can be controlled by ‘-debugger-tune’ and ‘-dwarf-version’ LLVM code-generation flags, since not all debuggers used in the community consume this DWARF information.
    
    Have you considered to instead insert a very late MIR pass that does some backwards analysis on the machine code to yield potential call site parameters instead of threading it all the way through the compiler? If yes, why did you choose this implementation?

If I have not convinced you on this, we are open to suggestions. I will ask my colleagues on this project to respond too.

    >
    > Location coverage improvement
    > The important criteria in debugging-optimized-code is whether the compiler has location information for variables and parameters. We use ‘locstats’ utility from elfutils [5] package to guide us in improving overall location coverage in final executable. For each non-artificial variable or formal parameter - or Debugging Information Entry in DWARF - ‘locstats’ computes what percentage from the code section bytes where the variable is in scope, the variable has a non-empty location description. 100% coverage is not expected for non-global variables and function parameters, since value may not be ‘live’ through the entire scope. On the other end, 0% coverage for variables which are used in the code is indicative of compiler losing track of values.
    > The second column in Table 1 shows ‘locstats’ report for gdb-7.11 compiled for x86-64 with “-g –O2” by Clang 4.0. For example, there are 29476 parameters whose coverage is in 91..100% range. The third column shows locstats’ report with “-g –O2” and the parameter-entry-value feature. There are now 37671 parameters whose coverage is in 91..100% range – for a 28% improvement.
    > Because our implementation computes an additional location list entry to parameters whenever possible, and DW_OP_entry_value is valid through the entire scope of the parameter, the numbers at 91..100 row are relevant indication of improvement with parameter-entry-value.
    >
    > Coverage Parameters Parameters with emit-param-entry-values
    > (% range) (number/%)) (number/%)
    > 0..10 22682/30% 21342/28%
    > 11..20 3498/4% 2337/3%
    > 21..30 3083/4% 1986/2%
    > 31..40 3050/4% 1862/2%
    > 41..50 2534/3% 1574/2%
    > 51..60 2349/3% 1571/2%
    > 61..70 2184/2% 1649/2%
    > 71..80 2620/3% 2069/2%
    > 81..90 3432/4% 2847/3%
    > 91..100 29476/39% 37671/50%
    > Table 1 Location coverage statistics for function parasmeters
    >
    > Improved backtrace for optimized code in debugger
    > Figure 1 below shows improved backtrace for optimized code when compiled with parameter entry value tracking feature. Please note the new @entry values reported for parameters in backtrace. These parameters will otherwise be reported as <optimized-out>.
    >
    > gdb) bt
    > #0 get_next_move_from_list (list=list@entry=0x7fffffffbf88,
    > color=color@entry=1, moves=moves@entry=0x7fffffffbfb0,
    > cutoff=cutoff@entry=100) at engine/owl.c:3032
    > #1 0x000000000042a957 in do_owl_attack (str=<optimized out>,
    > move=<optimized out>, move@entry=0x7fffffffc334, wormid=<optimized out>, wormid@entry=0x7fffffffc33c, owl=<optimized out>, owl@entry=0x0,
    > komaster=komaster@entry=0, kom_pos=kom_pos@entry=0,escape=<optimized out>)
    > at engine/owl.c:1306
    > #2 0x000000000042a0d0 in owl_attack (target=target@entry=148,
    > attack_point=attack_point@entry=0x7fffffffc580, certain=<optimized out>, certain@entry=0xb63048 <dragon+11288>, kworm=kworm@entry=0x7fffffffc3c4) at engine/owl.c:1144
    > #3 0x0000000000412c71 in make_dragons (color=<optimized out>, color@entry=1, stop_before_owl=<optimized out>, stop_before_owl@entry=0, save_verbose=<optimized out>, save_verbose@entry=0) at engine/dragon.c:346
    > #4 0x0000000000417fdc in examine_position (color=color@entry=1, how_much=how_much@entry=99) at engine/genmove.c:152
    > #5 0x00000000004183c6 in do_genmove (move=move@entry=0x7fffffffd344, color=1, color@entry=3, pure_threat_value=<optimized out>,
    > allowed_moves=<optimized out>, allowed_moves@entry=0x0)
    > at engine/genmove.c:334
    > #6 0x000000000041926d in genmove_conservative (i=i@entry=0x7fffffffd36c, j=j@entry=0x7fffffffd368, color=3) at engine/genmove.c:255
    > #7 0x00000000004618ae in gtp_gg_genmove (s=<optimized out>) at interface/play_gtp.c:2163
    > #8 0x000000000045b0f8 in gtp_main_loop (commands=<optimized out>, gtp_input=0xb8b100) at interface/gtp.c:126
    > Figure 1: Backtrace with @entry value parameters
    
    Very nice!
    
    >
    > Cost in disk image size increase and compile-time
    > The parameter-entry-value feature is enabled with -g compilation. Due to new DebugInfo metadata generation which adds entries to DWARF sections such as .debug_info and .debug_loc, there is expected size increase of disk image of the executable built with “-g –O”. For SPEC CPU 2006 benchmark, the average size increase is 15%. However, there is no change in sections loaded at runtime such as .text. .data, .bss. Hence, there is no runtime size increase.
    > Compile-time cost increase is 1-3% percent for SPEC CPU 2006.
    
    At this early stage I'm not yet worried by the size increase. We'll probably find some opportunities to fine-tune the heuristics that decide whether a call site parameter / entry value is profitable. We can also always provide a tuning option to turn the feature off.
    
    > Community up-streaming
    > Since we have implemented this for LLVM-4.0 we are currently in process of porting this implementation on LLVM trunk. We are planning to share this set of patches with LLVM community and seek feedback in improving certain parts of our implementation.
    
    Sounds great!
    thanks for sharing this,
    adrian
    
Thanks,
Ananth

It's fantastic to see this happening! It'll be really nice to start
seeing some DWARF 5 feature work (as opposed to the infrastructure
stuff, which has its own benefits but doesn't help the end-user's
debugging experience). A couple of questions below.
--paulr

From: llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org] On Behalf Of
Ananthakrishna Sowda (asowda) via llvm-dev
Sent: Thursday, February 07, 2019 9:32 PM
To: Adrian Prantl
Cc: llvm-dev@lists.llvm.org
Subject: Re: [llvm-dev] RFC: [DebugInfo] Improving Debug Information in
LLVM to Recover Optimized-out Function Parameters

Thank you for your interest and comments! Please see my responses inline.

On 2/7/19, 3:17 PM, "aprantl@apple.com on behalf of Adrian Prantl"

    >
    > Hi,
    > Following is a proposal to improve location coverage for Function
parameters in LLVM. The patches for review will be posted soon.
    >
    > RFC: [DebugInfo] Improving Debug Information in LLVM to Recover
Optimized-out Function Parameters
    >
    > Ananthakrishna Sowda(Cisco), asowda@cisco.com
    > Nikola Prica (RT-RK/Cisco), nprica@rtrk.com
    > Djordje Todorovic(RT-RK/Cisco), djtodorovic@rtrk.com
    > Ivan Baev (Cisco), ibaev@cisco.com
    >
    >
    > Overview of the problem
    > Software release products are compiled with optimization level –O2
and higher. Such products might produce a core-file in case of a failure.
Support engineers usually begin debug analysis by looking at the backtrace
from a core-file. Unfortunately, many parameters in backtraces are
reported as optimized out due to variety of reasons. This makes triaging
the issue and assigning ownership harder due to missing information. It is
harder for the product team to understand the cause of the failure. In
summary, we are describing a well-known serviceability problem for
optimized production code.
    >
    > Proposal for solution
    > Function parameters have a natural fall-back location which is
parent frame. Debuggers can easily go up a frame in call-chain and
evaluate any expression. Expert developers can find what values parameters
had at function entry point by examining disassembly of caller frame at
that particular function call. With additional call-site information
produced by compiler, debugger can fully automate this technique. DWARF 5
specification has new tags and attributes to describe call-site parameter
information [1][2]. it is already implemented in GCC and GDB since
2011[3]. We propose implementing this feature in LLVM to enhance the
debugging of optimized code experience of LLVM users.
    >
    > Prior mention
    > An initial version of our work was presented as a poster during LLVM
Developer Meeting, in San Jose, 2018. The feature is now fully implemented
in internal Clang/LLVM 4.0 version.
    > We presented a talk on our work at FOSDEM 2019[4].
    >

    Thank you for posting this. This looks very interesting! Since your
proposal has a lot of different components, Sema support,
DW_AT_call_site_parameter support, DW_OP_entry_value support, it will
probably be best to split them out into separate reviews, but it's also
good to discuss the proposal in its entirety first. I have a bunch of
questions to make sure I fully understand what you are doing.

Sure, we will post several patches, each as a logical unit.

    > Implementation notes in Clang and LLVM
    > On the callee side the only information that we need is whether a
parameter is never modified in the function. If true then we can use
parameter’s entry value when we lose track of parameter’s location. As a
natural way of handling this problem we used Clang’s Sema and its
constness check to embed this information in variable’s declaration which
is later used for DILocalVariable construction.

Do you think we could build on this to offer entry_value location list
entries, if the value is lost (e.g., register is reused) before the parameter
is modified within the callee?

    By looking at whether an argument is modified in the function, you can
identify variables that can be described with an entry value location and
that entry value would be valid throughout the function. Are you using
this information in the function body to identify whether to emit an entry
value location, or are you using this information at the call site to
identify call sites for which call site parameters would be beneficial (or
both)?

    Is emitting an entry value location in the function body an either-or
thing or do you also emit plain old locations if you have them available
in the location list together with the entry values?

    In the function, I assume you don't know whether all call sites will
have call site parameters. How do you decide whether to emit entry value
locations?

Entry value location is added to same conventional location list. We emit
them when we see holes in the coverage, looking at the whole function
scope. It is used by the debugger when there is no conventional location
for a program range. It is reported by the debugger as “<optimized-
>, @entry = <value at call site>”.
Unmodified argument is a sub-set for which debugger can report @entry
value is same as actual value. So, it will not be reported as optimized-
out.
We generate entry value location whenever conventional does not cover one
hundred percent. We generate call-site information when the call-site
parameter values can be evaluated by unwinding to the parent frame. Only
debugger can tell if look-up of an entry value finds matching call site
and call site parameter.

Is the idea to use DW_OP_entry_value in a default-location list entry?
I think we do not emit those currently, but it seems like an ideal match
for an unmodified formal parameter.

Hi,

I'll refrain from bike-shedding the actual implementation; let's save this
for phabricator, but conceptually, this makes sense to me. If I understand
correctly, you are identifying at the call site parameters that are in
locations that can be restored by unwinding the function call, such as
constants, stack slots, and callee/r-saved registers.

   Can you explain why you need to identify them at the IR level? Could
you do it just in MIR, too, or is there some information missing in MIR?

   What happens to your DICallSiteParam when a function call gets
inlined?

Your understanding about call site parameter is right!
When a function call gets inlined, call site information is eliminated.
We kept to the LLVM guidelines of introducing DI metadata in the front end
and carrying it through IR and MIR. I don’t think there is missing
information in MIR, though.

In addition to this, we are aware that almost all of this information could
be extracted from MIR. DICallSiteParam provides a backup location once the
primary location loaded into the parameter forwarding register is lost, but
by providing DICallSiteParam we also provide the ability to look two or
more frames back in order to search for the called function's entry values.
At the MIR level we are looking for the location loaded into the parameter
forwarding register, but with DICallSiteParam we are looking for the
variable’s location, which can be described in terms of DW_OP_entry_value.

  Have you considered to instead insert a very late MIR pass that does
some backwards analysis on the machine code to yield potential call site
parameters instead of threading it all the way through the compiler? If
yes, why did you choose this implementation?

If I have not convinced you on this, we are open to suggestions. I will ask
my colleagues on this project to respond too.

We considered such an approach briefly, but we were discouraged by the fact
that we would need to generalize and interpret constant-loading,
memory-loading, stack-object-loading and many more instructions for various
architectures.

Thanks,
Nikola & Djordje

It's fantastic to see this happening! It'll be really nice to start
    seeing some DWARF 5 feature work (as opposed to the infrastructure
    stuff, which has its own benefits but doesn't help the end-user's
    debugging experience). A couple of questions below.
    --paulr

Thanks for the comments and feedback. I have added my replies inline.
-Ananth
    
    > From: llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org] On Behalf Of
    > Ananthakrishna Sowda (asowda) via llvm-dev
    > Sent: Thursday, February 07, 2019 9:32 PM
    > To: Adrian Prantl
    > Cc: llvm-dev@lists.llvm.org
    > Subject: Re: [llvm-dev] RFC: [DebugInfo] Improving Debug Information in
    > LLVM to Recover Optimized-out Function Parameters
    >
    > Thank you for your interest and comments! Please see my responses inline.
    >
    > On 2/7/19, 3:17 PM, "aprantl@apple.com on behalf of Adrian Prantl"
    >
    >
    >
    > >
    > > Hi,
    > > Following is a proposal to improve location coverage for Function
    > parameters in LLVM. The patches for review will be posted soon.
    > >
    > > RFC: [DebugInfo] Improving Debug Information in LLVM to Recover
    > Optimized-out Function Parameters
    > >
    > > Ananthakrishna Sowda(Cisco), asowda@cisco.com
    > > Nikola Prica (RT-RK/Cisco), nprica@rtrk.com
    > > Djordje Todorovic(RT-RK/Cisco), djtodorovic@rtrk.com
    > > Ivan Baev (Cisco), ibaev@cisco.com
    > >
    > >
    > > Overview of the problem
    > > Software release products are compiled with optimization level –O2
    > and higher. Such products might produce a core-file in case of a failure.
    > Support engineers usually begin debug analysis by looking at the backtrace
    > from a core-file. Unfortunately, many parameters in backtraces are
    > reported as optimized out due to variety of reasons. This makes triaging
    > the issue and assigning ownership harder due to missing information. It is
    > harder for the product team to understand the cause of the failure. In
    > summary, we are describing a well-known serviceability problem for
    > optimized production code.
    > >
    > > Proposal for solution
    > > Function parameters have a natural fall-back location which is
    > parent frame. Debuggers can easily go up a frame in call-chain and
    > evaluate any expression. Expert developers can find what values parameters
    > had at function entry point by examining disassembly of caller frame at
    > that particular function call. With additional call-site information
    > produced by compiler, debugger can fully automate this technique. DWARF 5
    > specification has new tags and attributes to describe call-site parameter
    > information [1][2]. it is already implemented in GCC and GDB since
    > 2011[3]. We propose implementing this feature in LLVM to enhance the
    > debugging of optimized code experience of LLVM users.
    > >
    > > Prior mention
    > > An initial version of our work was presented as a poster during LLVM
    > Developer Meeting, in San Jose, 2018. The feature is now fully implemented
    > in internal Clang/LLVM 4.0 version.
    > > We presented a talk on our work at FOSDEM 2019[4].
    > >
    >
    > Thank you for posting this. This looks very interesting! Since your
    > proposal has a lot of different components, Sema support,
    > DW_AT_call_site_parameter support, DW_OP_entry_value support, it will
    > probably be best to split them out into separate reviews, but it's also
    > good to discuss the proposal in its entirety first. I have a bunch of
    > questions to make sure I fully understand what you are doing.
    >
    > Sure, we will post several patches, each as a logical unit.
    >
    > > Implementation notes in Clang and LLVM
    > > On the callee side the only information that we need is whether a
    > parameter is never modified in the function. If true then we can use
    > parameter’s entry value when we lose track of parameter’s location. As a
    > natural way of handling this problem we used Clang’s Sema and its
    > constness check to embed this information in variable’s declaration which
    > is later used for DILocalVariable construction.
    
    Do you think we could build on this to offer entry_value location list
    entries, if the value is lost (e.g., register is reused) before the parameter
    is modified within the callee?
    
We are doing that already in our implementation. Such values are reported with an @entry suffix to make this clear. Please look at the arguments in frame #3 in the backtrace example in the RFC.

    >
    > By looking at whether an argument is modified in the function, you can
    > identify variables that can be described with an entry value location and
    > that entry value would be valid throughout the function. Are you using
    > this information in the function body to identify whether to emit an entry
    > value location, or are you using this information at the call site to
    > identify call sites for which call site parameters would be beneficial (or
    > both)?
    >
    > Is emitting an entry value location in the function body an either-or
    > thing or do you also emit plain old locations if you have them available
    > in the location list together with the entry values?
    >
    > In the function, I assume you don't know whether all call sites will
    > have call site parameters. How do you decide whether to emit entry value
    > locations?
    >
    > Entry value location is added to same conventional location list. We emit
    > them when we see holes in the coverage, looking at the whole function
    > scope. It is used by the debugger when there is no conventional location
    > for a program range. It is reported by the debugger as “<optimized-
    > >, @entry = <value at call site>”.
    > Unmodified argument is a sub-set for which debugger can report @entry
    > value is same as actual value. So, it will not be reported as optimized-
    > out.
    > We generate entry value location whenever conventional does not cover one
    > hundred percent. We generate call-site information when the call-site
    > parameter values can be evaluated by unwinding to the parent frame. Only
    > debugger can tell if look-up of an entry value finds matching call site
    > and call site parameter.
    
    Is the idea to use DW_OP_entry_value in a default-location list entry?
    I think we do not emit those currently, but it seems like an ideal match
    for an unmodified formal parameter.

Yes, that is the idea. Even for lost values (reported as <optimized out>), showing the @entry value has some benefit to the debugging user.

Hi,

I am one of the authors of this feature. On Phabricator, we agreed to take
the discussion of whether encoding this in IR and threading it through the
compiler, or performing a late MIR analysis, is the better approach here.

Regarding the late MIR pass approach, it would need to walk back from the
call instruction, recognizing the parameter forwarding instructions and
interpreting them. We could interpret register moves, immediate moves and
some stack object loads, but we would need to write interpretation of
instructions for various architectures, and we were not positive about the
completeness of such recognition. Such an analysis might not be as complete
as the current approach; in particular, a late pass would not be able to
produce ‘DW_OP_entry_value’ inside the call-site value expression of a
call site parameter.

As an example, think of a function that forwards its own argument in a
call in the entry block and never uses that argument again:
  Before:
    %vreg = COPY $rsi
    ... <no use of $rsi nor %vreg>
    $rsi = COPY %vreg
    call foo

  After (both copies disappear once %vreg is assigned to $rsi):
    ... <no use of $rsi nor %vreg>
    call foo

This is the case from the ‘VirtRegMap’ pass, but I think it can happen
elsewhere. Recreating this information might be possible when function
‘foo’ is in the current compilation module, but we are not sure whether it
is possible for calls into external modules. In order to follow such cases
we need a DBG_CALLSITEPARAM that can track this situation.

Since after the ISel phase we have explicit pseudo COPY instructions that
forward arguments into another function's frame, it came naturally to
recognize such instructions at this stage. There we can say with 100%
certainty that those instructions indeed forward function arguments.

Thanks,

Nikola

[+ some folks more knowledgeable about the Machine layer than me.]

Hi,

I am one of the authors of this feature. On Phabricator, we agreed to
take discussion whether encoding this in IR and threading it through the
compiler or performing a late MIR analysis is the better approach.

Regarding late MIR pass approach, it would need to go back from call
instruction by recognizing parameter's forwarding instructions and
interpret them. We could interpret register moves, immediate moves and
some of stack object loadings. There would be need to write
interpretation of various architectures instructions. We were not
positive about completeness of such recognition.

So you're saying that in late MIR, the target-specific MachineInstructions don't have enough generic meta information to understand where data was copied/loaded from in a target-independent way? Would it be easier in earlier pre-regalloc MIR, or does that have the same problem because the instructions are already target-specific?

However, such analysis
might not be complete as current approach. It would not be able to
produce ‘DW_OP_entry_values’ in ‘DW_TAG_call_site_value’ expression of
call site parameter as a late pass.

As example think of callee function that forwards its argument in
function call in entry block and never uses that argument again:
%vreg = COPY $rsi; ->
…. <no use of $rsi nor %vreg> -> …<no use of $rsi nor %vreg>
$rsi = COPY $vreg; -> call foo
call foo ->

I'm not sure I can follow this example (yet). Is this about creating a call site parameter for an argument of the call to foo, or is it meant to illustrate that the call to foo() makes it hard to write a backwards analysis for inserting a call site parameter for a call site below the call to foo()?

This is the case from ‘VirtRegMap’ pass, but I think it can happen
elsewhere.

As I'm not super familiar with the various MIR passes: Is VirtRegMap the pass inserting the vreg copy from $rsi here? Does it do something else?

Recreation of this might be possible when function ‘foo’ is
in current compilation module, but we are not sure if it is possible for
external modules calls. In order to follow such cases we need
DBG_CALLSITEPARAM that can track such situation.

What information exactly would you need about foo's implementation that you cannot get from just knowing the calling convention?

Since after ISEL phase we have explicit pseudo COPY instructions that
forward argument to another function frame, it came naturally to
recognize such instructions at this stage. There we can say with 100%
certainty that those instruction indeed forward function arguments.

My main motivation for this discussion is to minimize the added complexity of the feature. For example, if we figure out that we can get by without introducing new IR constructs (that optimization authors would need to be taught about, that would need to be supported in all three instruction selectors) and can get by with only adding new MIR instructions, that would be a win. However, if we can prove that deferring the analysis to a later stage would result in inferior quality then the extra maintenance burden of new IR constructs may be the right tradeoff.

Thanks for taking the time to walk me through your thought process.
-- adrian

[+ some folks more knowledgable about the Machine layer than me.]

That would be useful for us too! :slight_smile:

Hi,

I am one of the authors of this feature. On Phabricator, we agreed to
take discussion whether encoding this in IR and threading it through the
compiler or performing a late MIR analysis is the better approach.

Regarding late MIR pass approach, it would need to go back from call
instruction by recognizing parameter's forwarding instructions and
interpret them. We could interpret register moves, immediate moves and
some of stack object loadings. There would be need to write
interpretation of various architectures instructions. We were not
positive about completeness of such recognition.

So you're saying that in late MIR, the target-specific MachineInstructions don't have enough generic meta information to understand where data was copied/loaded from in a target-independent way? Would it be easier in earlier pre-regalloc MIR, or does that have the same problem because the instructions are already target-specific?

It has enough generic meta information for some kinds of instructions,
but not for all. MachineInstr has bits for isMoveImm, isMoveReg and
mayLoad that can be useful for recognizing some kinds of
parameter-loading instructions, but we are not quite sure whether they
are enough for recognizing all of them. For example, there is no way to
recognize X86::LEA instructions with this mechanism (and a significant
number of parameter-loading instructions are of that kind). This
instruction made us give up on the late MIR pass approach, because we
were not sure about the complexities of the various architectures. As a
result of tracking this from the ISEL phase, we were able to get entry
values from some LEA instructions, which is very important for this
approach. Currently, for various architectures, this approach gives us
more information about the values loaded into parameter-forwarding
registers than a late MIR pass would (because of the lack of special
instruction interpretation).
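
To make the limitation concrete, here is roughly the only generic
classification such a late pass could lean on (the enum and helper are
illustrative, not existing LLVM APIs); an address computation such as X86's
LEA sets none of these flags:

#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/MC/MCInstrDesc.h"

using namespace llvm;

// Rough sketch of the generic classification a late pass would have to rely
// on.  Address-computation instructions such as X86's LEA set none of these
// flags, so they would need per-target interpretation.
enum class ForwardedValueKind { Immediate, RegisterMove, Load, Unknown };

static ForwardedValueKind classify(const MachineInstr &MI) {
  const MCInstrDesc &Desc = MI.getDesc();
  if (Desc.isMoveImm())
    return ForwardedValueKind::Immediate;    // e.g. MOV32ri 10
  if (Desc.isMoveReg() || MI.isCopy())
    return ForwardedValueKind::RegisterMove; // e.g. $esi = COPY %1
  if (Desc.mayLoad())
    return ForwardedValueKind::Load;         // stack/global reloads
  return ForwardedValueKind::Unknown;        // LEA64r ends up here
}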

But nevertheless, in the end, we lose some information through the
optimization pipeline, and in order to salvage some of it we
implemented a target-specific machine instruction interpreter. For
example, in situations like:

%vreg = LEA <smt>
$rdi = COPY %vreg
call foo
DBG_CALLSITE
*DBG_CALLSITEPARAM $rdi, , %vreg

==== replace %vreg with $rdi ====

%rdi = LEA <smt>
$rdi = COPY %rdi
call foo
DBG_CALLSITE
*DBG_CALLSITEPARAM $rdi, , %rdi

==== delete redundant identity copies ====

%rdi = LEA <smt>
call foo
DBG_CALLSITE
*DBG_CALLSITEPARAM $rdi, , %rdi

In order to salvage this, we go backward from the call instruction and
try to interpret the instruction that loads the value into $rdi.

This salvaging part could be reused for interpreting in a late MIR pass,
but it would need to be extended for other architecture-specific
instructions. With the current approach, which starts tracking
parameter-forwarding instructions from the ISEL phase, we are able to
cover some of them without such an interpreter.
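
A made-up example of what one piece of that target-specific interpretation
could look like for the x86-64 case above (describeLEA is hypothetical; the
operand indices follow the usual X86 memory-operand layout of dst, base,
scale, index, disp, segment):

#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"
#include <cstdint>
#include <optional>
#include <utility>

using namespace llvm;

// Hypothetical per-target interpretation step: for an LEA64r that computes a
// simple frame-index address (unit scale, no index register), return the
// (frame index, displacement) pair a debug-info pass could turn into a
// DW_OP_fbreg/DW_OP_breg based call-site value expression.
static std::optional<std::pair<int, int64_t>>
describeLEA(const MachineInstr &MI) {
  const MachineOperand &Base = MI.getOperand(1);
  const MachineOperand &Scale = MI.getOperand(2);
  const MachineOperand &Index = MI.getOperand(3);
  const MachineOperand &Disp = MI.getOperand(4);
  if (!Base.isFI() || !Scale.isImm() || Scale.getImm() != 1 ||
      (Index.isReg() && Index.getReg().isValid()) || !Disp.isImm())
    return std::nullopt; // anything richer needs a more general description
  return std::make_pair(Base.getIndex(), Disp.getImm());
}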

The ISEL phase is important for matching DICallSiteParam metadata to the
respective DBG_CALLSITEPARAM. It also recognizes COPY instructions that
are part of the calling sequence but do not forward any argument (for
example, for variadic C-calling-convention functions we have a copy to
the AL register). If we are able to identify such non-argument-transferring
copy instructions from the calling convention, and we potentially drop the
IR metadata about call site parameters, we might be able to do all the
necessary parameter tracking in some separate MIR pass (not very late,
somewhere after ISEL).

However, such an analysis
might not be as complete as the current approach. As a late pass, it
would not be able to produce ‘DW_OP_entry_values’ in the
‘DW_AT_call_site_value’ expression of a call site parameter.

As an example, think of a callee function that forwards its own argument
in a function call in the entry block and never uses that argument again:

  Before the virtual register rewriter:
    %vreg = COPY $rsi
    … <no use of $rsi nor %vreg>
    $rsi = COPY %vreg
    call foo

  After:
    … <no use of $rsi nor %vreg>
    call foo

I'm not sure I can follow this example (yet). Is this about creating a call site parameter for an argument of the call to foo, or is it meant to illustrate that the call to foo() makes it hard to write a backwards analysis for inserting a call site parameter for a call site below the call to foo()?
This is the case from the ‘VirtRegMap’ pass, but I think it can happen
elsewhere.

As I'm not super familiar with the various MIR passes: Is VirtRegMap the pass inserting the vreg copy from $rsi here? Does it do something else?

Oh, I didn't explain it fully. In the virtual register rewriter, for the
previous example, %vreg gets replaced with $rsi and we get two 'identity
copies' ($rsi = COPY $rsi) that get deleted. This situation is specific
to a call in the entry block whose argument is the callee's own
argument, used only at that place. For example:

void baa(int a) {
  foo(a);
  <code that does not use 'a'>;
}

Variable 'a' is dead after the 'foo' call and there is no interest in
preserving it in the further flow of the function. At that point we lose
the information about the parameter-forwarding instruction. No
instruction, no way to interpret it. In order to track such situations
we need DBG_CALLSITEPARAM instructions to track the parameter-transferring
register through the backend. For situations like this we use
DICallSiteParam in order to find the value in the previous frame. One
late pass won't do the trick.
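
For readers who have not looked at that code, a condensed sketch of the
identity-copy cleanup whose effect is described above (the real logic lives
in the virtual register rewriter; this only illustrates why no instruction
is left behind to interpret):

#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Once %vreg has been assigned to $rsi, both copies from the example above
// turn into "$rsi = COPY $rsi".  Dropping such identity copies also drops
// the only instructions that documented how the argument reached $rsi for
// the call to foo -- hence the need for DBG_CALLSITEPARAM to remember it.
static void eraseIdentityCopies(MachineBasicBlock &MBB) {
  for (MachineInstr &MI : llvm::make_early_inc_range(MBB))
    if (MI.isCopy() &&
        MI.getOperand(0).getReg() == MI.getOperand(1).getReg() &&
        MI.getOperand(0).getSubReg() == MI.getOperand(1).getSubReg())
      MI.eraseFromParent();
}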

A call in the entry block that implicitly re-forwards an argument is a
special situation and can possibly be handled by emitting
DW_OP_entry_value for the call site parameter value expression.

Also, it is worth mentioning that in situations where the call of
'foo(a)' is nested in some machine block, the parameter-loading
instruction can be in a different block than the call of 'foo(a)'. Such
situations would not be handled so easily by a late pass.

Recreation of this might be possible when the function ‘foo’ is
in the current compilation module, but we are not sure whether it is
possible for external module calls. In order to follow such cases we need
DBG_CALLSITEPARAM, which can track this situation.

What information exactly would you need about foo's implementation that you cannot get from just knowing the calling convention?

Having in mind the previous example where we have just a call of the
function foo, we would need to know how many arguments ‘foo’ has and
through which registers they are forwarded. I'm not sure how we would
get such information.

Since after the ISEL phase we have explicit pseudo COPY instructions that
forward arguments to another function's frame, it came naturally to
recognize such instructions at this stage. There we can say with 100%
certainty that those instructions indeed forward function arguments.

My main motivation for this discussion is to minimize the added complexity of the feature. For example, if we figure out that we can get by without introducing new IR constructs (that optimization authors would need to be taught about, that would need to be supported in all three instruction selectors) and can get by with only adding new MIR instructions, that would be a win. However, if we can prove that deferring the analysis to a later stage would result in inferior quality then the extra maintenance burden of new IR constructs may be the right tradeoff.

Thanks for taking the time to walk me through your thought process.

-- adrian

In general we use DICallSiteParam as a backup solution (when we lose
track of the location loaded into a parameter-forwarding register) for
representing the value at the entry point. As a consequence we are able
to produce 'DW_OP_entry_values' in 'DW_AT_call_site_value' (GCC
generates such expressions), which allows us to go two or more frames
back. I've shown one example of calls in the entry block that could also
produce such expressions without DICallSiteParam, but that is the only
case I can think of right now. Since it is a backup for when something
else fails, it could at some point be removed once it is no longer
needed. Currently it gives us more information about call site
forwarding values.

Thanks for your time! It is our common goal to get this right!
--Nikola

Hi,

[+ Quentin]

Sorry for the late reply.

[+ some folks more knowledgable about the Machine layer than me.]

That would be useful for us too! :slight_smile:

Hi,

I am one of the authors of this feature. On Phabricator, we agreed to
take discussion whether encoding this in IR and threading it through the
compiler or performing a late MIR analysis is the better approach.

Regarding late MIR pass approach, it would need to go back from call
instruction by recognizing parameter’s forwarding instructions and
interpret them. We could interpret register moves, immediate moves and
some of stack object loadings. There would be need to write
interpretation of various architectures instructions. We were not
positive about completeness of such recognition.

So you’re saying that in late MIR, the target-specific MachineInstructions don’t have enough generic meta information to understand where data was copied/loaded from in a target-independent way? Would it be easier in earlier pre-regalloc MIR, or does that have the same problem because the instructions are already target-specific?

It has enough generic meta information for some kinds of instructions,
but not for all. MachineInstr has bits for isMoveImm, isMoveReg and
mayLoad that can be useful for recognizing some kinds of
parameter-loading instructions, but we are not quite sure whether they
are enough for recognizing all of them. For example, there is no way to
recognize X86::LEA instructions with this mechanism (and a significant
number of parameter-loading instructions are of that kind). This
instruction made us give up on the late MIR pass approach, because we
were not sure about the complexities of the various architectures. As a
result of tracking this from the ISEL phase, we were able to get entry
values from some LEA instructions, which is very important for this
approach. Currently, for various architectures, this approach gives us
more information about the values loaded into parameter-forwarding
registers than a late MIR pass would (because of the lack of special
instruction interpretation).

I discussed this with Jessica offline today, and all this seems correct. We don’t have enough generic information to gather this in MIR. I agree that taking the MIR approach sounds attractive since we don’t have to carry all the debug instructions around, but it seems to me that a lot of these cases would need special handling for every instruction that has no such metadata (in this case, instructions like LEA).

But nevertheless, in the end, we lose some information through the
optimization pipeline, and in order to salvage some information we
implemented target specific machine instruction interpreter. For example
situations like:

%vreg = LEA
$rdi = COPY %vreg
call foo
DBG_CALLSITE
*DBG_CALLSITEPARAM $rdi, , %vreg

==== replace %vreg with $rdi ====

%rdi = LEA
$rdi = COPY %rdi
call foo
DBG_CALLSITE
*DBG_CALLSITEPARAM $rdi, , %rdi

==== delete redundant identity copies ====

%rdi = LEA
call foo
DBG_CALLSITE
*DBG_CALLSITEPARAM $rdi, , %rdi

In order to salvage this we go backward from call instruction and try to
interpret instruction that loads value to $rdi.

This salvaging part could be used for interpreting in late MIR pass but
it would need extension for other architecture specific instructions.
But with current approach that starts tracking parameter forwarding
instructions from ISEL phase we are able to cover some of them without
such interpretor.

I don’t have an opinion on this, but it sounds like this is going to grow into the full-MIR solution that Adrian was suggesting from the beginning.

ISEL phase is important for matching DICallSiteParam metadata to
respective DBG_CALLSITEPARAM. It also recognizes COPY instructions that
are part of calling sequence but do not forward any argument (for
example for variadic, C calling convention, functions we have copy to AL
register). If we are able to dispatch such non transferring argument
copy instructions from calling convention and we potentially drop IR
metadata about call site parameters, we might be able to do all
necessary parameter’s tracking in some separate MIR pass (not very late,
somewhere after ISEL).

However, such analysis
might not be complete as current approach. It would not be able to
produce ‘DW_OP_entry_values’ in ‘DW_TAG_call_site_value’ expression of
call site parameter as a late pass.

As example think of callee function that forwards its argument in
function call in entry block and never uses that argument again:
%vreg = COPY $rsi; →
…. <no use of $rsi nor %vreg> → …<no use of $rsi nor %vreg>
$rsi = COPY $vreg; → call foo
call foo →

I’m not sure I can follow this example (yet). Is this about creating a call site parameter for an argument of the call to foo, or is it meant to illustrate that the call to foo() makes it hard to write a backwards analysis for inserting a call site parameter for a call site below the call to foo()?

This is the case from ‘VirtRegMap’ pass, but I think it can happen

elsewhere.

As I’m not super familiar with the various MIR passes: Is VirtRegMap the pass inserting the vreg copy from $rsi here? Does it do something else?

Oh, I didn’t explain it fully. In virtual register rewriter, for
previous example %vreg gets replaced with $rsi and we get two ‘identity
copies’ ($rsi = COPY $rsi) that get deleted. Such situation is special
for call at entry block whose argument is callee’s argument that is used
only at that place. For example like:

baa(int a) {
foo(a);
<code that does not use ‘a’ variable>;
}

Variable ‘a’ is dead after ‘foo’ call and there is no interest in
preserving it in further flow of function. At that point we lose
information about parameter forwarding instruction. No instruction, no
way to interpret it. In order to track such situations we need
DBG_CALLSITEPARAM instructions to track parameter transferring register
through the backend. For situations like this we use DICallSiteParam in
order to find it in previous frame. One late pass wont do the trick.

Call at entry block that implicitly re-forwards argument is special
situation and can possibly be handled with emitting DW_OP_entry_value
for call site parameter value expression.

Also, it is worth mentioning that for situations where we could have
call of ‘foo(a)’ nested at some machine block, parameter loading
instruction can be in different block than the call of ‘foo(a)’. Such
situations would not be handled so easily by late pass.

Recreation of this might be possible when function ‘foo’ is
in current compilation module, but we are not sure if it is possible for
external modules calls. In order to follow such cases we need
DBG_CALLSITEPARAM that can track such situation.

What information exactly would you need about foo’s implementation that you cannot get from just knowing the calling convention?

Having in mind the previous example where we have just a call of the
function foo, we would need to know how many arguments ‘foo’ has and
through which registers they are forwarded. I'm not sure how we would
get such information.

I believe you would have to use the IR Function attached to the MachineFunction and the target calling convention generated from TableGen. This is what targets already implement in [Target]ISelLowering::LowerCall.
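
A rough sketch of what such a query could look like, borrowing the pattern
FastISel uses; the CCAssignFn is taken as a parameter here because obtaining
it in a target-independent way is exactly the open question:

#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/CallingConvLower.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/TargetCallingConv.h"
#include "llvm/CodeGen/TargetLowering.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/CodeGen/ValueTypes.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/Debug.h"

using namespace llvm;

// Sketch only: given the callee's IR signature and the target's
// TableGen-generated assignment function, ask the calling-convention
// machinery where each argument lives at the call boundary.
static void printArgLocations(MachineFunction &MF, const Function &Callee,
                              CCAssignFn *AssignFn) {
  const TargetLowering &TLI = *MF.getSubtarget().getTargetLowering();
  const DataLayout &DL = Callee.getParent()->getDataLayout();

  SmallVector<MVT, 8> ArgVTs;
  SmallVector<ISD::ArgFlagsTy, 8> ArgFlags;
  for (const Argument &A : Callee.args()) {
    EVT VT = TLI.getValueType(DL, A.getType(), /*AllowUnknown=*/true);
    if (!VT.isSimple())
      return; // aggregates etc. need the full LowerCall logic
    ArgVTs.push_back(VT.getSimpleVT());
    ArgFlags.push_back(ISD::ArgFlagsTy());
  }

  SmallVector<CCValAssign, 8> ArgLocs;
  CCState CCInfo(Callee.getCallingConv(), Callee.isVarArg(), MF, ArgLocs,
                 Callee.getContext());
  CCInfo.AnalyzeCallOperands(ArgVTs, ArgFlags, AssignFn);

  for (const CCValAssign &VA : ArgLocs) {
    if (VA.isRegLoc())
      dbgs() << "arg " << VA.getValNo() << " -> physreg "
             << VA.getLocReg().id() << "\n";
    else if (VA.isMemLoc())
      dbgs() << "arg " << VA.getValNo() << " -> stack offset "
             << VA.getLocMemOffset() << "\n";
  }
}

How split arguments or by-value aggregates map back to source-level
parameters is left out entirely here; that part is still what the rest of
this thread is about.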

Thanks,

Hi,

[+ Quentin]

Sorry for the late reply.

[+ some folks more knowledgable about the Machine layer than me.]

That would be useful for us too! :slight_smile:

Hi,

I am one of the authors of this feature. On Phabricator, we agreed to
take discussion whether encoding this in IR and threading it through the
compiler or performing a late MIR analysis is the better approach.

Regarding late MIR pass approach, it would need to go back from call
instruction by recognizing parameter's forwarding instructions and
interpret them. We could interpret register moves, immediate moves and
some of stack object loadings. There would be need to write
interpretation of various architectures instructions. We were not
positive about completeness of such recognition.

So you're saying that in late MIR, the target-specific MachineInstructions don't have enough generic meta information to understand where data was copied/loaded from in a target-independent way? Would it be easier in earlier pre-regalloc MIR, or does that have the same problem because the instructions are already target-specific?

It has enough generic meta information for some kinds of instructions,
but not for all. MachineInstr has bits for isMoveImm, isMoveReg and
mayLoad that can be useful for recognizing some kinds of
parameter-loading instructions, but we are not quite sure whether they
are enough for recognizing all of them. For example, there is no way to
recognize X86::LEA instructions with this mechanism (and a significant
number of parameter-loading instructions are of that kind). This
instruction made us give up on the late MIR pass approach, because we
were not sure about the complexities of the various architectures. As a
result of tracking this from the ISEL phase, we were able to get entry
values from some LEA instructions, which is very important for this
approach. Currently, for various architectures, this approach gives us
more information about the values loaded into parameter-forwarding
registers than a late MIR pass would (because of the lack of special
instruction interpretation).

I discussed this with Jessica offline today, and all this seems correct. We don’t have enough generic information to gather this in MIR. I agree that taking the MIR approach sounds attractive since we don’t have to carry all the debug instructions around, but it seems to me that a lot of these would either need special handling for every instruction that has no metadata (in this case instructions like LEA).

You seem to imply that this extra metadata is not a long-wanted missing feature of MIR that we've just been waiting for an excuse to implement? Because otherwise it might be the right technical solution to augment MIR to be able to capture the effect of LEA & friends.

But nevertheless, in the end, we lose some information through the
optimization pipeline, and in order to salvage some information we
implemented target specific machine instruction interpreter. For example
situations like:

%vreg = LEA <smt>
$rdi = COPY %vreg
call foo
DBG_CALLSITE
*DBG_CALLSITEPARAM $rdi, , %vreg

==== replace %vreg with $rdi ====

%rdi = LEA <smt>
$rdi = COPY %rdi
call foo
DBG_CALLSITE
*DBG_CALLSITEPARAM $rdi, , %rdi

==== delete redundant identity copies ====

%rdi = LEA <smt>
call foo
DBG_CALLSITE
*DBG_CALLSITEPARAM $rdi, , %rdi

In order to salvage this we go backward from call instruction and try to
interpret instruction that loads value to $rdi.

This salvaging part could be used for interpreting in late MIR pass but
it would need extension for other architecture specific instructions.
But with current approach that starts tracking parameter forwarding
instructions from ISEL phase we are able to cover some of them without
such interpretor.

I don’t have an opinion on this, but it sounds like this is going to grow into the full-MIR solution that Adrian was suggesting from the beginning.

Just to clarify the record here, they actually have posted a working implementation that gathers this information at the IR level and threads it all the way through MIR. I've been the one asking whether we couldn't do a very late analysis instead :slight_smile:

My motivation being that teaching all IR and MIR passes about the additional debug metadata has a high maintenance cost and may break more easily as new passes or instruction selectors are added. It's quite possible that the conclusion will be that the approach taken by the patch set is the right trade-off, but I want to make sure that we're rejecting the alternatives for the right technical reasons.

ISEL phase is important for matching DICallSiteParam metadata to
respective DBG_CALLSITEPARAM.

Who is responsible for generating DICallSiteParam annotations? Is it the frontend or is it the codegenprepare(?) pass that knows about the calling convention?

It also recognizes COPY instructions that
are part of calling sequence but do not forward any argument (for
example for variadic, C calling convention, functions we have copy to AL
register). If we are able to dispatch such non transferring argument
copy instructions from calling convention and we potentially drop IR
metadata about call site parameters, we might be able to do all
necessary parameter's tracking in some separate MIR pass (not very late,
somewhere after ISEL).

Since we have three instruction selectors, that sounds intriguing to me :slight_smile:

If everybody agrees that adding DBG_CALLSITEPARAM MIR instructions is preferable over adding metadata to recognize LEA-style instructions, perhaps we can stage the patch reviews in a way that we get to a working, but suboptimal implementation that skips the IR call site annotations, and then look at the question whether it's feasible to extract more information from the calling convention lowering. The one wrinkle I see is that the calling convention lowering presumably happens before ISEL, so we may still need to update all three ISEL implementations. That might still be better than also teaching all IR passes about new debug info.

Here's a proposal for how we could proceed:
1. Decide whether to add (a) DBG_CALLSITEPARAM vs. (b) augment MIR to recognize LEA semantics and implement an analysis
2. Land above MIR support for call site parameters
3. if (a), land support for introducing DBG_CALLSITEPARAM either in calling convention lowering or post-ISEL
4. if that isn't good enough discuss whether IR call site parameters are the best solution

let me know if that makes sense.

Thanks,
adrian

Hi all,

As much as possible I would rather we avoid any kind of metadata in MIR to express the semantics of instructions.
Instead I would prefer that each backend provides a way to interpret what an instruction is doing. What I have in mind is something that would generalize what we do in the peephole optimizer, for instance (look for isRegSequenceLike/getRegSequenceInputs and co.), or what we have for analyzing branches.
One way we could do that, and that was discussed in the past, would be to describe each instruction in terms of the generic MIR operations.

Ultimately we could get a lot of this semantic information automatically populated by TableGen using the ISel patterns, like dagger does (https://github.com/repzret/dagger).

Anyway, for the most part, I believe we could implement the “interpreter” for just a handful of instructions and get 90% of the information right.
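
To make the shape of that suggestion concrete, a hypothetical hook along
those lines (neither the class nor ParamValue exist in LLVM as written here;
the eventual in-tree interface may look quite different):

#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/Register.h"
#include <cstdint>
#include <optional>

using namespace llvm;

// Hypothetical description of "what value does MI place in Reg":
// either a constant, or another register plus an offset.
struct ParamValue {
  bool IsConstant = false;
  int64_t Constant = 0;
  Register BaseReg;
  int64_t Offset = 0;
};

// Hypothetical per-target hook in the spirit of isRegSequenceLike /
// analyzeBranch: each backend overrides it for the handful of instructions
// (moves, LEA-like address computations, simple reloads) that matter for
// call-site parameter tracking; everything else yields "unknown".
struct CallSiteValueInterpreter {
  virtual ~CallSiteValueInterpreter() = default;

  virtual std::optional<ParamValue>
  describeForwardedValue(const MachineInstr &MI, Register Reg) const {
    // Plain COPYs can be handled generically; targets add the rest.
    if (MI.isCopy() && MI.getOperand(0).getReg() == Reg) {
      ParamValue V;
      V.BaseReg = MI.getOperand(1).getReg();
      return V;
    }
    return std::nullopt;
  }
};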

Cheers,
-Quentin

Hi all,

As much as possible I would rather we avoid any kind of metadata in MIR
to express the semantic of instructions.
Instead I would prefer that each back provides a way to interpret what
an instruction is doing. What I have in mind is something that would
generalize what we do in the peephole optimizer for instance (look for
isRegSequenceLike/getRegSequenceInputs and co.) or what we have for
analyzing branches.
One way we could do that and that was discussed in the past would be to
describe each instruction in terms of the generic mir operations.

Ultimately we could get a lot of this semantic information automatically
populated by TableGen using the ISel patterns, like dagger does
(https://github.com/repzret/dagger).
Anyway, for the most part, I believe we could implement the “interpreter”
for just a handful of instructions and get 90% of the information right.

This seems interesting. We will need to investigate this further if we
decide to take the interpreting approach.


Hi,

[+ Quentin]

Sorry for the late reply.

[+ some folks more knowledgable about the Machine layer than me.]

That would be useful for us too! :slight_smile:

Hi,

I am one of the authors of this feature. On Phabricator, we agreed to
take discussion whether encoding this in IR and threading it
through the
compiler or performing a late MIR analysis is the better approach.

Regarding late MIR pass approach, it would need to go back from call
instruction by recognizing parameter's forwarding instructions and
interpret them. We could interpret register moves, immediate moves and
some of stack object loadings. There would be need to write
interpretation of various architectures instructions. We were not
positive about completeness of such recognition.

So you're saying that in late MIR, the target-specific
MachineInstructions don't have enough generic meta information to
understand where data was copied/loaded from in a
target-independent way? Would it be easier in earlier pre-regalloc
MIR, or does that have the same problem because the instructions
are already target-specific?

It has enough generic meta information for some kind of instructions but
not for all. MachineInstr has bits for isMoveImm, isMoveReg and mayLoad
that can be useful for recognizing some kind of parameter loading
instructions. But we're not quite sure whether it is enough for
recognizing all of them. For example there is no support for recognizing
X86::LEA instructions with such mechanism (and there is significant
number of such parameter loading instructions). This instruction made us
give up with late MIR pass approach because we were not sure about
various architectures complexities. As a result from tacking it from
ISEL phase we were able to get entry values from some LEA instructions.
This is very important for this approach. Currently, for various
architectures, this approach gives us more information about values
loaded into parameter forwarding registers than late MIR pass would give
use (because of lack of special instruction interpretation).

I discussed this with Jessica offline today, and all this seems
correct. We don’t have enough generic information to gather this in
MIR. I agree that taking the MIR approach sounds attractive since we
don’t have to carry all the debug instructions around, but it seems
to me that a lot of these would either need special handling for
every instruction that has no metadata (in this case instructions
like LEA).

You seem to imply that this extra metadata is not a long-wanted
missing feature of MIR that we've just been waiting for an excuse to
implement? Because otherwise it might be the right technical solution
to augment MIR to be able to capture the effect lea & friends.

But nevertheless, in the end, we lose some information through the
optimization pipeline, and in order to salvage some information we
implemented target specific machine instruction interpreter. For example
situations like:

%vreg = LEA <smt>
$rdi = COPY %vreg
call foo
DBG_CALLSITE
*DBG_CALLSITEPARAM $rdi, , %vreg

==== replace %vreg with $rdi ====

%rdi = LEA <smt>
$rdi = COPY %rdi
call foo
DBG_CALLSITE
*DBG_CALLSITEPARAM $rdi, , %rdi

==== delete redudant instruction identities ====

%rdi = LEA <smt>
call foo
DBG_CALLSITE
*DBG_CALLSITEPARAM $rdi, , %rdi

In order to salvage this we go backward from call instruction and try to
interpret instruction that loads value to $rdi.

This salvaging part could be used for interpreting in late MIR pass but
it would need extension for other architecture specific instructions.
But with current approach that starts tracking parameter forwarding
instructions from ISEL phase we are able to cover some of them without
such interpretor.

I don’t have an opinion on this, but it sounds like this is going to
grow into the full-MIR solution that Adrian was suggesting from the
beginning.

Just to clarify the record here, they actually have posted a working
implementation that gathers this information at the IR level and
threads it all the way through MIR. I've been the one asking whether
we couldn't do a very late analysis instead :slight_smile:

My motivation being that teaching all IR and MIR passes about the
additional debug metadata has a high maintenance cost and may break
more easily as new passes or instruction selectors are added. It's
quite possible that the conclusion will be that the approach taken by
the patch set is the right trade-off, but I want to make sure that
we're rejecting the alternatives for the right technical reasons.

ISEL phase is important for matching DICallSiteParam metadata to
respective DBG_CALLSITEPARAM.

Who is responsible for generating DICallSiteParam annotations? Is it
the frontend or is it the codegenprepare(?) pass that knows about the
calling convention?

It is the frontend's job. It basically calls an expression visitor that
produces it. It tries to provide an expression that can be evaluated in
the caller's frame. For now it is not able to handle a function call
expression, or an expression with more than one variable.

It also recognizes COPY instructions that
are part of calling sequence but do not forward any argument (for
example for variadic, C calling convention, functions we have copy to AL
register). If we are able to dispatch such non transferring argument
copy instructions from calling convention and we potentially drop IR
metadata about call site parameters, we might be able to do all
necessary parameter's tracking in some separate MIR pass (not very late,
somewhere after ISEL).

Since we have three instruction selectors, that sounds intriguing to
me :slight_smile:

The algorithm is implemented only for SelectionDAG, but it could easily
be extended to FastISel. It basically works on the chained call sequence
of SDNodes and tries to match the copied SDNode to one of the
input-argument SDNodes. It would certainly work better if this were
lowered into the target-specific part where the call sequence is
generated, but we were looking for a general solution.
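
Very roughly, and only to illustrate the idea rather than the actual patch:
during SelectionDAG call lowering the argument copies are CopyToReg nodes
threaded on the call's chain, so a sketch of that matching could walk the
chain and pair each physical register with the forwarded SDValue:

#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/CodeGen/Register.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"
#include <utility>

using namespace llvm;

// Illustrative only: follow the chain backwards from the call node to the
// start of the call sequence and record every copy into a physical register
// together with the value it forwards.
static void collectArgCopies(
    const SDNode *CallNode,
    SmallVectorImpl<std::pair<Register, SDValue>> &Out) {
  SDValue Chain = CallNode->getOperand(0);
  while (Chain.getNode() && Chain.getOpcode() != ISD::CALLSEQ_START) {
    const SDNode *N = Chain.getNode();
    if (N->getOpcode() == ISD::CopyToReg) {
      Register Reg = cast<RegisterSDNode>(N->getOperand(1))->getReg();
      if (Reg.isPhysical())
        Out.emplace_back(Reg, N->getOperand(2)); // the forwarded value
    }
    if (N->getNumOperands() == 0)
      break;
    Chain = N->getOperand(0); // keep walking the chain
  }
}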

If everybody agrees that adding DBG_CALLSITEPARAM MIR instructions is
preferable over adding metadata to recognize LEA-style instructions,
perhaps we can stage the patch reviews in way that we get to a
working, but suboptimal implementation that skips the IR call site
annotations, and then look at the question whether it's feasible to
extract more information from the calling convention lowering. The one
wrinkle I see is that the calling convention lowering presumably
happens before ISEL, so we may still need to update all three ISEL
implementations. That might still be better than also teaching all IR
passes about new debug info.

Just to clarify, the IR call site implementation does not provide
support for recognizing LEA-style instructions. Tracking the locations
loaded into parameter-forwarding registers from the ISEL phase does this
by tracking the loaded location. But we agree that it might be best to
defer adding the IR call site annotations until later.

Call lowering happens at ISEL, but the calling convention is used before
ISEL to adjust the forwarded-arguments context? Am I right?

Post-ISEL MIR has a pretty generic look and could be processed in
search of copy instructions that forward function arguments. It is still
in SSA form. It looks something like:

   ADJCALLSTACKDOWN64
   %8:gr64 = LEA64r %stack.0.local1, 1, $noreg, 0, $noreg
   %9:gr32 = MOV32ri 10
   %10:gr32 = MOV32ri 15
   $rdi = COPY %8, debug-location !25
   $esi = COPY %1, debug-location !25
   $edx = COPY %9, debug-location !25
   $ecx = COPY %10, debug-location !25
   $r8d = COPY %6, debug-location !25
   $r9d = COPY %7, debug-location !25
   CALL64pcrel32 @foo
   ADJCALLSTACKUP64

At the MIR level, we are aware that we could extract the calling
convention from foo, but we are still not sure how we would recognize
COPY instructions that do not transfer call arguments. There are
situations like the loading of AL for functions with a variable number
of arguments, or the loading of global-base registers for i686 compiled
with PIC, etc. Currently we are investigating how we could recognize
such instructions at the MIR level.
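
One possible shape of that filtering, assuming the argument locations have
already been computed from the calling convention as sketched earlier in the
thread; note that it says nothing about sub-register copies or arguments
split across several registers, which is part of the concern:

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/CallingConvLower.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Keep only those forwarding copies whose destination register really is an
// argument location according to the calling convention.  Copies such as
// "$al = COPY ..." before a variadic call, or the PIC base register setup on
// i686, are not argument locations and get dropped.
static void
filterNonArgumentCopies(SmallVectorImpl<const MachineInstr *> &Copies,
                        ArrayRef<CCValAssign> ArgLocs) {
  SmallSet<unsigned, 8> ArgRegs;
  for (const CCValAssign &VA : ArgLocs)
    if (VA.isRegLoc())
      ArgRegs.insert(VA.getLocReg().id());

  llvm::erase_if(Copies, [&](const MachineInstr *MI) {
    return !ArgRegs.count(MI->getOperand(0).getReg().id());
  });
}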

Thanks,

Nikola

Hi,

We have done some investigation. Please find my comments inlined below.

Hi all,

As much as possible I would rather we avoid any kind of metadata in MIR
to express the semantic of instructions.
Instead I would prefer that each back provides a way to interpret what
an instruction is doing. What I have in mind is something that would
generalize what we do in the peephole optimizer for instance (look for
isRegSequenceLike/getRegSequenceInputs and co.) or what we have for
analyzing branches.
One way we could do that and that was discussed in the past would be to
describe each instruction in terms of the generic mir operations.

Ultimately we could get a lot of this semantic information automatically
populated by TableGen using the ISel patterns, like dagger does
(https://github.com/repzret/dagger).

Anyway, for the most part, I believe we could implement the

“interpreter” for just a handful of instruction and get 90% of the
information right.

This seems interesting. We will need to investigate this further if we
decide to take interpreting approach.

Cheers,
-Quentin


Here's a proposal for how we could proceed:
1. Decide whether to add (a) DBG_CALLSITEPARAM vs. (b) augment MIR to
recognize LEA semantics and implement an analysis
2. Land above MIR support for call site parameters
3. if (a), land support for introducing DBG_CALLSITEPARAM either in
calling convention lowering or post-ISEL
4. if that isn't good enough discuss whether IR call site parameters
are the best solution

let me know if that makes sense.

Thanks,
adrian

In order to use calling convention lowering at the MIR pass level for
recognizing instructions that forward function arguments, we would need
to implement a calling convention interpreter. Only recognizing
instructions and trying to see whether they are part of a calling
sequence would not be enough. For example, we would not be able to
properly handle cases where one 64-bit argument is split across two
32-bit registers. This could be handled if we knew the number and sizes
of the arguments of the called function, but then we would end up
running a process similar to the one from the ISEL phase. We can only
know the number and sizes of arguments for direct calls, since we can
find the IR function declaration and extract such information from it.
For indirect calls we would not be able to perform such analysis, since
we cannot fetch the function's declaration. This means that we would not
be able to support indirect calls (not without some trickery).
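
The direct-call half of that is easy to sketch; the indirect-call half is
precisely what has no answer here (CallBase is the IR call attached to the
call site):

#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/Support/Casting.h"

using namespace llvm;

// For a direct call we can strip casts off the callee operand and read the
// number and types of arguments from the IR declaration.  For an indirect
// call there is no declaration to consult, which is the limitation above.
static const Function *getCalleeDeclaration(const CallBase &CB) {
  return dyn_cast<Function>(CB.getCalledOperand()->stripPointerCasts());
}

static bool knowsCalleeSignature(const CallBase &CB) {
  return getCalleeDeclaration(CB) != nullptr;
}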

If everybody agrees with the above, this might be the technical reason
to give up on a MIR pass that would collect call site parameter debug
info. If we are wrong in our analysis, please advise us. Otherwise, we
can go with the approach of introducing DBG_CALLSITEPARAM and producing
it from the ISEL phase (with the IR part dropped).

Thanks,
Nikola

Hi,

We have done some investigation. Please find my comment inlined bellow.

Hi all,

As much as possible I would rather we avoid any kind of metadata in MIR
to express the semantic of instructions.
Instead I would prefer that each back provides a way to interpret what
an instruction is doing. What I have in mind is something that would
generalize what we do in the peephole optimizer for instance (look for
isRegSequenceLike/getRegSequenceInputs and co.) or what we have for
analyzing branches.
One way we could do that and that was discussed in the past would be to
describe each instruction in terms of the generic mir operations.

Ultimately we could get a lot of this semantic information automatically
populated by TableGen using the ISel patterns, like dagger does
(https://github.com/repzret/dagger).

Anyway, for the most part, I believe we could implement the

“interpreter” for just a handful of instruction and get 90% of the
information right.

[...]

Here's a proposal for how we could proceed:
1. Decide whether to add (a) DBG_CALLSITEPARAM vs. (b) augment MIR to
recognize LEA semantics and implement an analysis
2. Land above MIR support for call site parameters
3. if (a), land support for introducing DBG_CALLSITEPARAM either in
calling convention lowering or post-ISEL
4. if that isn't good enough discuss whether IR call site parameters
are the best solution

let me know if that makes sense.

Thanks,
adrian

In order to use calling convention lowering at MIR pass level, for
recognizing instructions that forward function arguments, we would need
to implement calling convention interpreter. Only recognizing
instructions and trying to see whether it is part of calling sequence,
would not be enough. For example, we will not be able to properly handle
cases when one 64bit argument is split on two 32bit registers. This
could be handled if we know number and sizes of arguments of called
function, but then we would end up calling similar process as one from
ISEL phase. We can only know number and sizes of arguments for direct
calls since we can find IR function declaration for it and extract such
information. For indirect calls, we would not be able to perform such
analysis since we cannot fetch function’s declaration. This means that
we will not be able to support indirect calls (not without some trickery).

If everybody agrees with stated, this might be the technical reason to
give up with MIR pass that would collect call site parameter debug info.
If we are wrong with our analysis, please advise us. Otherwise, we can
go with approach with introducing DBG_CALLSITEPARAM and producing it
from ISEL phase (with dispatched IR part).

Thanks,
Nikola

If we want to avoid adding new MIR metadata as Quentin suggests, it sounds like we really have two problems to solve here:

1. At the call site, determine which registers / stack slots contain (source-level) function arguments. The really interesting case to handle here is that of a small struct whose elements are passed in registers, or a struct return value.

The information about the callee's function signature is only available at the IR level. If we can match up a call site in MIR with the call site in IR (not sure if that is generally possible) we could introduce a new API that returns the calling convention's location for each source-level function argument. Such an API would be good to have; LLDB, for example, would really like to know this, too.
That said, I would not want to lose the ability to model indirect function calls. In Swift, for example, indirect function calls are extremely common, as are virtual methods in C++.

2. Backwards-analyze a safe location (caller-saved register, stack slot, constant) for those function arguments.

This is where additional semantic information would be necessary.

Quentin, do you see a way of making (1) work without having the instruction selector lower function argument information into MIR as extra debug info metadata? I'm asking because if we have to add extra debug info metadata to deliver (1) anyway then we might as well use it instead of implementing an analysis for (2).

what do you think?
-- adrian

Hi,

TL;DR I realize my comments are not super helpful; in a nutshell, I think we would do better to define a good API for describing how function arguments are lowered than to add new debug instructions for it, so that other tools can benefit from it as well. Now, I am so far from debug information generation that I wouldn't be upset if you chose to just ignore me :).

Hi,

We have done some investigation. Please find my comments inlined below.

Hi all,

As much as possible I would rather we avoid any kind of metadata in MIR to express the semantics of instructions.
Instead I would prefer that each backend provides a way to interpret what an instruction is doing. What I have in mind is something that would generalize what we do in the peephole optimizer, for instance (look for isRegSequenceLike/getRegSequenceInputs and co.), or what we have for analyzing branches.
One way we could do that, and that was discussed in the past, would be to describe each instruction in terms of the generic MIR operations.

Ultimately we could get a lot of this semantic information automatically populated by TableGen using the ISel patterns, like dagger does (https://github.com/repzret/dagger).

Anyway, for the most part, I believe we could implement the “interpreter” for just a handful of instructions and get 90% of the information right.

[…]

Here’s a proposal for how we could proceed:
1. Decide whether to add (a) DBG_CALLSITEPARAM vs. (b) augment MIR to recognize LEA semantics and implement an analysis
2. Land above MIR support for call site parameters
3. if (a), land support for introducing DBG_CALLSITEPARAM either in calling convention lowering or post-ISEL
4. if that isn’t good enough discuss whether IR call site parameters are the best solution

let me know if that makes sense.

Thanks,
adrian

In order to use calling convention lowering at the MIR pass level for recognizing instructions that forward function arguments, we would need to implement a calling convention interpreter. Only recognizing instructions and trying to see whether they are part of a calling sequence would not be enough. For example, we would not be able to properly handle cases where one 64-bit argument is split across two 32-bit registers. This could be handled if we knew the number and sizes of the arguments of the called function, but then we would end up running a process similar to the one from the ISEL phase. We can only know the number and sizes of arguments for direct calls, since we can find the IR function declaration and extract such information from it. For indirect calls we would not be able to perform such analysis, since we cannot fetch the function's declaration. This means that we would not be able to support indirect calls (not without some trickery).

Technically, you can guess what is laid down as function parameters by looking at which registers are live-ins of your function, which stack locations, and so on. That wouldn't help you with the number and size of the arguments, indeed, but you know that at compile time, so I don't know why we would need to make those explicit.

Anyway, what I am saying is that, to me, DBG_CALLSITEPARAM is redundant with what the backend already knows about the function call. The way I see it, this pseudo is a kind of cached information that could otherwise be computed, and what worries me is what happens when we make changes that break this “cache”.

If everybody agrees with the above, this might be the technical reason to give up on a MIR pass that would collect call site parameter debug info. If we are wrong in our analysis, please advise us. Otherwise, we can go with the approach of introducing DBG_CALLSITEPARAM and producing it from the ISEL phase (with the IR part dropped).

Thanks,
Nikola

If we want to avoid adding new MIR metadata as Quentin suggests, it sounds like we really have two problems to solve here:

1. At the call site, determine which registers / stack slots contain (source-level) function arguments. The really interesting case to handle here is that of a small struct whose elements are passed in registers, or a struct return value.

I am really ignorant of how LLVM's debug information works, but I would have expected that we could generate this information directly when we lower the ABI, and then refine it, in particular until we have executed the prologue.
My DWARF is rusty, but I would expect we can describe the locations of the arguments as registers and CFA offsets at the function entry (fct_symbol+0).
Since this information must be correct at the ABI boundaries, what's left to describe is what happens next, and there I don't see how we can get away without an interpreter of MIR.

E.g., let's say we have:
void foo(int a)

At foo+0: a is in, say, r0
foo+4:    r3 = copy r0

foo+0x30: store r3, fp

In foo, maybe r0 will be optimized out, but at foo+0, a has to be there.
Then you would describe a's location as being available in r3 over [foo+4, foo+0x30], and then stored at some CFA offset from foo+0x30 onward.

I feel the DBG_CALLSITEPARAM stuff only captures the foo+0 location, and essentially I don't see why we need to have it around in MIR. Now, I agree that a pass interpreting how a value is moved around would need to query where the information is at the beginning of the function, but that doesn't need to be materialized in MIR.

The information about the callee's function signature is only available at the IR level. If we can match up a call site in MIR with the call site in IR (not sure if that is generally possible) we could introduce a new API that returns the calling convention's location for each source-level function argument. Such an API would be good to have; LLDB, for example, would really like to know this, too.

That said, I would not want to lose the ability to model indirect function calls. In Swift, for example, indirect function calls are extremely common, as are virtual methods in C++.

2. Backwards-analyze a safe location (caller-saved register, stack slot, constant) for those function arguments.

This is where additional semantic information would be necessary.

Quentin, do you see a way of making (1) work without having the instruction selector lower function argument information into MIR as extra debug info metadata?

In theory, yes, I believe we could directly generate it while lowering the ABI. That said, this information is not super useful on its own.
Now, IIRC, debug info generation all happens at the end, so in that respect we need a way to convey the information down to that pass, and there is probably no way around attaching some information somewhere.
Ideally, we wouldn't have to store this information anywhere; instead, like you said, we could have a proper API that tells you what goes where, and you would have your “foo+0” location without carrying extra information around.

Hi Nikola,

This is great; the caller is even simpler to address!

Instead of adding debug metadata, I would rather we make the ABI lowering “queryable”, i.e., something like: here is my prototype, where are the arguments mapped?

Like Adrian said, this kind of API would be beneficial for other tools as well.
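
Purely to visualize what such a query might look like (none of these names
exist in LLVM; the real interface would have to grow out of the ABI lowering
code itself):

#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/Register.h"
#include "llvm/IR/Function.h"
#include <cstdint>
#include <variant>

using namespace llvm;

// Hypothetical sketch of a "queryable ABI lowering": given a prototype,
// report where each source-level argument lives at the ABI boundary.
struct ArgLocation {
  unsigned ArgNo;                      // source-level argument index
  std::variant<Register, int64_t> Loc; // physical register or stack offset
};

struct ABIQuery {
  virtual ~ABIQuery() = default;

  // Map every argument of F's prototype to its ABI location(s).  One source
  // argument may produce several entries (e.g. an i64 split across two i32
  // registers on a 32-bit target, or a small struct passed in registers).
  virtual void getArgumentLocations(const Function &F,
                                    SmallVectorImpl<ArgLocation> &Locs) const = 0;
};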

Cheers,
Quentin