Source locations missing when using xray-account

Hi all,

I am trying to add XRay support to the IR produced by GHC. Getting the
basics working is not too complicated after adding the right function
attribute but any analysis shows the function names but not the
locations of the functions.

In particular, I run a program which has been instrumented as follows:

XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic verbosity=1" ./llvm

which then writes out an xray-log file.

==4193==XRay: Log file in 'xray-log.llvm.hgD9oi'
==4193==Cleaned up log for TID: 4193

I then attempt to use the xray-account utility to analyse the log but
whilst the functions are named, the locations of the functions are
displayed as <invalid>:0:0:.

llvm-xray account -sort=sum -sortorder=dsc -instr_map ./llvm xray-log.llvm.hgD9oi

Functions with latencies: 5
   funcid count [ min, med, 90p, 99p,
max] sum function
        6 9 [ 0.000006, 0.000006, 0.000040, 0.000040,
0.000040] 0.000089 <invalid>:0:0: fqux
... 4 lines omitted

If I then inspect the executable with llvm-dwarfdump, I can see that
the debug information for fqux clearly lists the location as well.

0x00000918: DW_TAG_subprogram
                DW_AT_low_pc (0x00000000004207c8)
                DW_AT_high_pc (0x0000000000420817)
                DW_AT_frame_base (DW_OP_reg7 RSP)
                DW_AT_linkage_name ("fqux")
                DW_AT_name ("fqux")
                DW_AT_decl_file ("TODO/llvm.hs")
                DW_AT_decl_line (8)
                DW_AT_external (true)

I suspect this is a problem with the DWARF information as when I try
to use `llvm-symboliser` with address 0x00000000004207c8 as retrieved
from the above paste, the source location is also not reported.

So, can anyone give me some practical advice about how to troubleshoot
this problem/validate the DWARF information that is produced?

It might also be helpful is someone could point me to some
documentation which explains at a high-level how llvm-symboliser
works. Reading the source code was too difficult for me as I am not
very experienced with C++ programming.

Many thanks for your help,

Matt

Is anyone able to help me with this or suggest where I might be able
to get help?

Matt

Is anyone able to help me with this or suggest where I might be able
to get help?

You’ve asked the right place, apologies for the delay I’m supposed to be answering these questions.

Matt

Hi all,

I am trying to add XRay support to the IR produced by GHC. Getting the
basics working is not too complicated after adding the right function
attribute but any analysis shows the function names but not the
locations of the functions.

In particular, I run a program which has been instrumented as follows:

XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic verbosity=1" ./llvm

which then writes out an xray-log file.

==4193==XRay: Log file in 'xray-log.llvm.hgD9oi'
==4193==Cleaned up log for TID: 4193

I then attempt to use the xray-account utility to analyse the log but
whilst the functions are named, the locations of the functions are
displayed as <invalid>:0:0:.

llvm-xray account -sort=sum -sortorder=dsc -instr_map ./llvm xray-log.llvm.hgD9oi

Functions with latencies: 5
  funcid count [ min, med, 90p, 99p,
max] sum function
       6 9 [ 0.000006, 0.000006, 0.000040, 0.000040,
0.000040] 0.000089 <invalid>:0:0: fqux
... 4 lines omitted

If I then inspect the executable with llvm-dwarfdump, I can see that
the debug information for fqux clearly lists the location as well.

0x00000918: DW_TAG_subprogram
               DW_AT_low_pc (0x00000000004207c8)
               DW_AT_high_pc (0x0000000000420817)
               DW_AT_frame_base (DW_OP_reg7 RSP)
               DW_AT_linkage_name ("fqux")
               DW_AT_name ("fqux")
               DW_AT_decl_file ("TODO/llvm.hs")
               DW_AT_decl_line (8)
               DW_AT_external (true)

I suspect this is a problem with the DWARF information as when I try
to use `llvm-symboliser` with address 0x00000000004207c8 as retrieved
from the above paste, the source location is also not reported.

Yes, this is the issue. Getting llvm-symboliser to understand the DWARF information generated will get you better debugging information for XRay.

So, can anyone give me some practical advice about how to troubleshoot
this problem/validate the DWARF information that is produced?

It might also be helpful is someone could point me to some
documentation which explains at a high-level how llvm-symboliser
works. Reading the source code was too difficult for me as I am not
very experienced with C++ programming.

I suspect all the usual resources apply — resources on DWARF4, and potentially looking at the various encodings. I’m not a DWARF expert, but
David Blaikie might be able to help more specifically on emitting useful DWARF information via LLVM.

Many thanks for your help,

Apologies for the delay again.

Cheers

-- Dean

Yeah, not sure what GHC might be doing to generate novel DWARF that llvm-symbolizer can’t consume - if you could create a small example (preferably source I can compile, but also LLVM IR (since it’ll save me having to setup GHC, hopefully) I could feed into clang/llvm and get an object file out that demonstrates the llvm-symbolizer failure) that’d be helpful.

It may also be useful to compare Clang’s behavior on similar C or C++ source to see if there are any critical differences in the DWARF that might explain the difference in behavior.

  • Dave

Hi David,

Sorry for taking a few days to reply. It's not easy for you to compile
a Haskell file to see the problem as the debug information is still
WIP. Below I prove the IR for a simple hello world program which you
can feed into llc.

https://gist.github.com/05296933e37e87533a51d493b46aa48d

The `out.ir` file can be passed straight to `llc`.

Can you see anything obviously wrong?

Matt

Hi David,

Sorry for taking a few days to reply. It’s not easy for you to compile
a Haskell file to see the problem as the debug information is still
WIP. Below I prove the IR for a simple hello world program which you
can feed into llc.

https://gist.github.com/05296933e37e87533a51d493b46aa48d

The [out.ir](http://out.ir) file can be passed straight to llc.

Can you see anything obviously wrong?

Unless I’m missing something else, I believe none of the instructions have !dbg locations attached (see, for example, a simple empty main.cpp compiled to LLVM IR by Clang, and note the !dbg attached to the ret instruction):

$ clang+±tot -emit-llvm -S -g -c -o - main.cpp
; ModuleID = ‘main.cpp’
source_filename = “main.cpp”
target datalayout = “e-m:e-i64:64-f80:128-n8:16:32:64-S128”
target triple = “x86_64-unknown-linux-gnu”

; Function Attrs: noinline norecurse nounwind optnone uwtable
define dso_local i32 @main() #0 !dbg !7 {
entry:
ret i32 0, !dbg !11
}

attributes #0 = { noinline norecurse nounwind optnone uwtable “correctly-rounded-divide-sqrt-fp-math”=“false” “disable-tail-calls”=“false” “less-precise-fpmad”=“false” “min-legal-vector-width”=“0” “no-frame-pointer-elim”=“true” “no-frame-pointer-elim-non-leaf” “no-infs-fp-math”=“false” “no-jump-tables”=“false” “no-nans-fp-math”=“false” “no-signed-zeros-fp-math”=“false” “no-trapping-math”=“false” “stack-protector-buffer-size”=“8” “target-cpu”=“x86-64” “target-features”="+fxsr,+mmx,+sse,+sse2,+x87" “unsafe-fp-math”=“false” “use-soft-float”=“false” }

llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!3, !4, !5}
!llvm.ident = !{!6}

!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: “clang version 8.0.0 (trunk 348415) (llvm/trunk 348416)”, isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, nameTableKind: None)
!1 = !DIFile(filename: “main.cpp”, directory: “/usr/local/google/home/blaikie/dev/scratch/ghc_symbolizer_failure”)
!2 = !{}
!3 = !{i32 2, !“Dwarf Version”, i32 4}
!4 = !{i32 2, !“Debug Info Version”, i32 3}
!5 = !{i32 1, !“wchar_size”, i32 4}
!6 = !{!“clang version 8.0.0 (trunk 348415) (llvm/trunk 348416)”}
!7 = distinct !DISubprogram(name: “main”, scope: !1, file: !1, line: 1, type: !8, scopeLine: 1, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
!8 = !DISubroutineType(types: !9)
!9 = !{!10}
!10 = !DIBasicType(name: “int”, size: 32, encoding: DW_ATE_signed)
!11 = !DILocation(line: 2, column: 1, scope: !7)

David,

Thanks a lot, this fixed the problem with symbol locations. Three more
problems now arise which you might be able to help with.

1. Compiling with `-O1` causes an error in the debug information
verifier: "inlinable function call in a function with debug info must
have a !dbg location"
2. Compiling with `-O2` gives a different verifier error: "!dbg
attachment points at wrong subprogram for function"

Are 1. and 2. bugs in the verifier? It seems that optimisation passes
shouldn't affect the validity of debug information.

3. Compiling with `-O0`, the verifier succeeds and then using
llvm-stack, the complete call hierarchy is completely flat
(an example run for loading into chrome://tracing -
https://gist.githubusercontent.com/mpickering/a6dbaaeaa281b812cf3064a45e73a043/raw/9e5cdfffca1eb1ad269892af6bf7681ff195499e/chrome-tracing.json)

output of llvm-xray stack:
https://gist.github.com/715859a15c088108a8114fcbf68926a5

log.yaml if that helps. https://gist.github.com/86fa7d8252967f6951b63d7a220f2846

Notice that level is 0 for all the entries. Any ideas what could cause this?

Cheers,

Matt

David,

Thanks a lot, this fixed the problem with symbol locations. Three more
problems now arise which you might be able to help with.

  1. Compiling with -O1 causes an error in the debug information
    verifier: “inlinable function call in a function with debug info must
    have a !dbg location”

Yep, any inlinable within a function with debug info must have a debug location.

  1. Compiling with -O2 gives a different verifier error: “!dbg
    attachment points at wrong subprogram for function”

Any !dbg location on an instruction in a function where the function has a subprogram attached through the !dbg must have a scope chain that leads to/ends with the same subprogram.

Are 1. and 2. bugs in the verifier? It seems that optimisation passes
shouldn’t affect the validity of debug information.

Nah, just things the verifier can’t see straight-off (in part it doesn’t assume every function call could be inlined, nor does it walk the whole call graph to discover reachability).

  1. Compiling with -O0, the verifier succeeds and then using
    llvm-stack, the complete call hierarchy is completely flat
    (an example run for loading into chrome://tracing -
    https://gist.githubusercontent.com/mpickering/a6dbaaeaa281b812cf3064a45e73a043/raw/9e5cdfffca1eb1ad269892af6bf7681ff195499e/chrome-tracing.json)

I’m not familiar with llvm-stack & chrome tracing to know what this means, sorry.

output of llvm-xray stack:
https://gist.github.com/715859a15c088108a8114fcbf68926a5

log.yaml if that helps. https://gist.github.com/86fa7d8252967f6951b63d7a220f2846

Notice that level is 0 for all the entries. Any ideas what could cause this?

I’d look at the verifier issues first, sounds like some of your debug info isn’t quite complete in some ways?

This looks like it’s caused by tail-call optimisation, which XRay accurately captures in the trace:

```