llvm bpf debug info. Re: [RFC PATCH v4 3/3] bpf: Introduce function for outputing data to perf event

[SNIP]

I'll post 2 LLVM patches by replying to this mail. Please have a look and
help me send them to LLVM if you think my code is correct.

[SNIP]

patch 2:
do we really need to hack clang?
Can you just define a function that aliases to intrinsic,
like we do for ld_abs/ld_ind ?
void bpf_store_half(void *skb, u64 off, u64 val) asm("llvm.bpf.store.half");
then no extra patches necessary.

Hi Alexei,

After two weeks of research, I have to give you a sad answer:

target-specific intrinsics do not work.

I tried a target-specific intrinsic. However, LLVM isolates the backend
from the frontend, and there is no way to pass language-level type
information to backend code.

Think about a program like this:

struct strA { int a; };
struct strB { int b; };
int func() {
   struct strA a;
   struct strB b;

   a.a = 1;
   b.b = 2;
   bpf_output(gettype(a), &a);
   bpf_output(gettype(b), &b);
   return 0;
}

In theory, the BPF backend can't (and needn't) tell the difference
between local variables a and b. In the LLVM implementation, type
information is filtered out by ComputeValueVTs(). Please have a look at
SelectionDAGBuilder::visitIntrinsicCall in
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp and
SelectionDAGBuilder::visitTargetIntrinsic in the same file. In
visitTargetIntrinsic, ComputeValueVTs acts as a barrier which strips
type information from the CallInst ("I") and leaves only SDValue and
SDVTList ("Ops" and "VTs") to target code. SDValue and SDVTList are
wrappers of EVT and MVT, so none of the information we care about is
passed through.

I think now we have 2 choices:

1. Hack clang and implement a target-specific builtin function. I have
    worked out an ugly but workable patch which sets up a builtin function,
    __builtin_bpf_typeid(), which accepts a local or global variable and
    returns a different constant for each type.

2. Implement an LLVM intrinsic call (llvm.typeid) and have it processed in
    visitIntrinsicCall(). I think we can get something useful if it is
    processed in that function.

The next step would be generating debug information that maps types to
the constants issued by __builtin_bpf_typeid() or llvm.typeid. We have a
crazy idea: if we limit the name of the structure to 8 bytes, we can
pack the name into a u64, and then there would be no need to consult
type information in DWARF. For example, in the above sample code, gettype(a)
would yield 0x0000000041727473 because its type is "strA". What do you think?
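
A minimal sketch of the name-packing idea, for illustration only. The
helper name pack_type_name() is hypothetical and is not part of any
patch in this thread; it assumes the first character of the name goes
into the least significant byte, which is what the "strA" value above
implies, and it simply truncates names longer than 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack up to 8 bytes of a struct name into a u64.
 * Byte i of the name lands in bits [8*i, 8*i+7] of the result. */
static uint64_t pack_type_name(const char *name)
{
    uint64_t id = 0;
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (len > 8)
        len = 8; /* assumption: longer names are truncated */

    for (size_t i = 0; i < len; i++)
        id |= (uint64_t)(unsigned char)name[i] << (8 * i);
    return id;
}
```

With this layout, pack_type_name("strA") gives 0x0000000041727473. Two
structs whose names share the same first 8 bytes would collide, which is
one reason the scheme is fragile.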

Thank you.

Yeah. You're right about pure target intrinsics.
I think llvm.typeid might work. imo it's cleaner than
doing it at clang level.

The next step would be generating debug information that maps types to
the constants issued by __builtin_bpf_typeid() or llvm.typeid. We have a
crazy idea: if we limit the name of the structure to 8 bytes, we can
pack the name into a u64, and then there would be no need to consult
type information in DWARF. For example, in the above sample code, gettype(a)
would yield 0x0000000041727473 because its type is "strA". What do you think?

that's way too hacky.
I was thinking when compiling we can keep llvm ir along with .o
instead of dwarf and extract type info from there.
dwarf has names and other things that we don't need. We only
care about actual field layout of the structs.
But it probably won't be easy to parse llvm ir on perf side
instead of dwarf.

btw, if you haven't looked at iovisor/bcc, there we're solving
similar problem differently. There we use clang rewriter, so all
structs fields are visible at this level, then we use bpf backend
in JIT mode and push bpf instructions into the kernel on the fly
completely skipping ELF and .o
For example in:
https://github.com/iovisor/bcc/blob/master/examples/distributed_bridge/tunnel.c
when you see
struct ethernet_t {
  unsigned long long dst:48;
  unsigned long long src:48;
  unsigned int type:16;
} BPF_PACKET_HEADER;
struct ethernet_t *ethernet = cursor_advance(cursor, sizeof(*ethernet));
... ethernet->src ...
is recognized by clang rewriter and ->src is converted to a different
C code that is sent again into clang.
So there is no need to use dwarf or patch clang/llvm. clang rewriter
has all the info.
I'm not sure you can live with clang/llvm on the host where you
want to run the tracing bits, but if you can that's an easier option.

The next step would be generating debug information that maps types to
the constants issued by __builtin_bpf_typeid() or llvm.typeid. We have a
crazy idea: if we limit the name of the structure to 8 bytes, we can
pack the name into a u64, and then there would be no need to consult
type information in DWARF. For example, in the above sample code, gettype(a)
would yield 0x0000000041727473 because its type is "strA". What do you think?

that's way too hacky.
I was thinking when compiling we can keep llvm ir along with .o
instead of dwarf and extract type info from there.
dwarf has names and other things that we don't need. We only
care about actual field layout of the structs.
But it probably won't be easy to parse llvm ir on perf side
instead of dwarf.

Shipping both LLVM IR and the .o to perf makes it harder to use. I'm
not sure whether it is a good idea. If we are unable to encode the
structure using a u64, let's keep digging into dwarf.

We have another idea: we can utilize an existing dwarf feature.
For example, when __builtin_bpf_typeid() gets called, define an enumeration
type in the dwarf info, so you'll find:

  <1><2a>: Abbrev Number: 2 (DW_TAG_enumeration_type)
     <2b> DW_AT_name : (indirect string, offset: 0xec): TYPEINFO
     <2f> DW_AT_byte_size : 4
     <30> DW_AT_decl_file : 1
     <31> DW_AT_decl_line : 3
  <2><32>: Abbrev Number: 3 (DW_TAG_enumerator)
     <33> DW_AT_name : (indirect string, offset: 0xcc): __typeinfo_strA
     <37> DW_AT_const_value : 2
  <2><38>: Abbrev Number: 3 (DW_TAG_enumerator)
     <39> DW_AT_name : (indirect string, offset: 0xdc): __typeinfo_strB
     <3d> DW_AT_const_value : 3

or this:

  <3><54>: Abbrev Number: 4 (DW_TAG_variable)
     <55> DW_AT_const_value : 2
     <66> DW_AT_name : (indirect string, offset: 0x1e): __typeinfo_strA
     <6a> DW_AT_decl_file : 1
     <6b> DW_AT_decl_line : 29
     <6c> DW_AT_type : <0x72>

Then we can do the mapping from DW_AT_name and DW_AT_const_value. The
drawback is that all names prefixed with __typeinfo_ become reserved.
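
A rough sketch of the consumer side of that mapping, assuming the
decoder has already collected (DW_AT_name, DW_AT_const_value) pairs from
the DWARF entries shown above. The struct typeinfo_entry type and the
typeinfo_lookup() function are hypothetical names used only for
illustration, and the DWARF parsing itself is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TYPEINFO_PREFIX "__typeinfo_"

/* One (DW_AT_name, DW_AT_const_value) pair as a decoder might collect it.
 * Hypothetical structure, not taken from perf or LLVM. */
struct typeinfo_entry {
    const char *dw_name;
    long const_value;
};

/* Return the struct name registered for id, or NULL if there is none. */
static const char *typeinfo_lookup(const struct typeinfo_entry *tab,
                                   size_t n, long id)
{
    const size_t plen = strlen(TYPEINFO_PREFIX);

    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strncmp(tab[i].dw_name, TYPEINFO_PREFIX, plen) != 0)
            continue; /* not one of the reserved names */
        if (tab[i].const_value == id)
            return tab[i].dw_name + plen; /* skip the prefix */
    }
    return NULL;
}

/* Values copied from the enumerator dump above. */
static const struct typeinfo_entry sample_entries[] = {
    { "__typeinfo_strA", 2 },
    { "__typeinfo_strB", 3 },
};
```

Here typeinfo_lookup(sample_entries, 2, 3) returns "strB"; the decoder
would then look up the layout of struct strB in the same DWARF data.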

Could you please give us further information about your clang rewriter?
I guess you need a new .so when injecting those code into kernel?

I'm not sure you can live with clang/llvm on the host where you
want to run the tracing bits, but if you can that's an easier option.

I'm not sure. Our target platforms are embedded devices such as smartphones.
Bringing a full clang/llvm environment there is not acceptable.

Thank you.

Hi Wangnan, I've been authoring the BCC development, so I'll answer
those specific questions.

Could you please give us further information about your clang rewriter?
I guess you need a new .so when injecting those code into kernel?

The rewriter runs all of its passes in a single process, creating no
files on disk and having no external dependencies in terms of
toolchain.
1. Entry point: bpf_module_create() - C API call to create module, can
   take filename or directly a c string with the full contents of the
   program
2. Convert contents into a clang memory buffer
3. Set up a clang driver::CompilerInvocation in the style of the clang
   interpreter example
4. Run a rewriter pass over the memory buffer file, annotating and/or
   doing BPF specific magic on the input source
   a. Open BPF maps with a call to bpf_create_map directly
   b. Convert references to map operations with the specific FD of the
      new map
   c. Convert arguments to bpf_probe_read calls as needed
   d. Collect the externed function names to avoid section() hack in the
      language
5. Re-run the CompilerInvocation on the modified sources
6. JIT the llvm::Module to bpf arch
7. Load the resulting in-memory ".o" to bpf_prog_load, keeping the FD
   alive in the compiler process
8. Attach the FD as necessary to perf events, socket, tc, etc.
9. goto 1

The above steps are captured in the BCC github repo in src/cc, with
the clang specific bits inside of the frontends/clang subdirectory.

I'm not sure. Our target platforms are embedded devices such as smartphones.
Bringing a full clang/llvm environment there is not acceptable.

The artifact from the build process of BCC is a shared library, which
has the clang/llvm .a files embedded within it. It is not yet a single
binary, but it would not be infeasible to make it so. The clang toolchain
itself does not need to exist on the target. I have not attempted to
cross-compile BCC to any other architecture; it is currently x86_64 only.

If you have more BCC specific questions not involving clang/llvm,
perhaps you can ping Alexei/myself off of the llvm-dev list, in case
this discussion is not relevant to them.

Thank you for your reply.

Add He Kuang to CC list.

This is for BPF output. BPF programs output bytes to perf through a
tracepoint. To decode such data, we need a way to describe the format
of the buffer. This patch is an attempt which gives each structure type
a unique number by introducing a new intrinsic 'llvm.typeid.for'.

At the bottom is an example of using that intrinsic and the result of

$ clang -target bpf -O2 -c -S ./test_typeid.c

The newly introduced intrinsic has a limitation: I can't find a way to
make it accept all types without name mangling. Therefore, we have to
define a different intrinsic for each type. In the example below, a
macro trick defines llvm.typeid.for.p0struct.mystr and
llvm.typeid.for.p0struct.mystr2, along with the corresponding output
functions.

Another problem is that I'm still unable to find a way to insert dwarf
information at this stage. After clang, debug information is already
isolated, and debug information entries are linked together. Adjusting
debug information requires me to create new metadata and new debug info
entries, link them properly and then insert them into the correct place.
That is possible, but makes the code ugly.

Because of the above two problems, I decided to try the clang builtin
again. I think that should be the last try. If it still does not work,
I'd like to stop working on it until I have a better idea (the BCC
rewriter looks like a solution worth considering), and let the patch
series 'Make eBPF programs output data to perf' be merged upstream
without the 'typeid' change. Until the decoding problem is solved, we
have to let users decode the BPF output themselves, either manually or
with a perf script or babeltrace script.

Thank you.

   ----------------- EXAMPLE -----------------
   extern void output(int id, void *ptr, int size);
   #define OUTPUT_STR(name) \
   struct name {

   #define OUTPUT_STR_END(name) \
   }; \
   unsigned long long __get_typeid_##name(struct name *) asm ("llvm.typeid.for.p0struct."#name); \
   static inline void output_##name(struct name *str) \
   {\
     output(__get_typeid_##name(str), str, sizeof(struct name));\
   };\
   static struct name __g_##name;

   OUTPUT_STR(mystr)
     int x;
     int y;
     int z;
   OUTPUT_STR_END(mystr);

   OUTPUT_STR(mystr2)
     int x;
     int y;
   OUTPUT_STR_END(mystr2);

   --------------- RESULT -------------
   int func(void)
   {
     int x = 123;
     struct mystr myvar;
     struct mystr2 myvar2;

     output_mystr(&myvar);
     output_mystr2(&myvar2);
     output_mystr(&myvar);
     return 0;
   }

   int func2(void)
   {
     int x = 123;
     struct mystr myvar;
     struct mystr2 myvar2;

     output_mystr2(&myvar2);
     output_mystr(&myvar);
     output_mystr2(&myvar2);
     return 0;
   }

     .text
     .globl func
     .align 8
   func: # @func
   # BB#0: # %entry
     mov r6, r10
     addi r6, -16
     mov r1, 1
     mov r2, r6
     mov r3, 12
     call output
     mov r2, r10
     addi r2, -24
     mov r1, 2
     mov r3, 8
     call output
     mov r1, 1
     mov r2, r6
     mov r3, 12
     call output
     mov r0, 0
     ret

     .globl func2
     .align 8
   func2: # @func2
   # BB#0: # %entry
     mov r6, r10
     addi r6, -24
     mov r1, 2
     mov r2, r6
     mov r3, 8
     call output
     mov r2, r10
     addi r2, -16
     mov r1, 1
     mov r3, 12
     call output
     mov r1, 2
     mov r2, r6
     mov r3, 8
     call output
     mov r0, 0
     ret

Signed-off-by: Wang Nan <wangnan0@huawei.com>

@@ -6769,6 +6773,29 @@ void SelectionDAGBuilder::visitPatchpoint(ImmutableCallSite CS,
   FuncInfo.MF->getFrameInfo()->setHasPatchPoint();
}

+void SelectionDAGBuilder::visitTypeidfor(const CallInst &CI) {
+ SDValue Res;
+ static std::vector<const StructType *> StructTypes;

'static' is obviously short term hack for illustration purpose, right?

+ int ID = -1;
+ Value *PtrArg = CI.getArgOperand(0);
+ PointerType *PTy = cast<PointerType>(PtrArg->getType());
+ if (PTy) {
+ StructType *STy = cast<StructType>(PTy->getElementType());
+ if (STy) {
+ for (unsigned i = 0, N = StructTypes.size(); i != N; ++i)
+ if (StructTypes[i] == STy)
+ ID = i + 1;
+ if (ID == -1) {
+ StructTypes.push_back(STy);
+ ID = StructTypes.size();
+ }
+ }
+ }
unsigned long long __get_typeid_##name(struct name *) asm ("llvm.typeid.for.p0struct."#name); \

the macro hack and the loop are quite ugly.
Also how do you plan to correlate such an ID to dwarf info?
Instead of StructType we need to look up DICompositeType,
but it looks like there is no clear connection between call
arguments and the metadata provided by clang.
Maybe it would indeed be easier to add a clang intrinsic
that adds the metadata number as an explicit constant.

I didn't really have time to explore this problem in depth.
Maybe we can write a clear problem statement so that someone
on the llvm list who is familiar with debug info can help design
a solution.
Let me state what I think we're trying to do.
For the program:
void foo(void * ptr);
void bar(...)
{
   struct S s;
   ...
   foo(&s);
}
We want to be able to scan .o file and for the callsite of
foo, we want to be able to find an id of DICompositeType
looking at binary code of .o, so we can lookup this id in
dwarf info (that is also part of .o) and figure out the layout
of the struct passed into the function foo.

  @@ -6769,6 +6773,29 @@ void SelectionDAGBuilder::visitPatchpoint(ImmutableCallSite CS,
    FuncInfo.MF->getFrameInfo()->setHasPatchPoint();
  }
  +void SelectionDAGBuilder::visitTypeidfor(const CallInst &CI) {
+ SDValue Res;
+ static std::vector<const StructType *> StructTypes;

'static' is obviously short term hack for illustration purpose, right?

Of course. Actually I don't like this solution. Please see my commit message.

+ int ID = -1;
+ Value *PtrArg = CI.getArgOperand(0);
+ PointerType *PTy = cast<PointerType>(PtrArg->getType());
+ if (PTy) {
+ StructType *STy = cast<StructType>(PTy->getElementType());
+ if (STy) {
+ for (unsigned i = 0, N = StructTypes.size(); i != N; ++i)
+ if (StructTypes[i] == STy)
+ ID = i + 1;
+ if (ID == -1) {
+ StructTypes.push_back(STy);
+ ID = StructTypes.size();
+ }
+ }
+ }
unsigned long long __get_typeid_##name(struct name *) asm ("llvm.typeid.for.p0struct."#name); \

the macro hack and the loop are quite ugly.

Quite sure. This is a hard limitation if we implement this as an LLVM intrinsic.
Instead, in clang we can use varargs:

BUILTIN(__builtin_bpf_typeid, "Wi.", "nc")

Also how do you plan to correlate such an ID to dwarf info?
Instead of StructType we need to look up DICompositeType,
but it looks like there is no clear connection between call
arguments and the metadata provided by clang.

Not sure. I'd like to try the clang intrinsic again.

Maybe it would indeed be easier to add a clang intrinsic
that adds the metadata number as an explicit constant.

I didn't really have time to explore this problem in depth.
Maybe we can write a clear problem statement so that someone
on the llvm list who is familiar with debug info can help design
a solution.
Let me state what I think we're trying to do.
For the program:
void foo(void * ptr);
void bar(...)
{
    struct S s;
    ...
    foo(&s);
}
We want to be able to scan .o file and for the callsite of
foo, we want to be able to find an id of DICompositeType
looking at binary code of .o, so we can lookup this id in
dwarf info (that is also part of .o) and figure out the layout
of the struct passed into the function foo.

Yes.

I think we solve this problem if we can generate a program like this:

struct structure1 {
   int ID;
   int x;
   int y;
};
struct structure2 {
   int ID;
   int a;
   int b;
};

enum bpf_types {
   BPF_TYPE_structure1 = 1,
   BPF_TYPE_structure2 = 2,
};

int func(void)
{
   struct structure1 var1;
   struct structure2 var2;

   var1.ID = BPF_TYPE_structure1;
   var2.ID = BPF_TYPE_structure2;
   foo(&var1);
   foo(&var2);
   return 0;
}

The key is the enum type. The value of BPF_TYPE_structure{1,2} will be recorded
in DWARF info like:

  <1><2a>: Abbrev Number: 2 (DW_TAG_enumeration_type)
     <2b> DW_AT_name : (indirect string, offset: 0xf4): bpf_types
     <2f> DW_AT_byte_size : 4
     <30> DW_AT_decl_file : 1
     <31> DW_AT_decl_line : 12
  <2><32>: Abbrev Number: 3 (DW_TAG_enumerator)
     <33> DW_AT_name : (indirect string, offset: 0xcc): BPF_TYPE_structure1
     <37> DW_AT_const_value : 1
  <2><38>: Abbrev Number: 3 (DW_TAG_enumerator)
     <39> DW_AT_name : (indirect string, offset: 0xe0): BPF_TYPE_structure2
     <3d> DW_AT_const_value : 2

So we can use them to connect the ID field with the type.

DW_AT_const_value is also emitted for const variables, so the enum could
perhaps be replaced by constants.
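
To illustrate how a consumer could use the leading ID field, here is a
small sketch. The identify_record() function is a hypothetical name; the
struct and enum definitions repeat the example above, and the type names
are hard-coded here, whereas a real decoder would take them from the
DWARF enumerators.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct structure1 { int ID; int x; int y; };
struct structure2 { int ID; int a; int b; };

enum bpf_types {
    BPF_TYPE_structure1 = 1,
    BPF_TYPE_structure2 = 2,
};

/* Hypothetical consumer-side helper: read the leading ID of a raw buffer
 * and report which layout it uses. Returns NULL if the buffer is too
 * short for that layout or the ID is unknown. */
static const char *identify_record(const void *buf, size_t len)
{
    int id;

    assert(buf != NULL);
    if (len < sizeof(id))
        return NULL;
    memcpy(&id, buf, sizeof(id)); /* avoids alignment assumptions */

    switch (id) {
    case BPF_TYPE_structure1:
        return len >= sizeof(struct structure1) ? "structure1" : NULL;
    case BPF_TYPE_structure2:
        return len >= sizeof(struct structure2) ? "structure2" : NULL;
    default:
        return NULL;
    }
}
```

The cost of this scheme is that every output structure carries an extra
int, and the program has to remember to fill it in before calling foo().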

Thank you.