[XRay] RFC: Custom Event Logging

Hi llvm-dev,

I've recently been working on a set of patches that extends XRay to support user-provided, custom event handling. These are as follows:

- llvm: an @llvm.xray.customevent(...) intrinsic
- clang: a __xray_customevent(...) builtin
- compiler-rt: a means of installing a handler for these custom events

At a high level, the goal is to give users a way to pass application- or library-specific data to the XRay runtime handlers. The default handler implementation would simply write the data as raw bytes into the XRay log, to be interpreted by the tools during post-processing and analysis. In particular, we turn user code that looks like:

  // foo.cc
  [[clang::xray_always_instrument]] void foo() {
    static const auto event_id = 42;
    __xray_customevent(&event_id, sizeof(event_id));
  }

Into LLVM IR that roughly resembles:

  define void @_Z3foov() #0 {
    ; some set-up...
    call void @llvm.xray.customevent(<pointer>, <size>)
  }

The @llvm.xray.customevent(...) intrinsic then gets lowered to something like:

  <align to two byte boundary>
  .xray_sled_label:
    jmp +N
    // calling convention set-up
    pushq %rax // 1 byte
    movq $__xray_CustomEvent, %rax // 7 bytes
    callq *%rax // 2 bytes
    popq %rax // 1 byte

At runtime, we can turn the `jmp +N` instruction into a two-byte nop, which enables the instrumentation. We also implement the trampoline (__xray_CustomEvent) in assembler as part of the XRay runtime library.
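
As a rough illustration of the patching step, the sketch below flips the first two bytes of a sled between a short jump and a two-byte nop. It operates on a plain byte buffer rather than live code, and the function names are hypothetical. The encodings are the standard x86 ones (EB rel8 for a short jmp, 66 90 for the two-byte nop). A real runtime would additionally have to make the code page writable and, presumably, write both bytes with a single aligned 16-bit store so that a thread executing the sled concurrently never sees a half-patched instruction, which is what the two-byte alignment allows.

```cpp
#include <cstdint>
#include <cstring>

// Standard x86 encodings: short jump is EB <rel8>; two-byte nop is 66 90.
constexpr std::uint8_t kJmpShortOpcode = 0xEB;
constexpr std::uint8_t kNop2[2] = {0x66, 0x90};

// Enables a sled by overwriting its leading two-byte jump with a two-byte nop,
// so execution falls through into the call sequence.
// Returns false if the sled does not currently start with a short jmp.
bool enable_sled(std::uint8_t *sled) {
  if (sled[0] != kJmpShortOpcode)
    return false;
  // Illustration only: real patching would use one aligned 16-bit store.
  std::memcpy(sled, kNop2, sizeof(kNop2));
  return true;
}

// Disables a sled by restoring the short jump that skips `skip` bytes.
// Returns false if the sled does not currently start with the two-byte nop.
bool disable_sled(std::uint8_t *sled, std::uint8_t skip) {
  if (std::memcmp(sled, kNop2, sizeof(kNop2)) != 0)
    return false;
  const std::uint8_t jmp[2] = {kJmpShortOpcode, skip};
  std::memcpy(sled, jmp, sizeof(jmp));
  return true;
}
```

Both functions check the current state before writing, so enabling an already-enabled sled (or disabling a disabled one) is reported rather than silently repeated.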

Open questions:

- This is really specific to XRay, but some of the functionality may translate to other situations. For instance, being able to emit code that is guarded by an unconditional jump is a very flexible feature. While the length of the XRay sled is known statically, we may want to do this for code of arbitrary length. The smallest building block here would be the patchable jump, which could be an intrinsic in its own right. Would that be preferable to this approach? Or could the two co-exist, with the XRay lowering depending on this smaller intrinsic?

- We force the calling convention set-up on x86_64 to SysV64 when we lower (that is, we emit instructions that move arguments from the default C calling convention into the registers the SysV64 convention expects) because it's unclear whether we can just say `call <sysv64> void @llvm.xray.customevent(...)` in the LLVM IR and have the call lowering do the right thing. It's not clear to me whether it's possible to force the calling convention in the intrinsic function definition. Any tips/pointers on this?

Cheers

-- Dean