[llvm-xray] unable to account multithreaded programs

Hope everyone is well!

I am looking to obtain time spent in each function via xray, but I’ve been unable to figure out how to get llvm-xray account to work with a multithreaded application. Does account require specific flags or considerations to work with multithreaded programs?

I’m aware from an old mailing list thread (https://groups.google.com/g/llvm-dev/c/Ysval_fSQRU) that llvm-xray has issues with shared libraries. Could this have ramifications for e.g. libpthread.so?

Detailed issue report

I’m getting the following error:

Error processing record: {type: 0; cpu: 9; record-type: enter; function-id: 2; tsc: 8360522561311619; thread-id: 2796286; process-id: 2796286}}
Thread ID: 2796288
  (empty stack)
Thread ID: 2796295
  (empty stack)
...
Thread ID: 2796293
  (empty stack)

I am compiling with -fxray-instrument -fxray-instruction-threshold=1, and running my instrumented binary with XRAY_OPTIONS="patch_premain=true verbosity=2 xray_mode=xray-basic" (I see no errors from the verbose output). My program looks like the following:

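  #include <chrono>
  #include <iostream>
  #include <thread>
  #include <vector>

  // Thread count is not given in the original post; 8 is an assumed value.
  constexpr int NUM_THREADS = 8;

  void multithread_task(int thread_id)
  {
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
      std::cout << "Thread id: " << thread_id << "\n";
  }

  int main()
  {
      // Spawn the worker threads.
      std::vector<std::thread> threads;
      for (int i = 0; i < NUM_THREADS; i++)
          threads.emplace_back(multithread_task, i);

      // Wait for every worker to finish before exiting.
      for (std::thread& th : threads)
          th.join();

      return 0;
  }
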
Thanks in advance for your time! My last-ditch attempt would be to compute the elapsed time from the tsc values in the xray records, although I have yet to produce accurate numbers. I honestly have no idea how xray is able to calculate time elapsed while simultaneously removing its own time-elapsed from the time reported in llvm-xray account :sweat_smile:
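
For reference, one way the tsc values could be turned into per-function times is to keep a call stack per thread ID, pair each exit record with the innermost matching enter record, and divide the summed cycle counts by the TSC frequency. The sketch below only illustrates that idea: `Record`, `accumulate_cycles`, and `cycles_to_seconds` are hypothetical names that are not part of llvm-xray, the record layout is simplified, and no attempt is made to subtract instrumentation overhead.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified record; real XRay records carry more fields.
enum class RecordKind { Enter, Exit };

struct Record {
    RecordKind kind;
    int32_t func_id;
    uint64_t tsc;
    uint32_t thread_id;
};

// Sums inclusive TSC cycles per function ID, using one call stack per thread.
std::map<int32_t, uint64_t> accumulate_cycles(const std::vector<Record>& records)
{
    struct Frame { int32_t func_id; uint64_t enter_tsc; };
    std::map<uint32_t, std::vector<Frame>> stacks;
    std::map<int32_t, uint64_t> cycles;

    for (const Record& r : records) {
        std::vector<Frame>& stack = stacks[r.thread_id];
        if (r.kind == RecordKind::Enter) {
            stack.push_back({r.func_id, r.tsc});
            continue;
        }
        // Ignore exits that do not match the innermost open call on this
        // thread, or whose timestamp precedes the matching enter.
        if (stack.empty() || stack.back().func_id != r.func_id ||
            r.tsc < stack.back().enter_tsc)
            continue;
        cycles[r.func_id] += r.tsc - stack.back().enter_tsc;
        stack.pop_back();
    }
    return cycles;
}

// Converts a cycle count to seconds given the TSC frequency in Hz.
double cycles_to_seconds(uint64_t cycles, uint64_t cycle_frequency_hz)
{
    assert(cycle_frequency_hz != 0);
    return static_cast<double>(cycles) / static_cast<double>(cycle_frequency_hz);
}
```

Keeping a separate stack per thread ID matters here because records from different threads are interleaved in the trace; a single shared stack would mismatch enters and exits.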

Hi!

llvm-xray has issues with shared libraries: Could this have potential ramifications with e.g. libpthread.so?

I do not think so. It is true that XRay previously did not support the instrumentation and recording of calls to shared libraries. However, this should not interfere with recording threaded user code.
As a side note: in the current development version of LLVM, XRay now supports shared library instrumentation for X86 and AArch64.

I honestly have no idea how xray is able to calculate time elapsed while simultaneously removing its own time-elapsed from the time reported in llvm-xray account

With any profiling/tracing tool, there is always some level of runtime overhead introduced by internal event handling. This can never be perfectly accounted for.
If the perturbation is small enough, however, it should not affect the qualitative performance results too much.
XRay tries to limit this by excluding small functions below the instruction threshold.
There is also some research on how to compensate for this overhead in the analysis but I don’t think XRay uses such techniques.

Regarding your error, I’m not sure what could be causing it, but I’m happy to take a closer look in the next couple of days.
Could you tell me which LLVM version you are working with?

Hey thanks for the explanations! I am on the latest HEAD for llvm (018b32ca1fd0214e4a359ed8388a2c859d0fc841); I suppose shared libraries are probably not the culprit then.

Do you have any ideas as to what the cause may be? I am unfortunately unfamiliar but I can also take a look if I have time.

Hi, I finally got around to testing this out for myself. There seems to be a bug in the account tool that causes it to fail in certain cases when multithreading is used. As of now, I haven’t figured out the root cause.

You can work around it by first converting to YAML records and then running account:

  1. Convert: llvm-xray convert <trace_file> --output-format=yaml --output converted-trace.yaml
  2. Manually change the version field to 1 in converted-trace.yaml
  3. Account: llvm-xray account converted-trace.yaml --instr_map=<executable>
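
Step 2 can also be scripted. The helper below is a minimal sketch, assuming the converted YAML contains a header line of the form `version: <n>`; `set_trace_version` is a hypothetical name, and the exact header layout should be checked against the actual converted-trace.yaml before relying on it.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns a copy of `yaml` in which the first line containing "version:"
// has its value replaced by `new_version`. All other lines are unchanged.
std::string set_trace_version(const std::string& yaml, int new_version)
{
    assert(new_version >= 0);
    const std::string key = "version:";
    std::istringstream in(yaml);
    std::ostringstream out;
    std::string line;
    bool patched = false;

    while (std::getline(in, line)) {
        const std::size_t pos = line.find(key);
        if (!patched && pos != std::string::npos) {
            // Keep the indentation and key, rewrite only the value.
            line = line.substr(0, pos + key.size()) + " " +
                   std::to_string(new_version);
            patched = true;
        }
        out << line << '\n';
    }
    return out.str();
}
```

Reading the file into a string, calling `set_trace_version(contents, 1)`, and writing the result back would reproduce the manual edit; only the first match is rewritten so that record lines are left alone.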

Hope this helps.

Sorry for the late response! I haven’t actually had a chance to test this yet: I am no longer able to load/convert the trace file at all, but I have been messing with my build trying to get DSO support working, so I don’t think that counts.

Speaking of DSO support, I hadn’t realized that you were actually the person to implement the feature as a part of your thesis on CaPI: can I ask more about DSO support instead? :sweat_smile:

  • With -fxray-shared, is it a matter of enabling -fxray-shared for both the DSO and the binary?

  • Do I need to make any other changes, or do anything else differently? For --instr_map, would I use the binary, or the DSO itself to get DSO functions?

  • Is libclang_rt.xray-dso.a supposed to be a drop-in replacement for libclang_rt.xray.a, or are they supposed to run in tandem?

    I noticed that by default, things were only linking to libclang_rt.xray-dso.a without libclang_rt.xray.a, but as libclang_rt.xray-dso.a is currently linked without the compiler-rt sanitizer stuff, my compiles are failing. However, if I manually link libclang_rt.xray.a as well, I get errors regarding multiple definitions. I tried editing the cmake to include xray sources in xray-dso as well, but in addition to function symbolization breaking, I’m still not getting any functions from the DSO in my results

Thanks a ton btw for the DSO feature! It is much appreciated!


Hey, thanks for the kind words and your interest in my work on XRay!

To answer your questions:

With -fxray-shared, is it a matter of enabling -fxray-shared for both the DSO and the binary?

-fxray-shared needs to be passed when creating the DSO (during linking). It is always supported on the executable side. It does not hurt to have it always enabled though.

Do I need to make any other changes, or do anything else differently? For --instr_map, would I use the binary, or the DSO itself to get DSO functions?

The addition of instrumented DSOs unfortunately does not work with the current model of statically extracting sled information during analysis. This is because this information is split across the involved binaries (i.e. executable and DSOs) and we cannot statically infer which libraries may have been loaded during execution (think LD_PRELOAD and dlopen).

As a result, when running with one of the builtin XRay logging modes, calls recorded from DSOs will not be resolved correctly during analysis and will be reported by function ID only.

I am currently working on a patch that collects sled information at runtime and embeds it into the trace file. If accepted, this will solve the issue at the expense of slightly larger, but portable, trace files.

Is libclang_rt.xray-dso.a supposed to be a drop-in replacement for libclang_rt.xray.a, or are they supposed to run in tandem?

They work in tandem. libclang_rt.xray.a controls the overall patching and event handling, and libclang_rt.xray-dso.a contains the necessary functionality to enable XRay on the DSO side.

I noticed that by default, things were only linking to libclang_rt.xray-dso.a without libclang_rt.xray.a, but as libclang_rt.xray-dso.a is currently linked without the compiler-rt sanitizer stuff, my compiles are failing. However, if I manually link libclang_rt.xray.a as well, I get errors regarding multiple definitions. I tried editing the cmake to include xray sources in xray-dso as well, but in addition to function symbolization breaking, I’m still not getting any functions from the DSO in my results

If you compile all of your code with -fxray-instrument -fxray-shared, linking in the correct XRay libraries should be handled automatically. See the test compiler-rt/test/xray/TestCases/Posix/patching-unpatching-dso.cpp for an example usage.

If it still doesn’t work, please post the complete compiler invocations and I’ll have a look.
