CUDA: instrumenting PTX code

Hi,

I am not sure if there is any CUDA/PTX instrumenting feature in LLVM.

I want to generate a simple memory trace, and I know GPGPU Ocelot does that. But I was wondering: why not do it in LLVM?

So I am looking at two optimizations implemented in LLVM for CUDA for some inspiration.

  1. Address inference: Does this use PTX IR or LLVM IR? I would say LLVM IR based on some code keywords like PHI nodes etc.

  2. Bypass slow div: This is a generic optimization adapted for CUDA. I think it uses LLVM IR.

So my question is: to instrument PTX code, should I focus on LLVM IR or PTX?

Some definite guidance along these lines would be very helpful. Thank you.

Sincerely,
Gurunath

Follow-up: Or is it the SASS code that should be instrumented?

See: https://github.com/NVlabs/SASSI

> Hi,
>
> I am not sure if there is any CUDA/PTX instrumenting feature in LLVM.
>
> I want to generate a simple memory trace, and I know GPGPU Ocelot does
> that. But I was wondering: why not do it in LLVM?
>
> So I am looking at two optimizations implemented in LLVM for CUDA for some
> inspiration.
>
> 1. Address inference: Does this use PTX IR or LLVM IR? I would say LLVM IR
> based on some code keywords like PHI nodes etc.
>
> 2. Bypass slow div: This is a generic optimization adapted for CUDA.
> I think it uses LLVM IR.

Both optimizations are IR-level.

> So my question is: to instrument PTX code, should I focus on LLVM IR or PTX?

It depends on what you want to trace. For memory tracing, instrumenting IR
is probably enough, because there's an almost one-to-one mapping between a
load/store in optimized IR and a load/store in PTX.