llvm.read_register for %RIP on x86_64

Hi,

I want to implement an instrumentation that gets the current PC.
On x86_64 I can do it using inline asm (something like "lea (%%rip),%0"),
but I wonder if there is some more LLVM-ish way to do it, e.g. an intrinsic?

I can only find r208104 which introduces llvm.read_register:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140505/215840.html

The LangRef says:

Warning: So far it only works with the stack pointer on selected architectures (ARM, AArch64,
PowerPC and x86_64). Significant amount of work is needed to support other registers and even
more so, allocatable registers.

Is it reasonable to extend llvm.read_register to handle the program counter register (on x86_64)?
Or is InlineAsm a better way?

Thanks,

–kcc

I would do inline asm for now.

If you wanted to avoid the asm, I recommend adding a new intrinsic to get the “current” PC. The intrinsic would probably have to be modeled as reading and writing memory to establish control dependence and avoid being CSE’d to the entry block.
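
The same CSE concern already applies to the inline-asm route. Here is a minimal sketch, assuming GCC or Clang extended asm on x86_64; the helper names are made up for illustration. An asm statement with outputs and no volatile is treated as depending only on its inputs, so the compiler may merge or move identical copies. Marking it volatile asks for each copy to be kept where it is written.

```cpp
#include <cstdint>

#if !defined(__x86_64__)
#error "this sketch only covers x86_64"
#endif

// Hypothetical helper, not volatile: the optimizer is allowed to treat two
// of these as producing the same value and keep only one lea.
__attribute__((always_inline)) static inline std::uintptr_t pc_mergeable() {
  std::uintptr_t pc;
  asm("lea (%%rip), %0" : "=r"(pc));
  return pc;
}

// Hypothetical helper, volatile: each inlined copy is emitted, so each call
// site reads the address of the instruction following its own lea.
// always_inline matters here; without it the lea would sit inside the helper
// and every caller would see the same address.
__attribute__((always_inline)) static inline std::uintptr_t pc_per_site() {
  std::uintptr_t pc;
  asm volatile("lea (%%rip), %0" : "=r"(pc));
  return pc;
}
```

Two back-to-back calls to pc_per_site() return different addresses, one per inlined lea. With pc_mergeable() the compiler is free to hand back the same value twice once optimizations are on.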

Hi Kostya,

I'd also want something that GCC understands, as this code could end
up there, too.

The read_register intrinsic can be lowered by Clang from a number of
different builtins, so we could easily "support" some already-existing
GCC builtin for reading the PC, if you need to get it from C code.
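
One existing builtin that gets close, as a sketch only: __builtin_return_address(0), called from a helper that is never inlined, yields the address in the caller just after the call instruction. Both GCC and Clang accept it. The helper name pc_after_call is an assumption for illustration, and this is not the same thing as a read_register-based builtin.

```cpp
#include <cstdint>

// Hypothetical helper. noinline is essential: the return address is only a
// PC inside the caller if a real call instruction is emitted.
__attribute__((noinline)) std::uintptr_t pc_after_call() {
  return reinterpret_cast<std::uintptr_t>(__builtin_return_address(0));
}
```

This costs a call and return per site, so it is presumably more expensive than a single lea.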

Right now, read_register is locked to the stack pointer as a
design decision. We do not know the implications of that intrinsic
for any other register, and we deliberately did not discuss them.
If you want to read the PC via a builtin, then we'll have to have
that conversation one way or another.

I strongly recommend that you use read_register, since support is
already there (you only need to add "PC" to the list and everything
works), and it's documented and the semantics are clear.

A way to convince people that reading the PC in certain cases is not
just ok, but meaningful, is to create a piece of inline asm and show
your case. It will certainly help the discussion to understand the
constraints and limit support for the cases we know are safe.

These are the original threads:

http://lists.llvm.org/pipermail/llvm-dev/2014-March/071472.html

http://lists.llvm.org/pipermail/llvm-dev/2014-March/071530.html

cheers,
--renato

Hmm. I am not sure I understood you. The last two paragraphs seem to
contradict each other.
So, do you recommend extending read_register to read the PC, or is it that
"read_register is locked at the stack pointer as a design decision"?

Both. :)

There was a design decision to only support SP because we had no clear case for anything other than the stack pointer.

If you have one for the PC, it would be a much better technical decision to reuse the machinery that already exists, is documented and tested, than come up with an independent implementation.

The discussion of whether to support it in clang or not is orthogonal. But once it is decided that we should support reading the PC, read_register is the obvious place.

If the clang developers refuse the idea, then inline assembly is the only option that would work across compilers.

Makes sense?

Cheers,
Renato

Yep, will give it a try.

FTR:
I ran a quick experiment with inline asm; it is as simple as:

InlineAsm::get(FunctionType::get(IntptrTy, false),
               StringRef("lea (%rip),$0"), StringRef("=r"),
               /*hasSideEffects=*/false);

It generates pretty efficient-looking code:

42194c: 48 8d 0d 00 00 00 00 lea 0x0(%rip),%rcx # 421953 <_Z3fooPi+0x13>
421953: 48 89 0c c5 60 3a 08 mov %rcx,0x1083a60(,%rax,8)

(first instruction gets the PC, the other stores it somewhere)

However, this introduces a significant slowdown in my case, over 20%
(I need to execute this for roughly 50% of basic blocks).

So, I'll be looking for some other solutions that don't require reading the PC this often. :(

–kcc

Efficient looking on a 386, sure, but I presume you're running AMD64 code on a modern CPU :)

I suggest you try:

call foobar
foobar:
pop %rcx
mov %rcx,0x1083a60(,%rax,8)