When -fstack-protector is turned on, the linker fails to find the symbol "__stack_chk_guard" because, at least for powerpc64le, glibc doesn't provide this symbol. Instead, it puts the stack guard into the TCB.
x86 fixed this issue by injecting a special address space (which is later translated to a TCB register access) and hardcoding the offset of stack_guard, but I don't see an easy way to handle address spaces in ppc.
A cleaner solution could be adding an IR intrinsic llvm.get_tcb_address() and hardcoding the offset of the stack_guard member, since these offsets aren't supposed to change.
Details are in the bug: https://llvm.org/bugs/show_bug.cgi?id=26226
Any ideas?
Thanks!
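For concreteness, a sketch of how the stack-protector code might use the proposed intrinsic (hypothetical IR only: llvm.get_tcb_address() does not exist, and the 0x28 offset is illustrative, not glibc's actual layout):

```llvm
; Hypothetical: assumes the proposed llvm.get_tcb_address() existed.
declare i8* @llvm.get_tcb_address()

define i64 @load_stack_guard() {
entry:
  %tcb = call i8* @llvm.get_tcb_address()
  %slot = getelementptr i8, i8* %tcb, i64 40   ; fixed stack_guard offset (illustrative)
  %slot64 = bitcast i8* %slot to i64*
  %guard = load i64, i64* %slot64
  ret i64 %guard
}
```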
> When -fstack-protector is turned on, linker fails to find the symbol "__stack_chk_guard"
> because at least for powerpc64le, glibc doesn't provide this symbol.
> Instead, they put the stack guard into TCB.
> x86 fixed this issue by injecting a special address space (which is later
> translated to TCB register access) and hard code the offset of stack_guard,
> but I don't see a easy way to handle address spaces in ppc.
> A cleaner solution could be adding an IR intrinsic llvm.get_tcb_address()
> and hard code the offset of stack_guard member, since they aren't supposed
> to change.
Can you try to implement the intrinsic for x86 as a proof of concept?
> When -fstack-protector is turned on, linker fails to find the symbol "__stack_chk_guard" because at least for powerpc64le, glibc doesn't provide this symbol. Instead, they put the stack guard into TCB.
> x86 fixed this issue by injecting a special address space (which is later translated to TCB register access) and hard code the offset of stack_guard, but I don't see a easy way to handle address spaces in ppc.
> A cleaner solution could be adding an IR intrinsic llvm.get_tcb_address() and hard code the offset of stack_guard member, since they aren't supposed to change.
> Details are in the bug: https://llvm.org/bugs/show_bug.cgi?id=26226
> Any ideas?
Not a huge fan of a ppc-specific intrinsic (which it should be, so llvm.ppc… if we go that route) to do this. I actually rather liked the cleanliness of the address space solution for x86. How much work would it be to do that? Alternately: Hal, Kit, what do you two think as far as the ppc backend goes?
The other solution you mentioned - combining the slot load into the existing intrinsic - might work; we'd just need to figure out how to autoupgrade everything into it, which might be a bit more difficult than fixing the backends and dealing. Have you looked into how the autoupgrade would work?
Thanks!
-eric
From: "Eric Christopher" <echristo@gmail.com>
To: "Tim Shen" <timshen@google.com>, llvm-dev@lists.llvm.org, "Hal
Finkel" <hfinkel@anl.gov>, "Kit Barton" <kbarton@ca.ibm.com>
Sent: Wednesday, February 10, 2016 6:59:50 PM
Subject: Re: [llvm-dev] [PPC] Linker fails on -fstack-protector
> When -fstack-protector is turned on, linker fails to find the symbol
> "__stack_chk_guard" because at least for powerpc64le, glibc doesn't
> provide this symbol. Instead, they put the stack guard into TCB.
> x86 fixed this issue by injecting a special address space (which is
> later translated to TCB register access) and hard code the offset of
> stack_guard, but I don't see a easy way to handle address spaces in ppc.
Why is handling address spaces in ppc any more difficult than doing so for x86?
-Hal
> From: "Eric Christopher" <echristo@gmail.com>
> To: "Tim Shen" <timshen@google.com>, llvm-dev@lists.llvm.org, "Hal Finkel" <hfinkel@anl.gov>, "Kit Barton" <kbarton@ca.ibm.com>
> Sent: Wednesday, February 10, 2016 6:59:50 PM
> Subject: Re: [llvm-dev] [PPC] Linker fails on -fstack-protector
> > When -fstack-protector is turned on, linker fails to find the symbol "__stack_chk_guard" because at least for powerpc64le, glibc doesn't provide this symbol. Instead, they put the stack guard into TCB.
> > x86 fixed this issue by injecting a special address space (which is later translated to TCB register access) and hard code the offset of stack_guard, but I don't see a easy way to handle address spaces in ppc.
> Why is handling address spaces in ppc any more difficult than doing so for x86?
Shouldn’t be at all, mostly just seems that a bunch of it hasn’t been set up yet.
-eric
I’ll come up with a address-space-based proof of concept.
I found it a bit weird to use an address space for this, since the offset of stack_guard within the TCB is, unfortunately, negative:
https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/linux64.h#L610
In my understanding, an address space refers to a segment register (on 32-bit powerpc; or an SLB entry on 64-bit powerpc?) with a non-negative offset value, so that it's actually accessing data in the specified segment.
In our case, accessing r13 (the TCB pointer) is more different from that than I thought. I'm considering instead adding a target-independent IR intrinsic "llvm.get_tcb_address()". It's target-independent because glibc does this for several platforms, and we probably want to solve it once for all:
~/src/glibc % grep -r 'stack_guard;' .
./sysdeps/mach/hurd/i386/tls.h: uintptr_t stack_guard;
./sysdeps/i386/nptl/tls.h: uintptr_t stack_guard;
./sysdeps/sparc/nptl/tls.h: uintptr_t stack_guard;
./sysdeps/s390/nptl/tls.h: uintptr_t stack_guard;
./sysdeps/powerpc/nptl/tls.h: uintptr_t stack_guard;
./sysdeps/x86_64/nptl/tls.h: uintptr_t stack_guard;
./sysdeps/tile/nptl/tls.h: uintptr_t stack_guard;
Opinions?
It would also be inefficient on architectures that can directly access
TLS variables. I.e. on x86, it is effectively a statically allocated TLS
variable with fixed offset. That can be accessed by a single load --
whereas introducing get_tcb_address first would require a second load.
Joerg
Guess I used the wrong intrinsic name - it should be llvm.global_tls_address(), and it should be directly lowered to ISD::GlobalTLSAddress, which is currently used by both x86 and ppc; ultimately it references the fs register on x86_64 and r13 on ppc64le.
> I found a bit weird to use address space for this, since the offset of getting stack_guard in TCB is, unfortunately, negative:
> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/linux64.h#L610
> In my understanding an address space is referring to a segment register (on powerpc 32bit; or SLB entry on powerpc 64bit?) with a non-negative offset value, so that it's actually accessing data in the specified segment.
> In our case, I feel accessing r13 (TCB pointer) is more different than I thought. I'm considering turning to add a target independent IR intrinsic "llvm.get_tcb_address()". It's target independent because glibc does this for several platforms, and we probably want to solve it once for all:
> ~/src/glibc % grep -r 'stack_guard;' .
> ./sysdeps/mach/hurd/i386/tls.h: uintptr_t stack_guard;
> ./sysdeps/i386/nptl/tls.h: uintptr_t stack_guard;
> ./sysdeps/sparc/nptl/tls.h: uintptr_t stack_guard;
> ./sysdeps/s390/nptl/tls.h: uintptr_t stack_guard;
> ./sysdeps/powerpc/nptl/tls.h: uintptr_t stack_guard;
> ./sysdeps/x86_64/nptl/tls.h: uintptr_t stack_guard;
> ./sysdeps/tile/nptl/tls.h: uintptr_t stack_guard;
Yeah, for most of the architectures listed there it’s not particularly useful as they support direct access to TLS variables (as Joerg says later). That grep isn’t representative of how the data is actually accessed. If the current address space way of specifying isn’t doable on PPC then that’s fine (or feels icky). That said, the basic idea of keeping around the way to access the TLS variable is pretty easy, we do some amount of this in TargetLowering::getStackCookieLocation, but we might just need to come up with a new interface there that returns the address of the stack local load rather than an offset from an address space.
-eric
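For context, the existing hook Eric mentions had roughly this shape in the LLVM of this era (approximate, from memory; worth checking the tree). A target that keeps the cookie at a fixed address-space offset reports it here, and the stack-protector code emits the load itself:

```cpp
// Approximate interface sketch, not verbatim LLVM source.
virtual bool getStackCookieLocation(unsigned &AddressSpace,
                                    unsigned &Offset) const {
  // Default: no special location; fall back to the __stack_chk_guard symbol.
  return false;
}
// x86-64 Linux would report AddressSpace = 257 (%fs) and Offset = 0x28.
```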
> Yeah, for most of the architectures listed there it's not particularly useful as they support direct access to TLS variables (as Joerg says later). That grep isn't representative of how the data is actually accessed. If the current address space way of specifying isn't doable on PPC then that's fine (or feels icky).
It is certainly doable, but it feels icky:
- It needs to support negative offsets, which is weird when we are talking about address spaces;
- Handling TLS is totally different from handling real ppc segments (which I assume address space is designed for);
- TLS has a different semantic from address space (http://llvm.org/docs/CodeGenerator.html#x86-address-spaces-supported).
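For reference, the x86 scheme being discussed is expressible directly in IR: per the CodeGenerator doc linked above, address space 256 maps to %gs and 257 to %fs on x86, so the x86-64 Linux guard at %fs:0x28 becomes an ordinary load (sketch):

```llvm
; addrspace(257) selects %fs on x86-64; 40 = 0x28 is glibc's stack_guard slot.
define i64 @load_guard_x86() {
  %ptr = inttoptr i64 40 to i64 addrspace(257)*
  %guard = load i64, i64 addrspace(257)* %ptr
  ret i64 %guard
}
```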
> That said, the basic idea of keeping around the way to access the TLS variable is pretty easy, we do some amount of this in TargetLowering::getStackCookieLocation, but we might just need to come up with a new interface there that returns the address of the stack local load rather than an offset from an address space.
I also found a similar case - getSafeStackPointerLocation(). On X86 it's implemented in terms of an address space (similar to getStackCookieLocation), but on AArch64 it's implemented in terms of a target-specific AArch64ISD::THREAD_POINTER and Intrinsic::aarch64_thread_pointer.
To make the fix least surprising, I can do one of:
- Create PPCISD::THREAD_POINTER and Intrinsic::ppc_thread_pointer and do similar things to what aarch64 does; or
- Don't create PPCISD::THREAD_POINTER, but directly call the llvm.read_register intrinsic in ppc's getStackCookieLocation(). This is the way that requires the least change; or
- Create a generic ISD::GET_GLOBAL_TLS_ADDRESS and intrinsic llvm.get_global_tls_address(), and lower them to target specific ISD. No target specific intrinsic is needed. I was wrong about ISD::GlobalTlsAddress, since it requires a GlobalValue object.
I prefer 3), since it’s less hacky than 2) and less repetitive than 1).
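Option 2 might look roughly like this in the PPC backend — a hedged sketch only (the helper name is invented, error handling is omitted, and the offset is GCC's value for 64-bit, worth verifying):

```cpp
// Hypothetical helper: build the guard address in IR as r13 + offset
// using the llvm.read_register intrinsic.
Value *emitPPCGuardAddress(IRBuilder<> &B, Module &M) {
  LLVMContext &C = M.getContext();
  // Read the thread pointer (r13 on ppc64le) via llvm.read_register.i64.
  MDNode *RegName = MDNode::get(C, MDString::get(C, "r13"));
  Function *ReadReg = Intrinsic::getDeclaration(
      &M, Intrinsic::read_register, B.getInt64Ty());
  Value *TP = B.CreateCall(ReadReg, MetadataAsValue::get(C, RegName));
  // stack_guard lives at a fixed negative offset from the thread pointer
  // in glibc's TCB; -0x7010 matches GCC's linux64.h for 64-bit.
  Value *Addr = B.CreateAdd(TP, B.getInt64(-0x7010));
  return B.CreateIntToPtr(Addr, B.getInt8PtrTy());
}
```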
Hi Tim,
I’m a little confused about what you’re trying to accomplish here.
Are you trying to find a way to access the stack_guard in the TCB provided by glibc?
If not, can we not just come up with our own definition of __stack_chk_guard and discuss the best way to implement that (perhaps not by putting it in the TCB)?
I’m not familiar with the GCC implementation, but I can talk to the GCC folks tomorrow to get some more details. I know that XL implemented this differently, but some specific details are a bit fuzzy. I’ll refresh my memory on that tomorrow as well.
Thanks,
Kit Barton, Ph.D.
LLVM Development on POWER
IBM Toronto Lab, D2/929/8200/MKM
8200 Warden Ave, Markham, L6G 1C7
(905) 413-3452
kbarton@ca.ibm.com
> Hi Tim,
> I'm a little confused about what you're trying to accomplish here.
> Are you trying to find a way to access the stack_guard in the TCB provided by glibc?
Yes.
> If not, can we not just come up with our own definition of __stack_chk_guard and discuss the best way to implement that (perhaps not by putting it in the TCB)?
> I'm not familiar with the GCC implementation, but I can talk to the GCC folks tomorrow to get some more details. I know that XL implemented this differently, but some specific details are a bit fuzzy. I'll refresh my memory on that tomorrow as well.
Based on my understanding, GCC simply hardcoded the register number and offset: https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/linux64.h#L610, and I intend to do something similar here, except we may not need/want to hardcode the register number.
It’ll be great if you and GCC folks can help. Thank you!
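For reference, the linked GCC file boils down to a hardcoded thread-pointer offset (values quoted from gcc/config/rs6000/linux64.h as I read it; worth double-checking against the current tree):

```c
/* The guard is read straight off the thread-pointer register (r13)
   at a fixed negative offset into the TCB.  */
#define TARGET_THREAD_SSP_OFFSET (TARGET_64BIT ? -0x7010 : -0x7008)
/* so 64-bit codegen emits roughly:  ld rX, -0x7010(r13)  */
```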
(4) Create an intrinsic for the SSP cookie itself and allow targets to use
different lowering based on factors like the OS.
Joerg
You can get the TCB address from %fs:0 or %gs:0 on x86. But that's
suboptimal for the purpose here as you need to do a second load with the
resulting pointer, when you could have gotten it from %fs:$magic_offset
or %gs:$magic_offset in first place.
Joerg
Isn't LLVM already combining a register load with an offset (0) followed by another offset-load? If it's currently not implemented, it seems quite useful and generic, and we should just implement it?
If for some reason it can’t be done, we can still pass the offset into the intrinsic, as ISD::GlobalTLSAddress is designed.
ISD::GlobalTLSAddress is actually quite close to what we want - except for the GlobalValue it requires. Maybe we can make that requirement optional? E.g. if a nullptr is passed in as the GlobalValue, generate fs:offset rather than fs:@a + offset.
> > You can get the TCB address from %fs:0 or %gs:0 on x86. But that's
> > suboptimal for the purpose here as you need to do a second load with the
> > resulting pointer, when you could have gotten it from %fs:$magic_offset
> > or %gs:$magic_offset in first place.
>
> Isn't LLVM already combining a register load with an offset (0) followed by
> another offset-load? If it's currently not implemented, it seems quite
> useful and generic, and we should just implement it?
I don't think so -- wouldn't normally make sense and the TLS access path
is explicitly lowered differently.
> If for some reason it can't be done, we can still pass the offset into the
> intrinsic, as ISD::GlobalTLSAddress is designed.
>
> ISD::GlobalTLSAddress is actually quite close to what we want - except for
> a GlobalValue it requires. Maybe we can make that requirement optional?
> E.g. if a nullptr is passed in as the GlobalValue, generate fs:offset,
> rather than fs:@a + offset.
That doesn't solve the other problem I mentioned: a number of different
ways to speed up the access exist. E.g. OpenBSD duplicates the cookie in
every DSO in a hidden variable, which is comparable in overhead on most
architectures and sometimes even much cheaper than using a TLS-like
scheme.
Joerg
Yeah I think this is better. Thanks!
So suppose we have an intrinsic ssp_cookie; when SelectionDAG visits it, it calls a target-specific function getSspCookie(…):
virtual SDValue getSspCookie(…) const;
Compared to returning a Value* to a function pass (lib/CodeGen/StackProtector.cpp), returning an SDValue requires less wrapping in the backend (e.g. Intrinsic::aarch64_thread_pointer).
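Fleshed out slightly, the proposed hook might look like this (a hypothetical interface sketched from the message above; no such method exists in TargetLowering):

```cpp
// Hypothetical TargetLowering hook for the proposed ssp_cookie intrinsic.
// SelectionDAGBuilder would call this when visiting the intrinsic, and each
// target lowers it however is natural: a load from addrspace(257) offset
// 0x28 on x86-64 Linux, a load off r13 on ppc64le, a load from the
// __stack_chk_guard symbol elsewhere.
class TargetLowering {
public:
  // Returns a node computing the stack cookie value (or its address).
  virtual SDValue getSspCookie(SelectionDAG &DAG, const SDLoc &DL) const = 0;
};
```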