[RFC/PSA] Changing the shadow call stack register on RISC-V

There’s an active discussion on D146463 about changing the register used for the shadow call stack (the patch description does a good job of summarising things so I won’t repeat here).

Our working assumption is that the shadow call stack register has little/no real-world use and so such a change wouldn’t be disruptive, but please speak up if it would be problematic for you.

Another proposal from me is to use gp as a platform register: [RFC] Relax gp could be platform specific register rather than reserved for… by kito-cheng · Pull Request #371 · riscv-non-isa/riscv-elf-psabi-doc · GitHub

Some advantages of using gp as the platform register rather than another GPR:

  • The compiler doesn’t use the gp register anywhere for now.
  • Assembly files that conform to the current ABI don’t use it either, except for the __global_pointer$ initialization code in the CRT files.
  • The main user is the linker, which uses gp to perform linker relaxation, and we already have a command-line option to turn that off.
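
To make concrete what is being traded away, here is a rough sketch of what gp relaxation relies on: the linker can only rewrite an access to go through gp when the symbol sits within the 12-bit signed immediate range (-2048 to +2047 bytes) of the value held in gp. The addresses and symbol names are made up for illustration, and `gp_relaxable` and `relaxable_symbols` are hypothetical helpers, not part of any linker.

```python
# Hypothetical sketch: which symbols could be reached from gp in one instruction?
# Assumes the 12-bit signed immediate used by RISC-V loads, stores and addi.

IMM12_MIN = -2048
IMM12_MAX = 2047


def gp_relaxable(symbol_addr, gp_value):
    """True if symbol_addr is within imm12 range of the value held in gp."""
    offset = symbol_addr - gp_value
    return IMM12_MIN <= offset <= IMM12_MAX


def relaxable_symbols(symbols, gp_value):
    """Return the sorted names of symbols reachable relative to gp."""
    return sorted(name for name, addr in symbols.items()
                  if gp_relaxable(addr, gp_value))


# Made-up layout: gp points 0x800 past the start of a data area at 0x10000.
gp = 0x10800
symbols = {"counter": 0x10000, "table": 0x10FFF, "big_buffer": 0x11000}
print(relaxable_symbols(symbols, gp))  # ['counter', 'table']
```

Anything outside that window still needs the usual two-instruction address sequence, which is why the benefit depends so much on how much data a program keeps near gp.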

Potential issues:

  • Loss of the code size and performance gains from gp relaxation
    • Most of the gain from gp relaxation goes to embedded applications, which are a different target audience from SCS, so this should not be a blocking issue.
    • Android is an example: it already disables gp relaxation entirely, so nothing is lost in that case.
  • Will it break any existing platform?
    • Treating gp as a platform register is optional. The default is still to use it for gp relaxation, so there is NO breakage on existing platforms, but platforms that don’t want gp relaxation are free to use the gp register for another purpose.
    • An attribute is added so the linker can help when mixing objects that use gp differently; the linker can also check that attribute to decide whether gp relaxation is doable.
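
As a rough illustration of the attribute idea, a linker could refuse gp relaxation as soon as one input object claims gp for platform use. The attribute values and the function name below are hypothetical; the real encoding is whatever the psABI proposal ends up specifying.

```python
# Hypothetical sketch of a link-time check on a per-object "gp usage" attribute.
from enum import Enum


class GpUsage(Enum):
    RELAX = "relax"        # gp holds __global_pointer$ (today's default)
    PLATFORM = "platform"  # gp is reserved for a platform purpose, e.g. SCS


def can_relax_gp(object_usages):
    """gp relaxation is only safe if no input object repurposes gp."""
    return all(usage is GpUsage.RELAX for usage in object_usages)


print(can_relax_gp([GpUsage.RELAX, GpUsage.RELAX]))     # True
print(can_relax_gp([GpUsage.RELAX, GpUsage.PLATFORM]))  # False
```

With a check like this, existing objects keep linking as before, and relaxation is simply skipped once an object that repurposes gp shows up.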

We do not want to slow this process down, because it is important that we come to a resolution as soon as possible, but we do want to raise these concerns now, while there is still time to take them into account.

Ideally, projects that place a premium on code size wouldn’t have to forgo code size savings from global relaxation if they want to use SCS, or make use of the platform register. We see no fundamental reason why users in the embedded space who care deeply about code size would not fall into this category, and it would be nice if they could also make use of the feature without being required to drop global relaxation.

As mentioned in the sig-toolchain meeting and the psABI issues, there are many benefits to using gp, but before we make a choice we should be explicit about the tradeoffs and why we believe they are worth making.

Cross-posting my reply here so that anyone not following the RISC-V psABI repo can see the follow-up discussion:


Thanks for the comment. SCS is a special case in that it will eventually break the ABI (for RISC-V), since it requires one extra reserved register. That gives us more freedom to consider options than we have for other ABI issues.

Of course, the reason this has become an issue that requires an ABI-breaking change is that we didn’t specify a platform register at the beginning, but that ship has sailed.

So, back to SCS. I think we have three options:

  • Pick a GPR as the SCS register.
  • Redefine gp to allow uses other than gp relaxation.
  • Use the Zisslpcfi extension, which provides dedicated instructions and CSRs to implement SCS.

Pick a GPR as the SCS register

Two candidates actually came up during the discussion, x18 and x27. x18 has many potential issues, so we are now discussing the other candidate, and x27 has been proposed.

x27 is obviously better than x18, as @appujee listed in the first post.

However, there is still a small chance that it breaks some existing asm code, since it was not reserved before, so I would prefer not to pick a non-reserved register if possible.

Redefine gp to allow uses other than gp relaxation

I already listed several reasons in #371, so I won’t duplicate them here and will jump straight to the concern @ilovepi raised: what if a user wants both SCS and gp relaxation? I think that is the biggest potential drawback of this proposal.

|              | GP relax | No GP relax |
|--------------|----------|-------------|
| SCS enabled  | NOT OK   | OK          |
| SCS disabled | OK       | OK          |

What are the possible solutions if people want more code size savings AND SCS?

  • Use Zisslpcfi; that would be the simplest, but it requires HW support.
  • Use the GlobalMerge optimization pass from LLVM*; it can achieve an optimization similar to gp relaxation, but only locally, and LTO should be able to improve on that.
  • Use the proposal around the EABI/deviations: tp as gp (this only applies to systems that don’t require threads).

NOTE: * GCC has something similar, called the section anchor optimization.
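
For readers unfamiliar with GlobalMerge or section anchors, the idea is roughly this: several globals are laid out in one block, so that a single base address plus small offsets reaches all of them, much like gp plus an offset does. The sketch below is only a model of that layout step, not the LLVM or GCC implementation; the function names and the example globals are hypothetical.

```python
# Hypothetical model of merging globals behind one base address.

def merge_globals(globals_):
    """Lay out (name, size, align) entries back to back.

    Returns (offsets, total_size), where offsets maps each name to its
    byte offset from the start of the merged block.
    """
    offsets = {}
    cursor = 0
    for name, size, align in globals_:
        cursor = (cursor + align - 1) // align * align  # round up to alignment
        offsets[name] = cursor
        cursor += size
    return offsets, cursor


def fits_imm12(offset):
    """True if the offset can be encoded in a 12-bit signed immediate."""
    return -2048 <= offset <= 2047


offsets, total = merge_globals([("flag", 1, 1), ("count", 4, 4), ("total", 8, 8)])
print(offsets, total)  # {'flag': 0, 'count': 4, 'total': 8} 16
print(all(fits_imm12(off) for off in offsets.values()))  # True
```

Once the base of the merged block is in a register, each of these globals is one load or store away, which is where the code size comes back even without gp relaxation.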

Use the Zisslpcfi extension, which provides dedicated instructions and CSRs to implement SCS

Okay, that should make everyone happy: no extra reserved register is needed, since the extension provides a dedicated one. The only issue is that it requires HW that implements it.

Compare

I’m dropping Zisslpcfi from the comparison table, since it’s not the point of this thread.

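|                            | Option 1 (pick a GPR) | Option 2 (redefine gp)          |
|----------------------------|-----------------------|---------------------------------|
| Used in existing asm code? | Yes *1                | No                              |
| Impact on code size        | Yes, but very minor   | Yes *2                          |
| Breaks ABI                 | Hard break            | Soft break compared to option 1 |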

So, based on the comparison table and the reasons listed in #371, I believe we should go with option 2.

*1: Android has an issue (google/android-riscv64#78) tracking where the SCS register candidate is already used.
*2: We have some compiler optimizations, such as GlobalMerge, to make up the gap; note that there is no loss on systems that already disable gp relaxation.