RISC-V LLVM sync-up call 19 Mar 2020

For background on these calls, see
<http://lists.llvm.org/pipermail/llvm-dev/2019-September/135087.html>.

Reminder: the purpose is to co-ordinate between active contributors.
If you have support questions etc then it's best to post to llvm-dev.

We have a call each Thursday at 4pm GMT, via
<https://meet.google.com/ske-zcog-spp>.

I've created a shared calendar, which may help in keeping track. It is
accessible at:
  * <https://calendar.google.com/calendar/b/1?cid=bG93cmlzYy5vcmdfMG41cGtlc2ZqY25wMGJoNWhwczFwMGJkODBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ>
  * <https://calendar.google.com/calendar/ical/lowrisc.org_0n5pkesfjcnp0bh5hps1p0bd80%40group.calendar.google.com/public/basic.ics>

Issues to discuss today include the following:
* Improving Rust code size by not forcing frame pointers
<https://github.com/rust-lang/rust/pull/69890>
* Compact code model (Evandro)
* Update on embedded PIC discussions
* Small data limit <https://reviews.llvm.org/D57497>
* Bitmanip / experimental extension status
* ELF attribute support close to merging
<https://reviews.llvm.org/D75833> <https://reviews.llvm.org/D74023>
* No other topics were submitted. As always, please do submit things
you'd like to discuss

Best,

Alex

Here’s the draft proposal for the compact code model on RV. I’d appreciate your feedback before I propose it to the foundation and go about updating the psABI.

Thank you,

Compact Code Model.pdf (154 KB)

If I’m following correctly, there are two size-limited areas. One area, limited to 2GB, is the “text” area. This contains all the code. Then there’s a “global” area, limited to 4GB, which is pointed to by the global pointer. This contains the GOT, plus a flexible area that the object file can stick small bits of data into. And then outside of both of those, additional data is unlimited.

It took me multiple times reading through the proposal to parse that out; it might be a good idea to reorganize the proposal so that’s explained somewhere explicitly.

My big question here is, how much benefit are you really getting from having a global pointer? If you eliminate it and combine the two size-limited areas into one, you end up with essentially the small-PIC code model. The small-PIC code model supports everything your proposed “compact” model does, with a couple of minor differences:

Hi Evandro,

Thank you for writing this proposal.

I presume you have seen the FDPIC proposal on the RISC-V sw-dev mailing list [1]. I am keen that any gp-relative relocations that could be shared between the two proposals are shared and compatible with each other - I think this will make a major difference to maintainability and our ability to document these details.

Am I right in thinking your proposal only has one value for `gp` for the whole program, and thus it does not need to be saved or loaded in function prologs and epilogs?

Sam

[1]: https://groups.google.com/a/groups.riscv.org/forum/#!msg/sw-dev/ZjYUJswknQ4/WYRRylTwAAAJ

Hi, Sam.

> Thank you for writing this proposal.
>
> I presume you have seen the FDPIC proposal on the RISC-V sw-dev mailing list [1]. I am keen that any gp-relative relocations that could be shared between the two proposals are shared and compatible with each other - I think this will make a major difference to maintainability and our ability to document these details.

Yes, I have; at the time it came out, I was designing this code model. I’m glad that someone else also thought of the GP-relative relocations. Code reuse is always desirable.

> Am I right in thinking your proposal only has one value for `gp` for the whole program, and thus it does not need to be saved or loaded in function prologs and epilogs?

As is currently the case, it is callee-saved. Of course, since DSOs have their own small data area, if a function in a DSO built for the compact code model refers to any global data, then it needs to save, restore and set up the GP within it.

Thank you,

Hi, Eli.

> If I’m following correctly, there are two size-limited areas. One area, limited to 2GB, is the “text” area. This contains all the code. Then there’s a “global” area, limited to 4GB, which is pointed to by the global pointer. This contains the GOT, plus a flexible area that the object file can stick small bits of data into. And then outside of both of those, additional data is unlimited.

Actually, the global data area, which includes the GOT and any global of local scope, is limited to 4GiB. However, the latter is limited to 2GiB, in order to guarantee addressing the GOT and the small data area, assuming this section order:
.got
.sdata .sbss
.ldata .lbss

The .data and .bss sections, containing the globals of global scope, may follow these immediately, but can actually be placed anywhere in the memory map.
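The 4GiB figure is consistent with gp-relative addresses being formed the usual RISC-V way: `gp` plus a signed 20-bit upper immediate (`lui`) plus a signed 12-bit lower immediate. The sketch below assumes exactly that encoding (the proposal may differ) and checks whether a data area is fully reachable from a given `gp` value; the helper names and the example addresses are hypothetical.

```python
# Sketch, assuming gp-relative addresses are formed on RV64 as
#   gp + (hi20 << 12) + lo12   (lui + add + 12-bit load/addi offset)
# with hi20 a signed 20-bit and lo12 a signed 12-bit immediate.

def split_hi_lo(offset):
    """Split offset so that offset == (hi20 << 12) + lo12, lo12 in [-2048, 2047]."""
    hi20 = (offset + 0x800) >> 12  # round so the low part comes out signed
    lo12 = offset - (hi20 << 12)
    return hi20, lo12


def fits_gp_relative(offset):
    """True if hi20 fits lui's signed 20-bit field (sign-extended on RV64)."""
    hi20, _ = split_hi_lo(offset)
    return -(1 << 19) <= hi20 < (1 << 19)


def area_reachable(area_start, area_size, gp):
    """True if every byte of [area_start, area_start + area_size) is gp-reachable.

    The reachable offsets form one contiguous range, so checking both ends suffices.
    """
    first = area_start - gp
    last = area_start + area_size - 1 - gp
    return fits_gp_relative(first) and fits_gp_relative(last)


# Reachable offsets span [-2**31 - 2048, 2**31 - 2049]: exactly 4GiB.
base = 0x1000_0000
gp = base + (1 << 31) + 2048  # hypothetical gp placement near the middle
print(area_reachable(base, 1 << 32, gp))        # True
print(area_reachable(base, (1 << 32) + 1, gp))  # False
```

On RV32 the same additions wrap modulo 2**32, so this range check only matters for RV64.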

> It took me multiple times reading through the proposal to parse that out; it might be a good idea to reorganize the proposal so that’s explained somewhere explicitly.

Will do.

> My big question here is, how much benefit are you really getting from having a global pointer? If you eliminate it and combine the two size-limited areas into one, you end up with essentially the small-PIC code model. The small-PIC code model supports everything your proposed “compact” model does, with a couple of minor differences:
>
> * The size-limited areas are limited to 2GB combined, instead of 6GB combined.
> * The relaxations are a little different. Small-PIC always takes two instructions to access a GOT entry; the compact model can do it in one instruction for the first 500 (?) entries in the GOT. Not sure which would end up with smaller code size in practice.
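The “500 (?)” depends on the GOT entry size and on where `gp` points relative to `.got`. A minimal sketch, assuming a single-instruction access is a load whose signed 12-bit offset from `gp` reaches the entry, and (purely hypothetically, `far_cost`) that an out-of-range compact access costs three instructions (`lui`, `add`, load); small-PIC is counted as the two instructions mentioned above.

```python
# Sketch: how many GOT entries a single gp-relative load can reach, plus a toy
# instruction count for GOT accesses. far_cost=3 is an assumption, not from the proposal.

IMM12_MIN, IMM12_MAX = -2048, 2047  # signed 12-bit load offset


def one_insn_got_entries(entry_size, gp_bias):
    """Count GOT entries i >= 0 with IMM12_MIN <= i*entry_size - gp_bias <= IMM12_MAX.

    entry_size: 4 on RV32, 8 on RV64.
    gp_bias: bytes between the start of .got and the address held in gp.
    """
    first = max(0, -(-(gp_bias + IMM12_MIN) // entry_size))  # ceiling division
    last = (gp_bias + IMM12_MAX) // entry_size
    return max(0, last - first + 1)


def got_access_insns(got_indices, entry_size, gp_bias, far_cost=3):
    """Return (small_pic, compact) instruction counts for the given GOT accesses."""
    small_pic = 2 * len(got_indices)  # auipc + load for every access
    compact = 0
    for i in got_indices:
        offset = i * entry_size - gp_bias
        compact += 1 if IMM12_MIN <= offset <= IMM12_MAX else far_cost
    return small_pic, compact


for bias in (0, 2048):
    print(bias, one_insn_got_entries(8, bias), one_insn_got_entries(4, bias))
# 0 256 512
# 2048 512 1024
print(got_access_insns([0, 1, 600], entry_size=8, gp_bias=2048))  # (6, 5)
```

With `gp` biased 2048 bytes into an RV64 GOT the single-instruction window is 512 entries, which may be where the estimate comes from. The counts ignore compressed instructions, so they only roughly track code size in bytes.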

This code model addresses the cases where code and data reside in different memory devices at distant addresses. Sometimes RAM is faster than ROM, and it's not desirable to keep even read-only data in ROM. Then there is the obvious case of addressing, say, a peripheral buffer at a distant memory location. The scheme above would not allow code and data to reside in different memory devices, if the user so prefers.

On the other hand, just like we have the code models `medlow` and `medany`, we could have two variations of the compact code model too: one assuming that code, small data and local data are on the same memory device (`cmplow`), and another with no such restriction (`cmpany`).
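As a rough illustration of the `cmplow`/`cmpany` split, the sketch below borrows the `medany` rule (everything within a single 2GiB address range) and applies it to code, small data and local data. Reading “same memory device” as “within one 2GiB window” is an assumption, as are the example memory maps.

```python
# Sketch: pick a compact-model variant for a memory map. Treating "same memory
# device" as "fits in one 2GiB window" is an assumption borrowed from medany.

TWO_GIB = 1 << 31


def span_fits(regions, limit=TWO_GIB):
    """regions: iterable of (start, size) in bytes; True if all fit in one window."""
    regions = list(regions)
    lowest = min(start for start, _ in regions)
    highest = max(start + size for start, size in regions)  # exclusive end
    return highest - lowest <= limit


def pick_variant(code, small_data, local_data):
    """Return "cmplow" if the three regions can share a 2GiB window, else "cmpany"."""
    return "cmplow" if span_fits([code, small_data, local_data]) else "cmpany"


# Hypothetical maps: everything in one RAM bank vs. code in ROM far from RAM.
one_bank = ((0x2000_0000, 0x10_0000), (0x2010_0000, 0x1000), (0x2010_1000, 0x4000))
split_devices = ((0x0000_0000, 0x10_0000), (0x8000_0000, 0x1000), (0x8000_1000, 0x4000))
print(pick_variant(*one_bank))       # cmplow
print(pick_variant(*split_devices))  # cmpany
```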

Thank you,

Oh, I wasn’t really thinking about devices without an MMU where the addresses are physically separated. Makes sense.

This reminds me of RWPI on ARM; it has a similar scheme of referring to data indirectly through a pointer, but it also changes the ABI to keep the pointer in a reserved register.

-Eli

Eli,

Yep, we’re looking at a ROPI/RWPI model for RISC-V and it is shaking out to be fairly similar to this model (though we’ve only been looking at it for 32-bit RISC-V).

I suppose how I’m thinking about the difference between a ROPI/RWPI model and this compact model is this: in the former, you cannot know until load time what the offset between the code section and the data section is, whereas in the latter you’re using `gp` to keep the upper 32 bits of that offset to avoid materialising it in every access (if you statically linked, and we added enough relocations to materialise 64-bit immediates, you could materialise the offset everywhere. It would be inefficient, but not wrong).
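The arithmetic behind this description can be checked directly: any 64-bit code-to-data offset splits into a multiple of 2**32 (the part `gp` would hold) plus a residual that one `hi20`/`lo12` immediate pair can encode. A minimal sketch, assuming the residual is encoded like a `lui` plus 12-bit-offset pair; it shows the decomposition only, not the proposal's actual relocations.

```python
# Sketch: split a 64-bit code-to-data offset into a gp-held upper part and
# per-access hi20/lo12 immediates. The encoding assumptions are illustrative only.

MASK32 = (1 << 32) - 1
RESIDUAL_MIN = -(1 << 31) - 2048  # lowest value a hi20/lo12 pair can encode


def split_offset(delta):
    """Return (upper, hi20, lo12) with delta == upper + (hi20 << 12) + lo12.

    upper is a multiple of 2**32; hi20 is a signed 20-bit and lo12 a signed
    12-bit immediate.
    """
    # Map delta into the 2**32-wide window [RESIDUAL_MIN, RESIDUAL_MIN + 2**32).
    residual = ((delta - RESIDUAL_MIN) & MASK32) + RESIDUAL_MIN
    upper = delta - residual
    hi20 = (residual + 0x800) >> 12
    lo12 = residual - (hi20 << 12)
    return upper, hi20, lo12


for delta in (0x1_2345_6789, -0x7_0000_0000, 0x7FFF_F800):
    upper, hi20, lo12 = split_offset(delta)
    assert upper + (hi20 << 12) + lo12 == delta
    print(hex(upper), hex(hi20), hex(lo12))
```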

Is this a fair understanding, Evandro?

Sam

Hi, Sam.

I think that it’s a fair comparison.

Keep in mind that the GP is only used to reach global variables of local scope and the GOT, where the address of global variables of global scope reside.

This model assumes that the distance between the GP and the global data area (the GOT and local-scope variables) is fixed at link time.
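To make the split between local-scope and global-scope variables concrete, here is one plausible lowering of a load from a 32-bit variable, written as a Python generator of RV64 pseudo-assembly. It is an illustration, not the proposal's code sequences: `GPREL_HI`/`GPREL_LO` are placeholder operators, not real assembler relocations, and `near` stands for a gp-relative offset that fits the load's own 12-bit immediate.

```python
# Illustrative only: GPREL_HI/GPREL_LO are placeholder names, not real relocations.

def lower_load(sym, scope, near=False):
    """Return pseudo-assembly loading the 32-bit variable `sym` into a0.

    scope: "local"  -> the variable sits at a link-time offset from gp;
           "global" -> its address is read from a GOT slot reached via gp.
    near:  the gp-relative offset fits the load's 12-bit immediate.
    """
    if scope not in ("local", "global"):
        raise ValueError(f"unknown scope: {scope!r}")
    slot = sym if scope == "local" else f"GOT({sym})"
    seq = []
    base = "gp"
    if not near:
        # Add the upper part of the gp-relative offset first.
        seq += [f"lui  t0, GPREL_HI({slot})", "add  t0, t0, gp"]
        base = "t0"
    if scope == "local":
        seq.append(f"lw   a0, GPREL_LO({slot})({base})")
    else:
        seq.append(f"ld   t0, GPREL_LO({slot})({base})")  # fetch &sym from the GOT
        seq.append("lw   a0, 0(t0)")                         # then the variable itself
    return seq


for scope in ("local", "global"):
    for near in (True, False):
        print(scope, near, "; ".join(lower_load("counter", scope, near)))
```

Global-scope variables cost one extra load for the GOT indirection, which is what lets `.data` and `.bss` be placed anywhere in the memory map.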

Folks,

I converted the PDF for the compact code model proposal to an MD file and put it at https://github.com/ebahapo/riscv-elf-psabi-doc for easier reading and commenting.

Thank you,

The issue against the psABI is at https://github.com/riscv/riscv-elf-psabi-doc/issues/140. I’d appreciate any feedback you have there, whether it’s a simple LGTM or “this looks horrible to me”.

Thank you,