I am fairly new to LLVM and still learning. I have a basic question about what the pointer specification for my target should be in the data layout string.
My target has the following properties:
1. All CPU registers are 32-bit.
2. All Data Memory reads and writes are 32-bit wide and are 32-bit aligned.
3. The address bus between the CPU and Memory is 16-bit.
So my question is, what should be my data layout string for the pointer type?
Should it be: p:16:32?
Or should it be: p:32:32?
Or should it be: something else?
Most likely p:16:32. The size in p:<size>:<abi> means the size of the pointer; since the address bus is 16 bits wide, the pointer size is probably 16 bits as well. The abi portion is the ABI-mandated alignment, which you’ve already stated to be 32 bits.
I think you can ignore the fact that the address bus is 16-bit and declare pointers 32-bit. It will save you a lot of trouble, including the trouble you’re observing. There are no backends with pointers smaller than a register, and this configuration is quite untested.
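For reference, here is a minimal sketch of what the pointer part of the data layout string could look like with 32-bit pointers. The full string, the endianness, and the helper name are illustrative assumptions, not taken from any particular backend:

```cpp
#include <string>

// A minimal sketch, assuming 32-bit pointers with 32-bit ABI alignment.
static std::string computeDataLayout() {
  // "e"       - little-endian (swap for "E" if the target is big-endian)
  // "p:32:32" - pointers are 32 bits wide with 32-bit ABI alignment
  // "i32:32"  - i32 has 32-bit alignment
  // "n32"     - 32 bits is the native integer width
  return "e-p:32:32-i32:32-n32";
}
```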
What is more concerning is (2). Are you saying you don’t have byte access to memory?
Thank you very much for your input. Yes, the target does NOT have byte access.
All Loads read 32-bit data from 32-bit aligned Data Mem and place the read data into 32-bit CPU Regs.
All Stores read 32-bit data from 32-bit CPU Regs and write to a 32-bit location in Data Mem that is 32-bit aligned.
NOTE1:
Please note that Data Mem and Program Mem are separate.
NOTE2:
In order to load a byte, I’ll have to load 32 bits, mask off the unwanted bytes, and then shift the data down to the correct (LSB) part of the register. In order to store a byte, I’ll have to first load the 32-bit word (from the location I need to store to) into a register, insert the byte (that I eventually need to store) into that register at the correct position (via masking, shifting, ORing, etc.), and then do a 32-bit store of the register back to the desired location. So that would be pretty expensive.
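To spell out the sequence I described, here is roughly what a byte load and a byte store would have to do, written as C++ for illustration (this assumes little-endian byte numbering within the word):

```cpp
#include <cstdint>

// Byte load: 32-bit aligned word load, then shift and mask down to the LSBs.
uint8_t load_byte(const uint32_t *mem, uint32_t byte_addr) {
  uint32_t word  = mem[byte_addr >> 2];     // 32-bit load of the containing word
  uint32_t shift = (byte_addr & 3u) * 8u;   // byte position within the word
  return (word >> shift) & 0xFFu;
}

// Byte store: read-modify-write of the containing 32-bit word.
void store_byte(uint32_t *mem, uint32_t byte_addr, uint8_t value) {
  uint32_t shift = (byte_addr & 3u) * 8u;
  uint32_t word  = mem[byte_addr >> 2];     // load the containing word
  word &= ~(0xFFu << shift);                // clear the target byte
  word |= uint32_t(value) << shift;         // insert the new byte
  mem[byte_addr >> 2] = word;               // 32-bit store back
}
```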
Can you please elaborate on why this is concerning? What kind of issues would this limitation cause?
Basically, it is not supported. You can find many topics on Discourse referring to the issue, with the most recent one (I think) being RFC: On non 8-bit bytes and the target for it (follow the links there for more info).
I’m currently working on a draft that adds the support to LLVM, but it won’t touch front end / back end, only IR and middle end passes. It would be very helpful if you give me more details about your target so that I can try to make the support more generic.
What does it mean from code generation point of view? Do code pointers have different properties compared to data pointers?
Where do you keep the index (0-3) of the accessed byte? Do you shift the pointer left by two bits (effectively making the pointer 18 bits wide)?
If performance is not a concern, you might want to check out this RFC.
If I understand correctly, you are saying that LLVM always assumes that addresses are Byte addresses and it does not support Word addresses. Is my understanding correct?
If yes, and LLVM only supports byte addresses, then what if I post-process the generated assembly (via some script) to divide all addresses by 4? Would that work? Using LLVM, I only intend to generate assembly as my output.
My intention is to use the target mostly for arithmetic processing (floating-point processing).
Program Memory and Data Memory being separate is a Harvard architecture. All pointers are for data, so I don’t think that would be a concern.
Different parts of the compiler make different assumptions. If we’re talking about code generation, then yes, it assumes that the value of a pointer stored in a register is a byte address.
If yes, and LLVM only supports byte addresses, then what if I post-process the generated assembly (via some script) to divide all addresses by 4? Would that work?
No. If you see an instruction like “add reg0, 16” you don’t know whether it is a pointer computation or just an addition of two integers. Moreover, not all immediates will be divisible by 4.
Also, the generated assembly will simply be wrong in the first place, because of the assumption that registers hold byte addresses. In particular, this will be a problem when working with the stack (passing arguments to functions, register spilling, etc.).
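To make the first point concrete, here is a contrived C++ illustration (the function names are made up):

```cpp
// Both functions below can plausibly lower to an "add <reg>, 16", but only
// the first one is a pointer computation that a post-processing script would
// need to divide by 4; the second must be left alone.
int *advance(int *p) { return p + 4; }   // pointer arithmetic: offset of 16 bytes
int  bump(int x)     { return x + 16; }  // plain integer arithmetic
```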
I assume it is some kind of DSP, and you have optimized libraries written in assembly? If so, then you should also consider ABI compatibility between the compiler generated assembly and those libraries.
There are pointers to functions, at least. They may be the same as pointers to data, or may be different. This is why I’m asking.
Excellent point! And thanks for the explanation. Let me summarize target features here:
The following is additional info:
The load/store instructions take their memory address operand as a byte address; however, the address that goes out on the actual memory address bus is the corresponding word address (i.e. the CPU drops the two LSBs of the byte address held in the load/store memory operand register and sends the remaining bits of that register to the address bus).
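As a small illustration of this addressing scheme (written as C++; the function name is made up):

```cpp
#include <cstdint>

// The instruction operand is a byte address; the CPU drops the two LSBs and
// drives the remaining bits (the word address) onto the 16-bit address bus.
uint32_t bus_address(uint32_t byte_addr) {
  return byte_addr >> 2;
}
```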
So with this in mind, would there be any issue with LLVM? Yes, one obvious limitation of the above is that we can only load/store 32-bit data from/to the data memory. But if the use cases only require 32-bit loads/stores (e.g. placing any 16-bit values into 32-bit locations with sign extension, etc.), then LLVM should be able to handle this correctly. No?
Thank you for your valuable feedback; it is really helpful.
I see. By looking at the code of SelectionDAGLegalize::modifySignAsInt() it seems like it has to deal with floating point in some way. Since I am still a newbie, I couldn’t really understand the code to its full extent. So it would be great if you could please answer the following questions that I think might be relevant to this aspect:
Under what conditions would SelectionDAGLegalize::modifySignAsInt() get called? Is there any way to avoid such byte accesses? More importantly, what C/C++ source code could generate such byte accesses? I’m asking so that maybe those C/C++ use cases can be avoided (I plan for the target to support C/C++ only). Or, if it can’t be avoided, is there a way to convert byte accesses into 32-bit accesses (i.e. into 32-bit loads/stores)?
Please note that the 32-bit CPU registers are meant to be used for int as well as float operations (i.e. it’s a common register file for floats and ints; there is no separate register file for floats). Could the CPU register file being common to floats and ints be a problem? Please also note that the target can use its FPU (Floating Point Unit) directly for float operations (i.e. floating-point operations will not be emulated; the target can do them directly). Doubles, etc. are not to be used; only ints and floats are.
One more thing the target can do directly in the FPU is “float to int” and “int to float” conversion. Could that cause any issues with LLVM?
Finally: since I am a newbie and still doing the initial bring-up of the backend, right now I am only using opt level O0 (i.e. no optimization), but in the future (after the initial bring-up is done) I plan on using higher opt levels. I read somewhere that higher optimization flags can result in richer (i.e. more complex) LLVM IR being generated, and hence instruction selection can become more difficult. So can you please comment on whether the issue related to SelectionDAGLegalize::modifySignAsInt() (i.e. the byte accesses) is encountered at higher opt levels? I plan on using the smallest opt level that still does basic optimizations (like removing unnecessary loads/stores) in order to avoid complicated instruction selection, since I’m still a newbie.
Pointers being narrower than registers is actually something that several targets already know how to handle. Fortunately, it sounds like this isn’t something you really need to deal with, but if you did, the x32 work for x86_64 and the arm64_32 work for AArch64 would probably be places to look.
I agree with Sergei that a lack of byte-addressing is likely to be a more important stumbling block. You might be able to muddle through, though.
Avoid some C/C++ features in order to not generate byte accesses?
OR
Is there a way to make the byte load/stores into 32bit load/stores?
OR
Another solution could be to modify the target hardware itself so that at least some part of the address space can do byte loads/stores (having the full address space support byte accesses would be too expensive from the target’s point of view), e.g. make the 0x000 to 0x1000 address range byte-accessible and keep higher addresses word-addressable only. Would such a solution be compatible with LLVM? The idea would be that any byte accesses required by LLVM would be mapped onto the 0x000 to 0x1000 address range.
If you have the flexibility to change the hardware, can you just support byte accesses in the ISA? You could still do word-sized accesses over the bus if you want; just make byte stores load the containing word, overwrite the desired byte, and store the word back. That would be problematic if you had multiple devices talking to memory and needed atomic access to a single byte, but that being impossible is presumably a constraint you can live with since you’re in fact already living with it.
I guess you can find it out by yourself by examining the callers (and callers’ callers).
You would have to rewrite a decent part of the instruction selection.
You seem to be already doing it by doing read-modify-write operations with shifts and masks. I don’t know of other ways.
There shouldn’t be. This setup is very common.
It shouldn’t.
I think you won’t encounter this issue since i32 is legal on your target. This was just an example. Another example would be reducing load width: if you do a 32-bit load, for example, and then mask off the upper 24 bits, the optimizer will most likely replace this load with an 8-bit load. These are not the only cases where instruction selection can generate byte accesses; I don’t remember them all.
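As an illustration of that load-width reduction (the function is just an example I made up):

```cpp
#include <cstdint>

// On a byte-addressed target, the optimizer is likely to narrow this masked
// 32-bit load into a single 8-bit load, which is exactly the kind of access
// a word-only machine cannot perform directly.
uint32_t low_byte(const uint32_t *p) {
  return *p & 0xFFu;
}
```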
This might be a reasonable approach, but higher optimization levels don’t really add much complexity from the instruction selection point of view.
Thank you for the excellent and very valuable feedback. I have a couple of final questions, please:
Actually, I have not implemented it yet. Can you please point me to any backends (that may be slightly similar to mine) that do the read-modify-write stores? I could then follow their example and implement mine.
Can you please elaborate on this a little bit: which aspect of GlobalISel makes it a better option? Is it that GlobalISel can support targets which do not have byte-addressed memory, like the one described above?
OR
Is it that GlobalISel won’t generate any byte loads/stores of the kind mentioned above?
Please note that although I said earlier that the load/store instructions take a byte address as their memory operand (with the CPU dropping the two LSBs before driving the address bus), as of today my target hardware does not actually do that. Today the load/store instructions take their memory address operand as a word address, and that very same word address goes out on the memory address bus, without the CPU dropping any LSBs, to access a memory location that is 32 bits wide and 32-bit aligned. So I would have to update my target hardware (which is a very undesirable option) to support the byte-address scheme described earlier.
So if the GlobalISel option can save me from having to make this hardware change, that would be a very good option for me.
Again thank you very much for such excellent feedback. I really appreciate it.
There is none in-tree, and I don’t know of any out-of-tree.
Both SelectionDAG and GlobalISel rely on 8-bit bytes and can generate byte accesses. The reason why GISel might be preferable is that it contains much less code and therefore has fewer places where something can go wrong. It is more difficult to port GISel to a new target though.
If you don’t have these 2 LSBs, then you can’t do RMW byte stores because you don’t know which part of the word needs to be modified. The other consequence is that you will have to make bytes 32-bit, and this is something the front end (or any other part of the toolchain) can’t currently handle.
If you have the 2 LSBs, you can go with RMW stores. This is a better option in many other ways except that the generated code will be slower when working with chars and other types that are less than 32 bits. I’ve never tried or needed to go this way, so there might be some issues that I don’t see, but they are nothing compared to the amount of work that needs to be done to support the previous option.
With any of the above options, 64K of RAM will probably be too small for any application using the standard C library.
The best option is to implement byte accesses in hardware assuming you have control over it. You will get the rest for free.