Custom Instruction Cost Model for the LLVM RISC-V Backend

Hi,

I’m working on a RISC-V architecture that has instruction costs different from those in the default cost model. Is there an out-of-source way to provide llc with a custom cost model, or does this require a change to the LLVM backend?

Also, the cost model is not entirely static. For example, loads from 0x1000-0x1ffc take 1 cycle, whereas loads from addresses above 0x80000000 take 10-100 cycles. Is it possible to model these costs in the scheduler? Requiring the programmer to define custom pointer types to mark pointers as belonging to one or the other address range is fine. Is there a way to model different costs for different pointer types?

Thank you,
Bandhav

Hi Bandhav,

While I’m unfamiliar with the details of codegen and cost modelling in LLVM, it is possible to declare multiple address spaces for a target, mark pointers as belonging to a specific address space, and cast between them.
To quote the language reference: “For targets that support them, address spaces may affect how optimizations are performed and/or what target instructions are used to access the variable.” That sounds quite hopeful for your use case. If nothing else, you could probably model loads from different address spaces as different instructions, even if they end up being the same instruction in the output machine code, and tag them with different costs.

Hope this helps,
Henrik

Thanks a lot Henrik!

I figured the following would mark a pointer to a specific address space:

#define __myaddrspace __attribute__((address_space(1)))

__myaddrspace int* data;

And I was able to verify, in the generated IR, that the loads are annotated as being from addrspace(1). Would this work for automatic variables as well?

Regarding using this in the backend, do I just have to modify the source, or is it possible to load a library at runtime, the way we load a frontend pass?

I’m unclear on whether it works with automatic variables. Testing it in Clang gives the error message “automatic variable qualified with address space”.
However here is an LLVM discussion on the semantics of alloca with different address spaces: https://lists.llvm.org/pipermail/llvm-dev/2015-August/089706.html
Some people seem to have gotten multiple separate stacks working, according to my skim read.

So it might be technically supported in LLVM, but not in Clang.

/Henrik

The alloca address space changes the address space of the stack, but
there’s still only one stack. So Clang supports generating code with a
non-zero alloca address space, but it doesn’t support changing the address
space of individual local variables.
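For illustration, the alloca address space is a single per-module property recorded in the data layout string (the "A<n>" component), not something an individual variable can choose. A minimal sketch of querying it:

#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Module.h"

// Sketch only: the "A<n>" field of the target's data layout names the
// address space that allocas live in (AMDGPU, for example, uses 5).
unsigned stackAddrSpace(const llvm::Module &M) {
  return M.getDataLayout().getAllocaAddrSpace();
}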

Bandhav, you may find it interesting to look into some of the work done
for GPU targets. I think AMDGPU has the ability to compile arbitrary
C/C++ code for their GPU. Ordinary C/C++ code is unaware of address
spaces, but the hardware has the traditional GPU memory model of different
private/global/constant address spaces, plus a generic address space
which encompasses all of them (but is much more expensive to access).
By default, you treat an arbitrary C pointer as a pointer in the generic
address space, but LLVM and Clang know that local and global variables are
in various other address spaces. That creates a mismatch, which Clang
handles by implicitly promoting pointers to the generic address space
when you take the address of a local/global. In the optimizer, you can
recognize accesses to promoted pointers and rewrite them to be accesses
in the original address space. This is then relatively easy to combine
with address-space attributes so that you can explicitly record that a
particular pointer is known to be in a particular address space.
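A rough sketch of that rewrite step, assuming a simple IR-level pass (the names here are purely illustrative; the in-tree InferAddressSpaces pass does a much more thorough version of this for the GPU targets):

#include "llvm/IR/Function.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Sketch: if a load goes through a pointer that was promoted to the
// generic address space via an addrspacecast, load through the original
// pointer instead, so the known address space is preserved for later
// passes and for the backend's MachineMemOperands.
static bool undoPromotionOnLoads(Function &F, unsigned KnownAS) {
  bool Changed = false;
  for (Instruction &I : instructions(F)) {
    auto *LI = dyn_cast<LoadInst>(&I);
    if (!LI)
      continue;
    auto *ASC = dyn_cast<AddrSpaceCastInst>(LI->getPointerOperand());
    if (!ASC || ASC->getSrcAddressSpace() != KnownAS)
      continue;
    LI->setOperand(LoadInst::getPointerOperandIndex(),
                   ASC->getPointerOperand());
    Changed = true;
  }
  return Changed;
}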

Of course, for the promotion part of that to be useful to you, you need
specific kinds of memory (e.g. the stack) to fall into meaningful address
ranges for your cost model, or else you’ll be stuck treating almost every
access conservatively. You could still use address spaces to know that
specific accesses are faster, but not being able to make default
assumptions about anything will be really limiting.

That’s also all designed for an implementation where pointers in
different address spaces are actually representationally different. It
might be overkill just for better cost-modeling of a single address space
with non-uniform access costs. In principle, you could get a lot of work
done just by doing a quick analysis to see if an access is known to be
to the stack.
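A minimal sketch of such a check, assuming you only care about loads whose pointer provably originates from an alloca (a real analysis would treat phis, selects, and unknown pointers conservatively):

#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Sketch only: a load is a known stack access if the underlying object of
// its pointer operand is an alloca.
static bool isKnownStackAccess(const LoadInst &LI) {
  const Value *Obj = getUnderlyingObject(LI.getPointerOperand());
  return isa<AllocaInst>(Obj);
}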

John.

Thanks for the great response, John! As I worked on it more, I realized that multiple stacks are in fact overkill and just differentiating pointers is good enough.

The solution I ended up implementing to dynamically change the load latency based on address space is to use TargetSubtargetInfo’s adjustSchedDependency() method:
https://llvm.org/doxygen/classllvm_1_1TargetSubtargetInfo.html#a80e16c673bf028bf985c58b50a8a70c5

In adjustSchedDependency, I updated the latency of the dependency edge (SDep) as follows:

void RISCVSubtarget::adjustSchedDependency(SUnit *Def, SUnit *Use,
                                           SDep &Dep) const {
  // Only machine-instruction SUnits carry a MachineInstr we can inspect.
  if (!Def->isInstr())
    return;
  MachineInstr *SrcInst = Def->getInstr();

  if (getCPU() == "mycpu") {
    ArrayRef<MachineMemOperand *> MemOps = SrcInst->memoperands();
    // Model loads from address space 1 as high-latency.
    if (SrcInst->mayLoad() && !MemOps.empty() &&
        MemOps[0]->getAddrSpace() == 1)
      Dep.setLatency(20);
  }
}

I’m wondering: is this the intended use of adjustSchedDependency? Is there a more recommended way of modeling load latency based on address space?