Limitations in target register classification for vectors

LLVM v3.7's "TargetTransformInfo" provides an interface for querying the
target machine capabilities, and two of these methods are:

  unsigned getNumberOfRegisters(bool Vector)
  unsigned getRegisterBitWidth(bool Vector)

These allow only a Boolean selection between registers that support
vectors and registers that do not, and they assume that all registers that
support vectors have the same width. Indeed, they assume that all
non-vector registers have the same width too.
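
To make the shape of these queries concrete, here is a minimal stand-alone
sketch. It is not LLVM code: the struct name "BooleanRegisterInfo", its
fields, and the numbers in "Conventional" are all hypothetical, chosen only
to mimic a target with exactly one scalar file and one vector file.

```cpp
// Stand-alone mock of the two Boolean queries; this is NOT the LLVM class.
// BooleanRegisterInfo and every value below are hypothetical.
struct BooleanRegisterInfo {
  unsigned NumScalarRegs; // size of the single scalar register file
  unsigned NumVectorRegs; // size of the single vector register file
  unsigned ScalarBits;    // width shared by every scalar register
  unsigned VectorBits;    // width shared by every vector register

  // Same shape as the queries above: the only selector is a bool.
  constexpr unsigned getNumberOfRegisters(bool Vector) const {
    return Vector ? NumVectorRegs : NumScalarRegs;
  }
  constexpr unsigned getRegisterBitWidth(bool Vector) const {
    return Vector ? VectorBits : ScalarBits;
  }
};

// A conventional (made-up) target fits: one answer per Boolean value.
constexpr BooleanRegisterInfo Conventional{32, 16, 64, 128};
```

With only four fields to fill in, a target with two vector register widths
has nowhere to record the second one.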

However, for machines that have vector registers of more than one width,
this is not sufficient. So far I have mainly worked on implementing a new
target for LLVM, but I have not dug into the basic architecture.

What I would like is for a future version of LLVM to handle a number of new
target capabilities:

. Targets that have more than one type of vector register - in my case I
have 32-bit and 128-bit vector registers
. Targets that can share vectors and scalars - in my case I can support
32-bit scalars and 32-bit vectors in the same registers

So for example, in response to:

  getNumberOfRegisters(false)

I can respond 32, but in response to:

  getNumberOfRegisters(true)

the answer is really 32 if I count only the 128-bit registers, or 64 if I
count all registers that support vectors, though 32 of these are the same 32
that support scalars. Similarly, 'getRegisterBitWidth(x)' has no single
answer that a Boolean argument can select.
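
The two candidate answers can be reproduced with a small stand-alone sketch
(C++14 or later). The "RegFile" struct, the "Files" table, and the function
names are hypothetical and exist only for this illustration; the counts and
widths are the ones described above.

```cpp
// Hypothetical description of the register files described above.
// None of these names come from LLVM.
struct RegFile {
  unsigned Count;    // number of registers in the file
  unsigned Bits;     // width of each register
  bool HoldsScalars; // file can hold scalar values
  bool HoldsVectors; // file can hold vector values
};

constexpr RegFile Files[] = {
    {32, 32, true, true},   // 32-bit registers shared by scalars and vectors
    {32, 128, false, true}, // 128-bit vector-only registers
};

// Count every register that can hold a scalar.
constexpr unsigned countScalarCapable() {
  unsigned N = 0;
  for (const RegFile &F : Files)
    if (F.HoldsScalars)
      N += F.Count;
  return N;
}

// Count every register that can hold a vector, whatever its width.
constexpr unsigned countAllVectorCapable() {
  unsigned N = 0;
  for (const RegFile &F : Files)
    if (F.HoldsVectors)
      N += F.Count;
  return N;
}

// Count only the registers that hold vectors and nothing else.
constexpr unsigned countVectorOnly() {
  unsigned N = 0;
  for (const RegFile &F : Files)
    if (F.HoldsVectors && !F.HoldsScalars)
      N += F.Count;
  return N;
}
```

Both countAllVectorCapable() (64) and countVectorOnly() (32) are defensible
replies to getNumberOfRegisters(true), and the first one double-counts the 32
registers already reported by countScalarCapable().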

I had thought that an alternative approach might be to query based on the
MVT, but that does not allow for a target that might use wider vector
registers for small vectors if the register pressure is high on the smaller
register files. It also doesn't allow for overlapping sets of registers.
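
A rough sketch of why a per-type query falls short (C++14 or later). The
vector's bit width stands in for a real MVT here, and "VecFile", "VecFiles",
"numRegsExact" and "numRegsWideEnough" are hypothetical names, not anything
in LLVM.

```cpp
// Hypothetical per-type query keyed on the vector's bit width instead of a
// real MVT. None of these names exist in LLVM.
struct VecFile {
  unsigned Count; // registers in this file
  unsigned Bits;  // width of each register
};

// The two vector-capable files described above.
constexpr VecFile VecFiles[] = {{32, 32}, {32, 128}};

// Registers whose width matches the vector type exactly.
constexpr unsigned numRegsExact(unsigned TypeBits) {
  unsigned N = 0;
  for (const VecFile &F : VecFiles)
    if (F.Bits == TypeBits)
      N += F.Count;
  return N;
}

// Registers that are at least wide enough, so a small vector may be placed
// in a wider file when its own file is under pressure.
constexpr unsigned numRegsWideEnough(unsigned TypeBits) {
  unsigned N = 0;
  for (const VecFile &F : VecFiles)
    if (F.Bits >= TypeBits)
      N += F.Count;
  return N;
}
```

For a 32-bit vector the exact-match answer is 32 and the wide-enough answer
is 64. Adding the wide-enough answers for 32-bit and 128-bit vectors gives
96, more than the 64 physical registers, because a count per type cannot say
that the two sets overlap.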

I'm not sure how to begin approaching this problem in the target-independent
infrastructure of LLVM, and would welcome any suggestions on how to solve it
in a general way.

Thanks,

  MartinO

From: "Martin J. O'Riordan via llvm-dev" <llvm-dev@lists.llvm.org>
To: "LLVM Developers" <llvm-dev@lists.llvm.org>
Sent: Thursday, September 24, 2015 7:45:31 AM
Subject: [llvm-dev] Limitations in target register classification for vectors

> LLVM v3.7's "TargetTransformInfo" provides an interface for querying the
> target machine capabilities, and two of these methods are:
>
>   unsigned getNumberOfRegisters(bool Vector)
>   unsigned getRegisterBitWidth(bool Vector)
>
> These allow only a Boolean selection between registers that support
> vectors and registers that do not, and they assume that all registers that
> support vectors have the same width. Indeed, they assume that all
> non-vector registers have the same width too.
>
> However, for machines that have vector registers of more than one width,
> this is not sufficient. So far I have mainly worked on implementing a new
> target for LLVM, but I have not dug into the basic architecture.
>
> What I would like is for a future version of LLVM to handle a number of new
> target capabilities:
>
> . Targets that have more than one type of vector register - in my case I
> have 32-bit and 128-bit vector registers
> . Targets that can share vectors and scalars - in my case I can support
> 32-bit scalars and 32-bit vectors in the same registers
>
> So for example, in response to:
>
>   getNumberOfRegisters(false)
>
> I can respond 32, but in response to:
>
>   getNumberOfRegisters(true)
>
> the answer is really 32 if I count only the 128-bit registers, or 64 if I
> count all registers that support vectors, though 32 of these are the same 32
> that support scalars. Similarly, 'getRegisterBitWidth(x)' has no single
> answer that a Boolean argument can select.

First, let me say that we've developed this interface on an as-needed basis, and I'm open to reviewing enhancements to it. Dealing with architectures that have different numbers of vector registers for different types is clearly something we don't handle well now.

> I had thought that an alternative approach might be to query based on the
> MVT, but that does not allow for a target that might use wider vector
> registers for small vectors if the register pressure is high on the smaller
> register files.

But in this case, don't you just have, effectively, more vector registers for those types?

> It also doesn't allow for overlapping sets of registers.
>
> I'm not sure how to begin approaching this problem in the target-independent
> infrastructure of LLVM, and would welcome any suggestions on how to solve it
> in a general way.

IR-level register-pressure modeling is a fairly crude heuristic, so modeling it in much detail is likely unhelpful. Within that constraint, if you have information that would really help the model, please feel free to propose enhancements.

-Hal