Auto-generating MachineValueTypes

Hi everybody,

I can't be the first to consider this so I'm wondering if I've missed
something obvious. Has anyone discussed or attempted auto-generating
(parts of) the MachineValueType.h header? There are a couple of issues
with it that could certainly be improved:

1. The enum-value data is multiply-defined in
include/llvm/Support/MachineValueType.h and
2. The explicit numbering of enum values in each file makes diffs
verbose due to cascading, and errors are easy to miss
3. Inconsistencies between these enum values are not guaranteed to show
errors at compile time and (in my experience anyway) some bugs may only
be exposed by one or two lit tests

The factors above are such that we've accepted patches with new types
from out-of-tree targets just to make their downstream lives easier. I
myself have been on the downstream side of things before and it is a
nuisance. With RISC-V, and perhaps with other variable-length vector
targets, I could foresee downstream forks adding their own custom wide
vector types. It'd be nice to make things simpler for that workflow.

Since we already have the MVT data in TableGen, I was wondering if it's
possible to use that data to auto-generate the enum values and even some
of the trivial helper functions (getVectorElementType, getSizeInBits,
etc.) with a new TableGen backend.

My high-level goal would be to introduce the ability to add a new
regular type with one line of code.

Since the enum order is important to parts of the code generator, I was
envisaging the type data being ordered into a list and having the enum
values come out of that. The pseudo markers like
FIRST_INTEGER_VALUETYPE could no doubt be inferred but could also be
explicitly placed. The type data itself could build up types from
scalar types to vector ones, so that the size, element type and element
count are all known and can be used to auto-generate methods:

def i32 : IntegerValueType<32>;
def v2i32 : VectorValueType<i32, 2>;
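
As a rough sketch of why composing types this way lets the helper
methods be derived, here is a small model of the two definitions above.
It is written in Python purely for illustration; the class, field and
helper names are hypothetical and are not taken from LLVM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValueType:
    """Hypothetical model of one MVT entry (not LLVM's real data structure)."""
    name: str
    scalar_bits: int                       # width of a single element, in bits
    is_float: bool = False
    element: Optional["ValueType"] = None  # set only for vector types
    num_elements: int = 1

    @property
    def is_vector(self) -> bool:
        return self.element is not None

    @property
    def size_in_bits(self) -> int:
        # Derived from the composition, never written out by hand.
        return self.scalar_bits * self.num_elements


def integer_type(bits: int) -> ValueType:
    """Stand-in for the IntegerValueType<bits> class in the example above."""
    return ValueType(f"i{bits}", bits)


def vector_type(elt: ValueType, n: int) -> ValueType:
    """Stand-in for VectorValueType<elt, n>; the name is built from the parts."""
    return ValueType(f"v{n}{elt.name}", elt.scalar_bits, elt.is_float, elt, n)


i32 = integer_type(32)
v2i32 = vector_type(i32, 2)
```

Everything a getSizeInBits or getVectorElementType style helper needs is
reachable from the record itself, so a backend could emit those helpers
without any per-type hand-written cases.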

For markers, I would imagine that it would be possible to track when
we've transitioned from a scalar integer type to a scalar floating
point type and add a marker. If we find another scalar integer type
later in the list we can error to prevent odd ordering issues.
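
The marker inference and the ordering check could look something like
the sketch below. It is only an illustration in Python: the category
labels ("INTEGER", "FP") and the function name are assumptions, and the
marker names are simply formed from those labels.

```python
def assign_enum_values(types):
    """Number an ordered list of (name, category) pairs and infer markers.

    Returns (values, markers). Raises ValueError if a category reappears
    after a different category has started, i.e. the list is oddly ordered.
    """
    values = {}
    markers = {}
    seen = set()
    current = None
    for number, (name, category) in enumerate(types, start=1):
        if category != current:
            if category in seen:
                raise ValueError(
                    f"{name}: {category} types must be contiguous")
            seen.add(category)
            # Transition into a new category: place the FIRST_ marker here.
            markers[f"FIRST_{category}_VALUETYPE"] = name
            current = category
        # The LAST_ marker keeps moving until the category ends.
        markers[f"LAST_{category}_VALUETYPE"] = name
        values[name] = number
    return values, markers


values, markers = assign_enum_values(
    [("i32", "INTEGER"), ("i64", "INTEGER"), ("f32", "FP")])
```

Because the numbers come from list position, adding a type is a
one-line change and nothing has to be renumbered by hand.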

I realise this is quite similar to the type hierarchy in
include/llvm/IR/ so I don't know if this idea could be
used or shared with the Intrinsics' types to make them simpler.

Alternatively, since I've no doubt missed some complexity (or
triviality), does anyone else have any ideas that could improve this
part of the code generator?



I also don't know if there are hidden problems with generating (parts of) MachineValueType.h, but as someone living with a downstream clone that has 12 added MVTs compared to trunk (some really early in the enum), I can say it would make dealing with trunk changes to the types a lot simpler!

I don't know how many times I've manually had to deal with MachineValueType.h and due to reformatting or new types and it's a pain every time.

It would be awesome if this could be simplified!

(adding llvm-dev to my previous post)

Hi Björn, thanks for the food for thought. Indeed, as far as stumbling
blocks go that certainly is a biggie.

I think a two-step process would be necessary, unless there's a neat
way of having both C++ and TableGen read the same data in their
respective languages. The TableGen preprocessor is (thankfully) quite
limited and reading TableGen as C++ may take some doing.

I may be pessimistic, but I can't see a way to build tblgen twice (with
and without MVTs) that wouldn't introduce complexity for everyone else.
I wonder how controversial it would be to use a simple (python?) script
as an early step to generate the C++ and TableGen data from the same
internal model.
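
To make that concrete, a very small sketch of such a script might look
like this. The model contents, the function names and the exact shape of
the emitted C++ and TableGen text are all illustrative assumptions, not
a proposal for the real file formats.

```python
# Single source of truth: an ordered list of (name, size in bits).
# The entries are examples only.
MODEL = [("i32", 32), ("i64", 64), ("f32", 32)]


def emit_cpp_enum(model):
    """Emit a C++ enum whose values follow the order of the model."""
    lines = ["enum SimpleValueType : uint8_t {"]
    for number, (name, _bits) in enumerate(model, start=1):
        lines.append(f"  {name} = {number},")
    lines.append("};")
    return "\n".join(lines)


def emit_tablegen(model):
    """Emit TableGen defs carrying the same numbers as the C++ enum."""
    return "\n".join(
        f"def {name} : ValueType<{bits}, {number}>;"
        for number, (name, bits) in enumerate(model, start=1))


if __name__ == "__main__":
    print(emit_cpp_enum(MODEL))
    print(emit_tablegen(MODEL))
```

Both outputs are numbered from the same list, so the two files cannot
drift apart, and tblgen would only ever consume generated text rather
than depending on its own output.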