I don't see how the situation you mention is comparable. Legalization
for e.g. <3 x i32> was not implemented at first, but as demonstrated
by the fact that it *was* implemented later, there's no conceptual
problem with legalizing that kind of type. You don't even have to
legalize them in vector registers, three scalar registers work fine
(you can even do that on the IR level).
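
To make the "three scalar registers" remark concrete, here is a minimal sketch of an add on <3 x i32> carried out on three separate scalars. The helper names (add_i32, add_v3i32_scalarized) are made up for illustration and are not LLVM APIs; the 32-bit mask only models the wrapping behaviour of an i32 add.

```python
MASK32 = 0xFFFFFFFF  # models i32 wraparound; illustrative only


def add_i32(x, y):
    """Wrapping 32-bit integer add, in the spirit of LLVM's `add i32`."""
    return (x + y) & MASK32


def add_v3i32_scalarized(a, b):
    """Add two <3 x i32> values, each held as three separate scalars."""
    a0, a1, a2 = a
    b0, b1, b2 = b
    # One scalar add per lane; no vector register is needed at all.
    return (add_i32(a0, b0), add_i32(a1, b1), add_i32(a2, b2))


result = add_v3i32_scalarized((1, 2, 3), (10, 20, 30))
```

The lane count is a whole number, so every lane maps to exactly one scalar; that is what makes this kind of legalization possible in principle.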
That was the point I was trying to make, but in my head that fused
with register shadowing, which derailed the point.
To be clear, yes, "invalid" register configurations can usually be
legalised easily, and in multiple ways, at lowering.
Not all will be optimal, though, and that is where the problem lives.
Legalization (codegen in general) does not know if the machine
code will eventually run on a chip with vector registers so small that
vscale works out to 1/2, but it has to choose some legalization
This is interesting, I had not realised that from the descriptions of
the problem so far. I thought it was just due to non-power-of-two
A "vector" register that is smaller than 64 bits wouldn't make much
sense, unless this is a DSP-type extension on very small types. In
those cases, every clock cycle and every instruction counts,
especially inside the inner loop.
I'm struggling to see how this can be optimally executed from
generic scalable code, which usually profits from the fact that vscale
If <vscale x 1 x i32> ends up having one element, and <vscale x 2 x
i32> also has one (= 2 * 0.5) element, then that's wrong: the latter
type must have twice as many elements as the former (one example where
this matters: split_low / split_high / concat shuffle patterns). The
second option, a vector with *zero* elements, is just as wrong if not
Right, that was the idea behind vscale from the beginning. I don't
know how many elements either has, but I know the latter has twice as
many as the former.
I see why you would want half-length, because that truth still holds:
the latter has twice as many halves as the former.
But how do you handle the last half? Do you ignore it? Do you load /
store half? Do you always mask it out? Do you fuse it with the next
iteration's first half?
If the semantics is not clear on how the back-ends are supposed to use
that extra half, then extending the IR in such a way can make it very
hard for generic optimisations to understand anything about the
ranges, validity of operations, alignment, masks, undefined behaviour,
It's not that a correct legalization exists but it's too annoying to
implement, or that one might exist but I'm too lazy to work it out.
I never meant to imply that. Apologies if that's what came through.
We're also not running into a limitation or oddity of the RISC-V vector
ISA in particular. It's simply that, if you set vscale == 0.5, then by
the way scalable vector types work (vscale * const elements), some
vector types that can be written in the IR would need to have a
fractional number of elements to be consistent with the other scalable
vector types. As that is not possible (not even conceptually),
whatever code you emit to try to legalize that type will end up being
wrong in some respect.
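
The arithmetic behind this can be sketched in a few lines. This is only an illustration of the "vscale * const elements" rule using exact fractions; the function names (element_count, is_representable) are hypothetical and do not correspond to anything in LLVM.

```python
import math
from fractions import Fraction


def element_count(min_elems, vscale):
    """Exact element count of <vscale x min_elems x T>: vscale * min_elems."""
    return Fraction(vscale) * min_elems


def is_representable(min_elems, vscale):
    """A vector can only hold a whole, positive number of elements."""
    n = element_count(min_elems, vscale)
    return n.denominator == 1 and n > 0


half = Fraction(1, 2)
one = element_count(1, half)  # <vscale x 1 x i32> with vscale == 1/2
two = element_count(2, half)  # <vscale x 2 x i32> with vscale == 1/2

# Rounding up gives both types one element, so "twice as many" is lost.
rounded_up = (math.ceil(one), math.ceil(two))
# Rounding down gives the first type zero elements.
rounded_down = (math.floor(one), math.floor(two))
```

Under these assumptions <vscale x 1 x i32> works out to half an element. Rounding either way breaks the relation between the two types (one example where that matters: split_low / split_high / concat), which is the inconsistency described above.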
Honestly, I'm running out of breath in this discussion.
I don't know a lot about SVE and even less about RISC-V, so I'll leave
the more in-depth technical discussions for Florian/Sander and others
to chime in.
So if we decide to support fractional vscale, we can't say these
types are "illegal". In LLVM parlance, illegal types can be used in
LLVM IR and targets aspire to turn them into something that works
correctly, even if it's very inefficient. Sometimes a legalization is
unimplemented or buggy, but these problems can be patched and this has
often happened in the past. With fractional vscale, the situation is
quite different: nobody will ever be able to use certain scalable
vector types on the target in question, because they can't be
legalized even in principle.
I have not spent the time you guys have on this, but if I understood
your problem correctly, I too can't think of a way to represent this
in non-fractional ways.
I'm not saying this is a good idea, and I think you're not saying it
is either, but it is perhaps the only idea.
If that's the case, then I have proposed to use a different
flag/integer to mean half-scale instead of a floating-point value, and
hopefully that can be transparent to the rest of scalable vector code.
But I'd really like to get other people's point of view, as I'm not
confident in my appraisal.
I hope this lengthy explanation helps you see where I'm coming from.
It did, thanks!