We're looking at how to handle masked vector operations on architectures
like Knights Corner. In our case, we have to translate from a fully
vectorized IR that has mask support to llvm IR, which does not have mask
support.
For non-trapping instructions this is fairly straightforward:
    ; source IR (fully predicated)
    t1 = add t2, t3, mask

    ; llvm IR, assuming we want zeros in the false positions (which is
    ; not always the case)
    tt = add t2, t3
    t1 = select mask, tt, 0
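To make the lane-wise semantics of the add+select lowering concrete, here is a scalar C sketch. The vector width `VLEN`, the one-byte-per-lane mask, and the function names are illustrative assumptions, not part of any real API. Since zeros are not always what's wanted, a merging variant that keeps a pass-through value in the false lanes is shown as well.

```c
#include <stdint.h>

#define VLEN 8  /* illustrative vector width (assumption) */

/* "t1 = add t2, t3, mask" with zeros in the false lanes, lowered as
   tt = add t2, t3 ; t1 = select mask, tt, 0 */
void masked_add_zero(const int32_t *t2, const int32_t *t3,
                     const uint8_t *mask, int32_t *t1) {
    for (int i = 0; i < VLEN; ++i) {
        int32_t tt = t2[i] + t3[i];   /* the add runs in every lane */
        t1[i] = mask[i] ? tt : 0;     /* the select keeps active lanes */
    }
}

/* Merging variant: false lanes keep the pass-through value instead. */
void masked_add_merge(const int32_t *t2, const int32_t *t3,
                      const uint8_t *mask, const int32_t *passthru,
                      int32_t *t1) {
    for (int i = 0; i < VLEN; ++i) {
        int32_t tt = t2[i] + t3[i];
        t1[i] = mask[i] ? tt : passthru[i];
    }
}
```

Because the add cannot trap, computing it in every lane and discarding the inactive results is harmless, which is why add+select is a faithful translation for this case.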
It's easy enough to write a TableGen pattern to match add+select and
emit a masked instruction. Alternatively, we can always resort to manual
lowering.
For trapping operations this is problematic. Take a load, for example.
Here's the same approach applied to it:

    tt = load [addr]
    t1 = select mask, tt, 0
The problem is that the load is unconditional. The original load was
masked, presumably because the corresponding scalar load was under a
condition, probably one protecting against traps. However, since llvm has
no concept of a masked load, staying within the IR forces us to use an
unconditional load. We can get the same result vector after the select,
but by then it's too late: the load has already executed, and the llvm
passes will assume that the load cannot fault (otherwise the program has
undefined behavior). The llvm IR therefore does not convey the same
semantics as the fully predicated IR.
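The following scalar C sketch contrasts what the unconditional load plus select actually says with what the masked source load means. `VLEN`, the byte-per-lane mask, and both function names are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define VLEN 8  /* illustrative vector width (assumption) */

/* What "tt = load [addr]; t1 = select mask, tt, 0" says: every lane is
   read regardless of the mask. If addr[i] is not dereferenceable for an
   inactive lane, the read itself is undefined behavior. */
void load_then_select(const int32_t *addr, const uint8_t *mask,
                      int32_t *out) {
    for (size_t i = 0; i < VLEN; ++i) {
        int32_t tt = addr[i];          /* unconditional: may fault */
        out[i] = mask[i] ? tt : 0;
    }
}

/* What the masked source load means: inactive lanes are never touched,
   so they may point at memory that is not safe to read. */
void masked_load_zero(const int32_t *addr, const uint8_t *mask,
                      int32_t *out) {
    for (size_t i = 0; i < VLEN; ++i)
        out[i] = mask[i] ? addr[i] : 0;   /* read only if lane is active */
}
```

With a 5-element buffer and lanes 5 through 7 masked off, `masked_load_zero` is well defined, while `load_then_select` on the same inputs reads past the end of the buffer. Nothing in the select tells the optimizer that those lanes were never meant to be read.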
It seems the only solution is to create an intrinsic:

    llvm_int_load_masked mask, [addr]

But this unnecessarily shuts down optimization, since the passes know
nothing about what an opaque intrinsic does.
Similar problems exist with any trapping instruction (div, mod, etc.).
It gets even worse when you consider that any floating-point operation
can trap on a signalling NaN input.
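For integer division specifically, one workaround sometimes used (a different technique, not something proposed here) is to make every lane safe to execute: substitute a harmless divisor in the inactive lanes, divide unconditionally, then select. A scalar C sketch, with a hypothetical `VLEN` and function name:

```c
#include <stdint.h>

#define VLEN 8  /* illustrative vector width (assumption) */

/* Masked signed division with zeros in the false lanes. Inactive lanes
   get divisor 1, so the unconditional divide can no longer trap there
   (x / 1 never divides by zero or overflows). Active lanes keep the
   source semantics, including UB for b == 0 or INT32_MIN / -1. */
void masked_sdiv_zero(const int32_t *a, const int32_t *b,
                      const uint8_t *mask, int32_t *out) {
    for (int i = 0; i < VLEN; ++i) {
        int32_t safe_b = mask[i] ? b[i] : 1;   /* select mask, b, 1 */
        int32_t q = a[i] / safe_b;             /* unconditional sdiv */
        out[i] = mask[i] ? q : 0;              /* select mask, q, 0 */
    }
}
```

This costs an extra select before the divide, and it depends on a benign operand value existing for the operation; it does not carry over as directly to loads, where the address itself is the problem.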
The gpuocelot project is essentially trying to do the same thing, but I
haven't dug deep enough into their notes and implementation to see how
they handle this issue. Perhaps, because current GPUs don't trap, it's a
non-issue for them. But that will likely change in the future.
So, are there any ideas out there for how to handle this efficiently?
We've talked about llvm and masks before, and it's clear that there is
strong resistance to adding masks to the IR. Perhaps an alternative is
to mark an instruction as "may trap" so that llvm will not perform
certain transformations on it. Of course, that involves teaching all of
the passes about a new "may trap" attribute, or whatever mechanism is
chosen.
I would very much appreciate thoughts and ideas on this. As things
stand, it doesn't seem possible to generate efficient llvm IR for fully
predicated instruction sets.