From: "David Majnemer" <email@example.com>
To: "Hal Finkel" <firstname.lastname@example.org>
Cc: "Pete Cooper" <email@example.com>, "LLVMdev" <firstname.lastname@example.org>
Sent: Saturday, July 4, 2015 5:21:23 PM
Subject: Re: [LLVMdev] extractelement causes memory access violation - what to do?
> From: "David Majnemer" < email@example.com >
> To: "Hal Finkel" < firstname.lastname@example.org >
> Cc: "Pete Cooper" < email@example.com >, "LLVMdev" <
> firstname.lastname@example.org >
> Sent: Wednesday, July 1, 2015 7:17:19 PM
> Subject: Re: [LLVMdev] extractelement causes memory access
> violation - what to do?
> > From: "Pete Cooper" < email@example.com >
> > To: "Hal Finkel" < firstname.lastname@example.org >
> > Cc: "LLVMdev" < email@example.com >, "Paweł Bylica" <
> > firstname.lastname@example.org >
> > Sent: Wednesday, July 1, 2015 6:42:41 PM
> > Subject: Re: [LLVMdev] extractelement causes memory access
> > violation - what to do?
> > >
> > >> From: "Pete Cooper" < email@example.com >
> > >> To: "Paweł Bylica" < firstname.lastname@example.org >
> > >> Cc: "Hal Finkel" < email@example.com >, "LLVMdev"
> > >> < firstname.lastname@example.org >
> > >> Sent: Wednesday, July 1, 2015 12:08:37 PM
> > >> Subject: Re: [LLVMdev] extractelement causes memory access
> > >> violation - what to do?
> > >>
> > >> Sorry for chiming in so late in this.
> > >>
> > >> So I agree that negative indices are UB, I don’t think thats
> > >> contentious.
> > >>
> > >> However, I think the issue here is the DAG expansion. That is
> > >> the
> > >> point at which we go from UB which would just be hidden in the
> > >> instruction to something which can crash. I think its at that
> > >> point
> > >> where we should mask to ensure that the in memory expansion
> > >> doesn’t
> > >> read out of bounds. On architectures which do the variable
> > >> extract
> > >> in an instruction, they won’t be penalized by a mask,
> > >
> > > Why do you feel that they won't be penalized by the mask? Or
> > > are
> > > you assuming will adjust the patterns to match the index plus
> > > the
> > > mask?
> > Ah, should have explained better. What I meant was that if i can
> > do
> > the variable extract in a register without going to memory at all
> > (so have suitable legal instructions to do so), then we won’t
> > generate a mask at all. The mask would only be generated when the
> > legalizer moves the data to memory which we don’t do if its
> > legal.
> Ah, alright.
> > >
> > >> only the
> > >> memory expansion will be which should be rarer,
> > >
> > > On some architectures expansion in memory is not particularly
> > > expensive because they have very good store-to-load forwarding.
> > > Adding additional masking instructions into the critical path
> > > of
> > > correct code will not be a welcome change.
> > Thats true, so i guess it depends how many architectures need to
> > do
> > variable extracts in memory. I have no idea if any architectures
> > we
> > support are able to do a variable extract in a register, or if
> > all
> > use memory.
> At least on PowerPC, when using QPX, we can do this using
> > If most use a register, then penalizing the few who do
> > use memory by inserting a mask seems reasonable.
> > >
> > >>
> > >> The point about speculation at the IR level is interesting.
> > >> Personally i’m ok with constant indices being speculated and
> > >> variable not. If we later want to find a good way to ask TTI
> > >> whether
> > >> a variable extract is cheap then we can find a way to do so.
> > >
> > > It is not about expense, it is about not introducing UB when
> > > speculating the instruction.
> > Yeah, I see what you mean here. So the user could have written
> > if (i >= 0) x = extract v[i]
> > but if we speculate then we aren’t guarded and have UB. Having
> > the
> > backend insert the mask would fix this, but I agree that someone,
> > somewhere needs to put in the mask if we want to allow
> > speculation
> > here, and the target can’t do the variable extract in a register.
> I'd rather the frontend do this if the language wants it. We can
> ComputeKnownBits when we do the speculation check, and so if there
> is a mask, we'll be fine.
> I don't think we can rely on ComputeKnownBits for this sort of
> %and = and i64 %x, 1
> %idx = lshr exact i64 %and, 1
> %z = extractelement <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, i64
> ComputeKnownBits doesn't take 'exact' into account and will report
> that %idx must be all zeros. However, %idx might turn into undef if
> %x has the bottom bit set. In fact, it doesn't appear to be
> conservative at all in the face of flags. If anything, it takes
> advantage of them:
> This tells me that we have three options if we want the instruction
> to stay speculatable:
> 1. Make a variant of ComputeKnownBits (or something along those
> lines) which is explicitly pessimistic in the face of flags.
> 2. Kick the can to the backend.
> 3. Make ComputeKnowBits pessimistic in the face of flags.
If we do the speculation correctly, then this is not a problem.
Because the 'exact' flag might have control dependencies we cannot
speculate the lshr without dropping the flag. Only at that point can
we check (or confirm) that we can still speculate the
I don't think that simplifycg, loop unrolling or any of the other
speculation transforms actually drop flags.
If so, this is, unfortunately, a bug.
The reality is, of course, that most of these flags come from type properties of a high-level language and are not actually subject to control dependencies. As a result, it is often hard to notice these problems when they occur. However, once you combine casting with GVN/CSE, etc. it is possible to expose problems.
FWIW, I'd really like to have a different scheme for these kinds of properties that would allow us not to have to make the most conservative possible assumptions about control dependencies. I'm not entirely sure what such a scheme should look like, however.
Wouldn't such a regime
make it impossible to speculate call instructions (with 'pure'
targets) which don't have side effects (because we couldn't
necessarily strip their flags)?
We generically assume that the 'pureness' of a function is a property of the function, and not subject to control dependencies at the call site. OTOH, this is generally because we have the attribute on the function declaration. If we had only the attribute at the call site, and not on the declaration, then we'd need to strip the flags there too.