Limit loop vectorizer to SSE

The AVX + JIT bug is hitting more frequently now. On an AVX machine the
loop vectorizer goes for a vector length of 8 for some of my functions
which in turn causes a SEGFAULT.

Is there a way to limit the loop vectorizer to a certain vector length,
say 4, such that I can work around the bug?

Frank

I am asking because the option 'force-vector-width' is too restrictive.
I would like to leave open the possibility to use vector width 2.

Frank

I was about to say that, and you saved us both one cycle. :wink:

What you could do is force an architecture that doesn't have AVX, only
SSE. I'm not sure how to do that in the JIT; I suppose setting the target
attributes would be enough. Nor do I know which CPU string limits support
to SSE, but that should do it.

cheers,
--renato

.. forcing the vector size to 4 does not prevent using AVX. I just hit the following:

LV: We can vectorize this loop!
LV: Found trip count: 4
LV: The Widest type: 64 bits.
LV: The Widest register is: 256 bits.
LV: Using user VF 4.

Looks like I have to disable AVX somehow. (Which is sad on its own.)

Frank

Sure. That's more for tests than anything else.

So, there are ways of disabling stuff on the command line, for instance
"-mattr=-avx" (llc) or "-target-feature -avx" (clang -cc1), but I'm not
sure how you're doing it in the JIT. I'm also not sure how to set target
parameters in the JIT, you'll have to do that by hand.

cheers,
--renato

I don’t know that either. I set the CPU via engineBuilder.setMCPU(llvm::sys::getHostCPUName()); and that figures out all target parameters, I assume. I would need to still use this, and then disable just the AVX feature.

Try:

engineBuilder.setMAttrs("-avx");

--renato

Well educated guess! (It must be a sequence container of strings, but that's technical.)

Thanks,
Frank
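
For anyone reading along later, here is a minimal sketch of building that sequence of strings. The helper name disabledFeatureAttrs is hypothetical (it is not an LLVM API); only setMCPU, getHostCPUName, setMAttrs and the "-avx" string come from this thread. The LLVM calls are left as comments so the snippet compiles without the LLVM headers.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of LLVM): turn feature names such as "avx"
// into the "-feature" strings that switch them off.
std::vector<std::string> disabledFeatureAttrs(
    const std::vector<std::string>& features) {
    std::vector<std::string> attrs;
    attrs.reserve(features.size());
    for (const std::string& feature : features) {
        attrs.push_back("-" + feature);  // leading '-' disables the feature
    }
    return attrs;
}

// Usage with the engineBuilder from this thread (needs the LLVM headers):
//   engineBuilder.setMCPU(llvm::sys::getHostCPUName());
//   engineBuilder.setMAttrs(disabledFeatureAttrs({"avx"}));
```

The host CPU is still detected as before, and only the listed features are turned off on top of it.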

Porting my project from JIT to MCJIT did not fix the code generation bug
Frank is also experiencing. However, Renato's "-avx" suggestion did resolve
the issue for me. Hopefully we can get some traction on this bug, happy to
help where possible!
v/r,
Josh

Hi Josh, Frank,

Glad to see you can continue with your work, regardless of the AVX bug. It
would be great if you guys could reduce the IR and report the AVX bug in
bugzilla, I'm hoping you both found the same error (fingers crossed), but
feel free to open separate bugs, and we'll join later if they are the same.

Thanks!
--renato

My case is submitted as bug 17878 <http://llvm.org/bugs/show_bug.cgi?id=17878>

In my case the segfault happens when calling the JIT'ed function. Thus some sort of 'payload' has to be created. Not sure if it's the same thing Josh is hitting.

Frank

Thanks!

--renato

I'm embarrassed to say my bug ended up being a user error. I was passing in
pointers that were 16-byte aligned instead of 32-byte aligned. That explains
why they worked fine for SSE but not AVX :slight_smile: Sorry for the noise!

Good catch! That was the problem in my case too. I totally
overlooked the alignment requirement for AVX.
Frank
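
A minimal sketch of one way to get a 32-byte aligned payload on the host side, assuming the 256-bit register width reported in the vectorizer log above. The names kAvxAlignment, isAligned and AlignedFloatBuffer are hypothetical; the code over-allocates and uses std::align from the standard library, so it does not depend on LLVM or on platform-specific allocators.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Assumption: 32 bytes, i.e. the 256-bit widest register from the log.
constexpr std::size_t kAvxAlignment = 32;

// True if p is a multiple of `alignment` bytes.
inline bool isAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical owner of a float buffer whose data() is 32-byte aligned.
class AlignedFloatBuffer {
public:
    explicit AlignedFloatBuffer(std::size_t count)
        : storage_(count * sizeof(float) + kAvxAlignment), data_(nullptr) {
        void* p = storage_.data();
        std::size_t space = storage_.size();
        // std::align moves p forward to the next aligned address that still
        // leaves room for count floats; the extra 32 bytes guarantee a fit.
        data_ = static_cast<float*>(
            std::align(kAvxAlignment, count * sizeof(float), p, space));
    }

    // Copying would leave data_ pointing into the other object's storage.
    AlignedFloatBuffer(const AlignedFloatBuffer&) = delete;
    AlignedFloatBuffer& operator=(const AlignedFloatBuffer&) = delete;

    float* data() { return data_; }

private:
    std::vector<unsigned char> storage_;  // raw, possibly unaligned bytes
    float* data_;                         // aligned pointer into storage_
};
```

Passing buffer.data() to the JIT'ed function then satisfies the alignment the AVX code path expects, and isAligned can be used to double-check pointers that come from elsewhere.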

Wow! Two bugs closed without even looking at them! I must be a wizard! :smiley:

Good work Josh, thanks for letting us know.

cheers,
--renato

I wonder if the validation mechanism shouldn't have caught it earlier... Do
you guys run validate on the modules before JIT-ing?

--renato

Hmm.. I don't quite understand. How can a module validator
catch this, when it's the pointers, i.e. the payload, you pass
as function arguments that need to be aligned.. ?!
Frank

Agreed, is there a pass that will insert a runtime alignment check? Also, what’s the easiest way to get at TargetTransformInfo::getRegisterBitWidth() so I don’t have to hard code 32? Thanks!
-Josh
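
This does not answer whether such a pass exists, but as a stopgap the check can live on the caller's side. A minimal sketch, with the assumptions spelled out: KernelFn is a hypothetical signature for a JIT-compiled function, callChecked and isAlignedTo are made-up names, and the required alignment is passed in by the caller rather than queried from TargetTransformInfo.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical signature of a JIT-compiled kernel; the real one depends on
// the IR you generate.
using KernelFn = void (*)(float* dst, const float* src, std::size_t n);

// True if p is a multiple of `alignment` bytes.
inline bool isAlignedTo(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Refuse to call the kernel with under-aligned pointers, so the failure is
// an exception at the call site instead of a segfault inside JIT'ed code.
void callChecked(KernelFn kernel, float* dst, const float* src,
                 std::size_t n, std::size_t alignment) {
    if (!isAlignedTo(dst, alignment) || !isAlignedTo(src, alignment)) {
        throw std::invalid_argument(
            "kernel arguments are not sufficiently aligned");
    }
    kernel(dst, src, n);
}
```

With alignment set to 32 this would have turned the 16-byte aligned pointers discussed above into an immediate, readable error.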

My mistake, I thought it was something in your front-end, generating bad
IR. Ignore that comment.

--renato

I think that's a fair question, and it's about safety. If you're getting
this on the JIT, it means we may be performing unsafe transformations in
the vectorizer.

Arnold, Nadav, I don't remember seeing code to generate any run-time
alignment checks on the incoming pointer, is there such a thing? If not,
shouldn't we add one?

cheers,
--renato