loop vectorizer disabling

I would like to propose that the loop pragma vectorize(disable) actually disables the vectorizer for that loop. This perhaps sounds really obvious (I hope it does), but currently vectorize(disable) only sets the vectorization width to 1, which means the vectorizer still runs and can perform other transformations such as interleaving. The main reason to change the behaviour is that it would better match what (most) users expect.
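To make the distinction concrete, here is a minimal sketch of the two spellings under discussion. The function names and the kernel are hypothetical and only for illustration; the pragma syntax is Clang's `#pragma clang loop`. The comments restate the behaviour described above rather than anything I have measured.

```c
#include <stddef.h>

/* Hypothetical kernel, for illustration only. */
void scale_no_vec(float *dst, const float *src, float k, size_t n) {
    /* As described in this post: this currently sets the vectorization
       width to 1, so the loop vectorizer may still run and interleave. */
#pragma clang loop vectorize(disable)
    for (size_t i = 0; i < n; ++i)
        dst[i] = k * src[i];
}

/* Same kernel, asking explicitly for neither transformation. */
void scale_no_vec_no_interleave(float *dst, const float *src, float k,
                                size_t n) {
#pragma clang loop vectorize(disable) interleave(disable)
    for (size_t i = 0; i < n; ++i)
        dst[i] = k * src[i];
}
```

Both functions compute the same result; the pragmas only affect which loop transformations the optimizer is asked to apply.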

I think we reached consensus on changing the behaviour in [4], but since this changes the behaviour of a user-facing pragma, we would like to know if there are any objections. If people rely on the current behaviour, where vectorize(disable) still allows other transformations performed by the vectorizer, which I hope is an edge case, I think the required rewrite is trivial, but it is still a change. So, again, unless there are objections we would like to go ahead with this.

For a bit more context, this is a follow-up to a discussion on loop pragmas we had not so long ago. We added a new pragma [1], and enabling this new transformation option implies enabling the transformation itself [2]. This is something that our docs promise for other transformation options too, except that it wasn’t happening, and so we started fixing that. In [3], for example, we implement that vectorize_width() implies vectorize(enable). Related to this, and because of [3], we started discussing in [4] what vectorize(disable) should mean: having it disable the vectorizer makes the implementation easier and, more importantly, would probably match user expectations better.
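As a small sketch of what "vectorize_width() implies vectorize(enable)" means for user code: the intent of [3], as described above, is that the first spelling below behaves like the second. The kernel and function names are hypothetical; only the pragma spellings come from Clang.

```c
#include <stddef.h>

/* Hypothetical kernel: only a width is requested. */
void saxpy_width_only(float *y, const float *x, float a, size_t n) {
#pragma clang loop vectorize_width(4)
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Same kernel with the "enable" spelled out explicitly. */
void saxpy_width_and_enable(float *y, const float *x, float a, size_t n) {
#pragma clang loop vectorize(enable) vectorize_width(4)
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```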

[1] https://reviews.llvm.org/D64744
[2] https://reviews.llvm.org/D65776
[3] https://reviews.llvm.org/D66290
[4] https://reviews.llvm.org/D66796

Yes, I agree – this is exactly what should happen.

Hi, Sjoerd,

Thanks for posting this RFC. I do, however, strongly disagree. From an implementation standpoint, it made sense to have a simple component, the loop vectorizer, perform not only loop vectorization but also related transformations, specifically interleaving. However, the fact that a single component of the optimizer performs these different transformations is not something that we expose to users. Moreover, the optimizer is always free to use cost-model-based heuristics to perform any transformation unless specifically directed otherwise. Thus, disabling vectorization should not disable interleaving. Although often useful together, these are separate transformations: interleaving is often useful in the absence of vectorization, and we have a separate set of pragmas to control interleaving (interleave(enable/disable), interleave_count(N), etc.). Having vectorize(disable) imply interleave(disable) is unnecessary and confusing. I expect that my users will complain if we make this change (and, relatedly, we’ll see performance regressions).

In short, our pragmas should control transformations, not components.
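For readers less familiar with interleaving as a transformation separate from vectorization, a rough sketch follows. The first function requests scalar interleaving by 2 through the existing pragmas; the second writes out by hand roughly what interleaving a reduction by 2 looks like, using two independent accumulators. The function names are hypothetical, an integer sum is used so both versions give identical results, and whether the optimizer honors the request depends on legality and target.

```c
#include <stddef.h>

/* Hypothetical reduction: no vectorization, but interleave by 2. */
long sum_pragma(const int *a, size_t n) {
    long s = 0;
#pragma clang loop vectorize(disable) interleave_count(2)
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Hand-written approximation of the same loop interleaved by 2. */
long sum_manual_interleave2(const int *a, size_t n) {
    long s0 = 0, s1 = 0; /* two independent dependency chains */
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n) /* leftover element when n is odd */
        s0 += a[i];
    return s0 + s1;
}
```

The two accumulators do not depend on each other, so consecutive additions can overlap in the pipeline even though every operation stays scalar.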

Thanks again,

Hal

Hi Hal,

Many thanks for commenting and clarifying. I liked “pragmas should control transformations, not components” and it’s difficult to disagree with that I think. :slight_smile:

I considered the interleaver an integral part of the vectorizer, a prep step or enabler for efficient vectorisation. With that assumption, my proposal made sense but if the interleaver is a separate optimisation then I don’t think I have a case.

I will drop the “vectorize(disable) implies interleave(disable)” work, and finish “vectorize_width() implies vectorize(enable)” to conclude my loop pragma adventure.

Not relevant for this discussion, but I don’t think it is difficult to get a bit confused about the interleaver, because of both the implementation and the documentation. Clarifying things a bit more in the docs is probably low-hanging fruit; I will check whether that is the case.

Thanks,
Sjoerd.

> Hi Hal,
>
> Many thanks for commenting and clarifying. I liked "pragmas should control transformations, not components" and it's difficult to disagree with that I think. :slight_smile:
>
> I considered the interleaver an integral part of the vectorizer, a prep step or enabler for efficient vectorisation. With that assumption, my proposal made sense but if the interleaver is a separate optimisation then I don't think I have a case.

Thanks! Interleaving is certainly an optimization that we've found to be useful even in the absence of vectorization (because it helps hide the latency of long, pipelined operations).

> I will drop the "vectorize(disable) implies interleave(disable)" work, and finish "vectorize_width() implies vectorize(enable)" to conclude my loop pragma adventure.
>
> Not relevant for this discussion, but I don't think it is difficult to get a bit confused about the interleaver, because of both the implementation and the documentation. Clarifying things a bit more in the docs is probably low-hanging fruit; I will check whether that is the case.

+1 for better documentation, of course. :slight_smile: Thanks again for helping to improve all of this.

-Hal

> Thanks,
> Sjoerd.