vectorize.enable

Hi Michael and Florian,
( + llvm-dev for visibility)

I would like to quickly follow up on “Pragma vectorize_width() implies vectorize(enable)”,
which got reverted with commit 858a1ae for 2 reasons, see also that revert commit message. Ignore the assert, that’s been fixed now.

The other thing is that with the patch behaviour is slightly changed and we could get a diagnostic we didn’t get before:

warning: loop not vectorized: the optimizer was unable to

perform the requested transformation; the transformation might be disabled or

specified as part of an unsupported transformation ordering

[-Wpass-failed=transform-warning]

For the example given in revert 858a1ae, in both cases before and after my commit, the loop vectoriser was bailing because “Not vectorizing: The exiting block is not the loop latch”. But the difference is that vectorize_width() now implies vectorize(enable), and so this is now marked as forced vectorisation which wasn’t the case before. Because of this forced vectorization, and that the transformation wasn’t applied, we now emit this diagnostic. The first part of this diagnostic is spot on: “the optimizer was unable to perform the requested transformation”. We could argue about the suggestions given as to why the transformations didn’t happen in this case, but overall I think this is an improvement.

I just wanted to check if we are happy with this behaviour? Okay to recommit?

Cheers,
Sjoerd.

The additional warning makes sense to me and I think is also beneficial to the user.

Before, we silently ignored vector_width() in the example [1] and I suppose the user was expecting vectorize_width(4) to be honored. Now we are more transparent in informing the user what is happening: we were not able to honor their requested pragma and I assume they would be interested in knowing.

But I think it would be good for the warning to also mention that the requested transformation might not be legal (which is the case for building [1] with -Oz). This makes it a little better.

Hans, as you reverted the patch, is the warning (modulo wording) in line with what you would expect?

Cheers,
Florian

[1]

float ScaleSumSamples_C(const float* src, float* dst, float scale, int width) {
float fsum = 0.f;
int i;
#if defined(clang)
#pragma clang loop vectorize_width(4)
#endif
for (i = 0; i < width; ++i) {
float v = *src++;
fsum += v * v;
*dst++ = v * scale;
}
return fsum;
}

The other thing is that with the patch behaviour is slightly changed and we could get a diagnostic we didn't get before:

warning: loop not vectorized: the optimizer was unable to
      perform the requested transformation; the transformation might be disabled or
      specified as part of an unsupported transformation ordering
      [-Wpass-failed=transform-warning]

For the example given in revert 858a1ae, in both cases before and after my commit, the loop vectoriser was bailing because "Not vectorizing: The exiting block is not the loop latch".

The source looks like a straightforward canonical loop. What pass
transformed it to have code between the exiting block and the latch?

But the difference is that vectorize_width() now implies vectorize(enable), and so this is now marked as forced vectorisation which wasn't the case before. Because of this forced vectorization, and that the transformation wasn't applied, we now emit this diagnostic. The first part of this diagnostic is spot on: "the optimizer was unable to perform the requested transformation". We could argue about the suggestions given as to why the transformations didn't happen in this case, but overall I think this is an improvement.

The patch that added the warning as-is was by me
(⚙ D55288 [test] Fix tests for changed optimizer warning message). It changed to emitted message from
"loop not vectorized: failed explicitly specified loop vectorization"
to the lengthier description following review feedback.

It's done by the WarnMissedTransformation and just looks for
transformation metadata that is still in the IR after all passes that
should have transformed them have ran. That is, it does not know why
it is still there -- it could be because the LoopVectorize pass is not
even in the pipeline -- and we cannot be more specific in the message.
However, -Rpass-missed=loop-vectorize may give more information.

The additional warning makes sense to me and I think is also beneficial to the user.

Before, we silently ignored vector_width() in the example [1] and I suppose the user was expecting vectorize_width(4) to be honored. Now we are more transparent in informing the user what is happening: we were not able to honor their requested pragma and I assume they would be interested in knowing.

As already mentioned, the loop indeed was never vectorized. It is
again the question whether vectorize_width(4) means "vectorize with
simd length 4" or "if this is vectorized, use simd length 4". Since
the decided on the former (which is also what the docs say), the
warning is correct.

That is, IMHO, Chrome should either remove the #pragma (since it has
no effect), add -Wno-pass-failed. We could also wait for the
LoopVectorize pass to support this, Philip Reames is currently working
on it.

Michael

The other thing is that with the patch behaviour is slightly changed and we could get a diagnostic we didn't get before:

warning: loop not vectorized: the optimizer was unable to
       perform the requested transformation; the transformation might be disabled or
       specified as part of an unsupported transformation ordering
       [-Wpass-failed=transform-warning]

For the example given in revert 858a1ae, in both cases before and after my commit, the loop vectoriser was bailing because "Not vectorizing: The exiting block is not the loop latch".

The source looks like a straightforward canonical loop. What pass
transformed it to have code between the exiting block and the latch?

But the difference is that vectorize_width() now implies vectorize(enable), and so this is now marked as forced vectorisation which wasn't the case before. Because of this forced vectorization, and that the transformation wasn't applied, we now emit this diagnostic. The first part of this diagnostic is spot on: "the optimizer was unable to perform the requested transformation". We could argue about the suggestions given as to why the transformations didn't happen in this case, but overall I think this is an improvement.

The patch that added the warning as-is was by me
(⚙ D55288 [test] Fix tests for changed optimizer warning message). It changed to emitted message from
"loop not vectorized: failed explicitly specified loop vectorization"
to the lengthier description following review feedback.

It's done by the WarnMissedTransformation and just looks for
transformation metadata that is still in the IR after all passes that
should have transformed them have ran. That is, it does not know why
it is still there -- it could be because the LoopVectorize pass is not
even in the pipeline -- and we cannot be more specific in the message.
However, -Rpass-missed=loop-vectorize may give more information.

As I recall, there is some trade-off here because it's hard for a
transformation to know that it's last - either the last run of that
particular transformation in the pipeline or the last transformation in
the pipeline that can service a particular transformation request (and
this is especially true if there are multiple, separated pipelines
involved, such as in LTO). This is why we did not have transformations
warn if they can't perform the requested transformation for structural
reasons - maybe they will be able to later. However, we also should
improve the diagnostics in these cases.

I recommend that we consider taking a kind of delayed-diagnostic
approach. When a pass cannot perform the requested transformation, it
records some rationale into the metadata. That rationale can be reported
by WarnMissedTransformation, if available, to make the diagnostic more
helpful. If the transformation is later actually performed, then the
extra information is discarded along with the transformation metadata
itself.

-Hal

> It's done by the WarnMissedTransformation and just looks for
> transformation metadata that is still in the IR after all passes that
> should have transformed them have ran. That is, it does not know why
> it is still there -- it could be because the LoopVectorize pass is not
> even in the pipeline -- and we cannot be more specific in the message.
> However, -Rpass-missed=loop-vectorize may give more information.

As I recall, there is some trade-off here because it's hard for a
transformation to know that it's last - either the last run of that
particular transformation in the pipeline or the last transformation in
the pipeline that can service a particular transformation request (and
this is especially true if there are multiple, separated pipelines
involved, such as in LTO). This is why we did not have transformations
warn if they can't perform the requested transformation for structural
reasons - maybe they will be able to later. However, we also should
improve the diagnostics in these cases.

This was not the only consideration. With ordered transformations,
such as vectorize after unroll-and-jam, the LoopVectorize does not
even have a chance to analyze the code since it is located after the
LoopUnrollAndJam pass. We would still warn that vectorization has not
been performed.

I recommend that we consider taking a kind of delayed-diagnostic
approach. When a pass cannot perform the requested transformation, it
records some rationale into the metadata. That rationale can be reported
by WarnMissedTransformation, if available, to make the diagnostic more
helpful. If the transformation is later actually performed, then the
extra information is discarded along with the transformation metadata
itself.

I like the idea, but I am not sure how helpful are messages such as
"The exiting block is not the loop latch" or "Cannot identify array
bounds" are to the end user. It would still be an improvement.

If there is no diagnostic metadata, do we keep emitting the current
message? If there already is an explanation metadata, does the new one
override the old one or is it appended?

We could also just a hint to the diagnostic such as
"-Rpass=loop-vectorize may provide more information".

Michael

It's done by the WarnMissedTransformation and just looks for
transformation metadata that is still in the IR after all passes that
should have transformed them have ran. That is, it does not know why
it is still there -- it could be because the LoopVectorize pass is not
even in the pipeline -- and we cannot be more specific in the message.
However, -Rpass-missed=loop-vectorize may give more information.

As I recall, there is some trade-off here because it's hard for a
transformation to know that it's last - either the last run of that
particular transformation in the pipeline or the last transformation in
the pipeline that can service a particular transformation request (and
this is especially true if there are multiple, separated pipelines
involved, such as in LTO). This is why we did not have transformations
warn if they can't perform the requested transformation for structural
reasons - maybe they will be able to later. However, we also should
improve the diagnostics in these cases.

This was not the only consideration. With ordered transformations,
such as vectorize after unroll-and-jam, the LoopVectorize does not
even have a chance to analyze the code since it is located after the
LoopUnrollAndJam pass. We would still warn that vectorization has not
been performed.

I recommend that we consider taking a kind of delayed-diagnostic
approach. When a pass cannot perform the requested transformation, it
records some rationale into the metadata. That rationale can be reported
by WarnMissedTransformation, if available, to make the diagnostic more
helpful. If the transformation is later actually performed, then the
extra information is discarded along with the transformation metadata
itself.

I like the idea, but I am not sure how helpful are messages such as
"The exiting block is not the loop latch" or "Cannot identify array
bounds" are to the end user. It would still be an improvement.

It probably wouldn't be, but might encourage some more-useful bug reports.

If there is no diagnostic metadata, do we keep emitting the current
message?

That was my thought. If we don't know anything else, just keep doing
what we're doing.

  If there already is an explanation metadata, does the new one
override the old one or is it appended?

Lacking data, I'll say append (because why throw away information?), but
only print the last one by default (because, for example, if a pass runs
multiple times, what I probably care about it why it couldn't perform
the transformation during the last run, not the previous times when the
IR might have been in a less-optimizable state) - that's probably less
confusing.

We could also just a hint to the diagnostic such as
"-Rpass=loop-vectorize may provide more information".

Agreed. That is probably the easiest thing, and might be the most useful
as well. We just need a bit in the metadata to provide some information
on whether the transformation was attempted but failed.

-Hal

Thanks for your replies. That was a very useful discussion.
I won’t recommit on a Friday afternoon, but will do on Monday, as it looks like we agreed again on the direction and the change.
Orthogonal to this change, the interesting topics brought up are improved diagnostics, and the cases the vectoriser misses. I will briefly look why this particular case isn’t vectorised, but I suspect that it’s a simple case of some prep / clean-up passes not running at Oz.

Cheers,
Sjoerd.

Thanks for your replies. That was a very useful discussion.

I won’t recommit on a Friday afternoon, but will do on Monday, as it looks like we agreed again on the direction and the change.

Sound good to me.

Orthogonal to this change, the interesting topics brought up are improved diagnostics, and the cases the vectoriser misses. I will briefly look why this particular case isn’t vectorised, but I suspect that it’s a simple case of some prep / clean-up passes not running at Oz.

I think the problem is that loop-rotate skips the loop at -Oz and loop-vectorize requires loops to exit via the latch.

Cheers,
Florian

Sorry for coming late to this thread, I was out a few days.

I wasn't familiar with this pragma before, but if the intended meaning
is indeed "vectorize this" and not "if vectorizing, use this width",
then the change makes sense.

The problem I see is that the warning isn't very actionable. Good
warnings are supposed to be actionable, but what is the developer
supposed to do in this case? The code looks perfectly set up for
vectorization, but the compiler doesn't vectorize. What's the fix for
the code?

I think the thread mentioned that someone was working on fixing the
vectorization for this case. Maybe we should wait for that?

(Another question is what we should do at -O0, because there I think
we still don't warn. Does that mean the pragma doesn't have the same
meaning at -O0?)

Thanks,
Hans

Hi,

The problem I see is that the warning isn’t very actionable.

Fully agreed.

Good warnings are supposed to be actionable, but what is the developer supposed to do in this case?

This diagnostic is unclear. But to be more precise, the first part says the optimisation could not be performed. This is spot on, and an improvement of what we had before because that didn’t issue any diagnostic at all, so one could falsely be under the impression the transformation had actually been applied.
The second part of the diagnostic, the suggestion is unclear / is not applicable. This needs fixing.

The “workaround” is that extra optimisation remarks can be requested, that will tell in more detail what and what didn’t happen.

The code looks perfectly set up for vectorization, but the compiler doesn’t vectorize. What’s the fix for the code?

Not a strong argument but this is perhaps a bit of an edge case because this is vectorisation with -Oz; this problem won’t appear (or it will less) with higher opt levels. In this particular case, as Florian mentiond, it’s looprotate not doing something. But forgetting this for a moment, as I also mentioned earlier, I think there are a few things that are orthogonal to this particular change:

  • diagnostics can be improved (what Hal and Michael brought up and discussed)
  • the vectoriser can be made a bit smarter (or looprotate, as Florian mentioned, and someone is working on that)

Another question is what we should do at -O0, because there I think we still don’t warn.

I will look into this. I was hoping we would issue exactly the same warning, which in this case would be correct: the transformation didn’t run.

Cheers,
Sjoerd.

Hi,

> The problem I see is that the warning isn't very actionable.

Fully agreed.

> Good warnings are supposed to be actionable, but what is the developer supposed to do in this case?

This diagnostic is unclear. But to be more precise, the first part says the optimisation could not be performed. This is spot on, and an improvement of what we had before because that didn't issue any diagnostic at all, so one could falsely be under the impression the transformation had actually been applied.
The second part of the diagnostic, the suggestion is unclear / is not applicable. This needs fixing.

The "workaround" is that extra optimisation remarks can be requested, that will tell in more detail what and what didn't happen.

> The code looks perfectly set up for vectorization, but the compiler doesn't vectorize. What's the fix for the code?

Not a strong argument but this is perhaps a bit of an edge case because this is vectorisation with -Oz; this problem won't appear (or it will less) with higher opt levels. In this particular case, as Florian mentiond, it's looprotate not doing something. But forgetting this for a moment, as I also mentioned earlier, I think there are a few things that are orthogonal to this particular change:

diagnostics can be improved (what Hal and Michael brought up and discussed)
the vectoriser can be made a bit smarter (or looprotate, as Florian mentioned, and someone is working on that)

> Another question is what we should do at -O0, because there I think we still don't warn.

I will look into this. I was hoping we would issue exactly the same warning, which in this case would be correct: the transformation didn't run.

Yes it will be correct, but again it's not entirely clear what action
the user should take. Should one always use this #pragma with some
#ifdef to check optimization level? I don't mean to complain, I'm just
not sure what the answer is to these problems.

The problem I see is that the warning isn't very actionable. Good
warnings are supposed to be actionable, but what is the developer
supposed to do in this case? The code looks perfectly set up for
vectorization, but the compiler doesn't vectorize. What's the fix for
the code?

Unfortunately, the problem is too complex that we could explain in a
single warning line.

Actionable warnings are nicer, but I don't think we can only emit
warnings if they are actionable.

I think the thread mentioned that someone was working on fixing the
vectorization for this case. Maybe we should wait for that?

This would fix one source of this warnings, but unfortunately there
are many reasons why vectorization can fail.

(Another question is what we should do at -O0, because there I think
we still don't warn. Does that mean the pragma doesn't have the same
meaning at -O0?)

-O0 skips the entire optimization pass pipeline, i.e. the code is
never vectorized.

Michael

These warnings depend on optimization options, toolchain (e.g.
LTO,PGO) and target platform. That is, the warning may appear in many
circumstances that have not been tested by an upsteam project. My
suggestion to compile without warning would be:
If the vectorization is not very important, remove the pragma and
leave the decision to the compiler.
If code size in -Oz is more important than vectorization, add
-Wno-pass-failed to the command line in that configuration.
If vectorization is critical (it seems not to since the #pragma is
already guarded by an #if), I see no other way than maintaining a list
of supported configurations or manual vectorization. The vector width
is target-specific already.

This kind of warning has been introduced in r213110, r213112 [1,2].
Unfortunately I have not found a discussion about the unpredictable
nature of forced transformation failure warnings. Maybe we have one
now.

Michael

[1] https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140714/225803.html
[2] https://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20140714/110132.html