clang auto-vectorization feedback is confusing

Hi all,

I'm experimenting with clang's ability to auto-vectorize loops and am having trouble telling whether it succeeds.

Consider a stupid example reduced from real code:

--------test.c-------
#include <stdio.h>

int foo(int *A, int *B, int n) {

#pragma clang loop vectorize(enable)
  for (int k = 0; k < n; ++k)
  {
    unsigned sum = 0;

#pragma clang loop vectorize(enable)
    for (int i = 0; i < n; ++i)
       sum += A[i] + 5;

    printf("%u", sum);
  }

return 0;
}

---------------

$ clang -S -fvectorize -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize ~/Desktop/test.c
/Users/sean/Desktop/test.c:11:4: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
         for (int i = 0; i < n; ++i)
         ^

It helpfully says it vectorized the inner loop, but says *nothing* about the outer loop.

Currently the vectorizer does not visit outer loops, and the diagnostics are emitted in the vectorizer, so we don’t get a chance to report anything. Can you please file a bug?

If I comment out the printf(), then it outputs nothing whatsoever. Is this because earlier optimizer passes are basically destroying/transforming away the loop before the vectorizer sees it?

Sort of, it’s because the loops are removed. I am not sure what the expected behavior should be in this case...
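To illustrate the loop-removal case, here is a minimal sketch based on the reduced example above. The function names `sum_discarded` and `sum_returned` are hypothetical and not from the original code. The assumption is that once the printf() is gone nothing observes `sum`, so earlier passes may delete the loop before the vectorizer sees it; returning the value is one way to keep the loop alive so that a remark can be attached to it.

```c
/* Hypothetical variant of foo() with the printf() removed. `sum` is never
 * observed, so an optimizing compiler is free to delete the whole loop.
 * If that happens before the vectorizer runs, there is nothing left to
 * emit a loop-vectorize remark about. */
void sum_discarded(const int *A, int n) {
  unsigned sum = 0;
#pragma clang loop vectorize(enable)
  for (int i = 0; i < n; ++i)
    sum += A[i] + 5;
  (void)sum; /* result unused: the loop is dead code */
}

/* Returning the result makes the loop observable, so it has to survive
 * until the vectorizer and can be reported on. */
unsigned sum_returned(const int *A, int n) {
  unsigned sum = 0;
#pragma clang loop vectorize(enable)
  for (int i = 0; i < n; ++i)
    sum += A[i] + 5;
  return sum;
}
```

Compiling both with the same -Rpass flags as above and comparing which function produces remarks is a quick way to check whether silence means "removed" rather than "not vectorized".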

I'm trying to rework some code to be vectorizable, and various tweaks are causing clang to become silent or noisy. Any suggestions on how I can reliably know if a loop will be vectorized, or reliably know the loop was transformed to something else?

I am not sure there is anything beyond what you’re doing right now. You could look at the internal debug dumps with a debug version of clang but probably the best thing to do is to file PRs on anything that does not make sense to you.

Thanks for reporting.

Adam

It helpfully says it vectorized the inner loop, but says *nothing* about the outer loop.

Currently the vectorizer does not visit outer loops and the diagnostics
is emitted in the vectorizer so we don’t get a chance to report
anything. Can you please file a bug?

Done, see:
28374 – -Rpass-missed=loop-vectorize says nothing at all for outer loops
28377 – Mere addition of template makes -Rpass-analysis=loop-vectorize fall silent

If I comment out the printf(), then it outputs nothing whatsoever. Is this because earlier optimizer passes are basically destroying/transforming away the loop before the vectorizer sees it?

Sort of, it’s because the loops are removed. I am not sure what the
expected behavior should be in this case...

Well, difficulties aside, from my perspective, it'd be nice if, for every loop in a compiled file, clang could tell me:
- this loop _was_ vectorized
- this loop was _not_ vectorized because xyz
- this loop was removed/optimized away/transformed/whatever

As it stands, one is left guessing in the last case.

I am not sure there is anything beyond what you’re doing right now. You
could look at the internal debug dumps with a debug version of clang but
probably the best thing to do is to file PRs on anything that does not
make sense to you.

Thanks for the suggestion. Relatedly, when faced with a diagnostic like "Loop not vectorized: unsafe dependent memory operations in loop", is there any reference that can help a layman understand what this is telling me exactly? I STFW without many results.

Thanks for reporting.

Also filed these:

28375 – Poor -Rpass-analysis=loop-vectorize diagnostic message for (acceptable) failure to vectorize Objective-C fast enumeration loops
<rdar://27112452>

28376 – -Rpass-missed=loop-vectorize says to use -Rpass-analysis=loop-vectorize even if I already am

Cheers,

Sean

Well, difficulties aside, from my perspective, it'd be nice if, for every loop in a compiled file, clang could tell me:
- this loop _was_ vectorized
- this loop was _not_ vectorized because xyz
- this loop was removed/optimized away/transformed/whatever

As it stands, one is left guessing in the last case.

I think that’s fair. Can you please file a bug with the loop removal as well? Loop removal could check for these pragmas and issue diagnostics.

Thanks for the suggestion. Relatedly, when faced with a diagnostic like "Loop not vectorized: unsafe dependent memory operations in loop", is there any reference that can help a layman understand what this is telling me exactly? I STFW without many results.

On trunk we now emit:

"unsafe dependent memory operations in loop. Use #pragma loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop"

And then the pragma is documented with an example under http://clang.llvm.org/docs/LanguageExtensions.html#loop-distribution

That said, it still does not really explain what unsafe dependences mean. Feel free to file another bug for this.
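As a rough illustration of what a dependence between iterations looks like, and not a description of the vectorizer's actual analysis, consider the sketch below. All function names are hypothetical. The first loop carries a value from one iteration to the next, the second has independent iterations, and the third mixes both kinds of statement in the way loop distribution is meant to untangle. Whether a given clang version emits the "unsafe dependent memory operations" remark for these exact loops is an assumption to verify with -Rpass-analysis=loop-vectorize.

```c
/* Loop-carried dependence: iteration i+1 reads the element that iteration i
 * just wrote, so the iterations cannot simply run side by side in vector
 * lanes. */
void running_sum(int *a, const int *b, int n) {
  for (int i = 0; i + 1 < n; ++i)
    a[i + 1] = a[i] + b[i];
}

/* Independent iterations: each dst[i] depends only on x[i] and y[i].
 * `restrict` promises the arrays do not overlap, which removes the need
 * for the compiler to assume (or check at run time) that they might. */
void add_arrays(int *restrict dst, const int *restrict x,
                const int *restrict y, int n) {
  for (int i = 0; i < n; ++i)
    dst[i] = x[i] + y[i];
}

/* One dependent and one independent statement in the same loop. The pragma
 * asks clang to try splitting them into two loops so that the independent
 * one can be vectorized on its own. Other compilers ignore the pragma. */
void mixed(int *a, const int *b, int *c, const int *d, const int *e, int n) {
#pragma clang loop distribute(enable)
  for (int i = 0; i + 1 < n; ++i) {
    a[i + 1] = a[i] + b[i]; /* carries a dependence across iterations */
    c[i] = d[i] * e[i];     /* independent work */
  }
}
```

The pragma changes only how the loop may be transformed, not what it computes, so results are the same with or without it.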

Thanks,
Adam

I think that’s fair. Can you please file a bug with the loop removal as
well? Loop removal could check for these pragmas and issue diagnostics.

Done:
28415 – optimizer loop removal should issue diagnostics in presence of "#pragma clang loop vectorize"

On trunk we now emit:

"unsafe dependent memory operations in loop. Use #pragma loop
distribute(enable) to allow loop distribution to attempt to isolate the
offending operations into a separate loop"

And then the pragma is documented with an example under
http://clang.llvm.org/docs/LanguageExtensions.html#loop-distribution

Thanks for pointing that stuff out, I had not seen it before.

That said, it still does not really explain what unsafe dependences
mean. Feel free to file another bug for this.

Done:
28475 – When clang fails to vectorize a loop, it should be more descriptive/helpful about why not

Having something like this would be nice:
<https://blogs.msdn.microsoft.com/nativeconcurrency/2012/05/22/auto-vectorizer-in-visual-studio-2012-did-it-work/>

Another question if I may: sometimes clang will output something pretty clear like:

"remark: loop not vectorized: loop control flow is not understood by vectorizer [-Rpass-analysis=loop-vectorize]"

Other times I'll see:

"remark: the cost-model indicates that interleaving is not beneficial [-Rpass-analysis=loop-vectorize]"

From the sounds of it, that's telling me that the loop was *not* vectorized, right? If so, I find it odd that the "loop not vectorized:" prefix is not present.

Cheers,

Sean

Yes and yes. Can you please file a bug report on this too?
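For readers trying to tell the two decisions apart: the remark quoted at the top of the thread reports a vectorization width and an interleave count separately, and clang's loop pragmas expose both as hints. The sketch below is only an illustration under that assumption. The function name `sum_plus_five` is hypothetical, and the values 4 and 2 are copied from that earlier remark, not recommendations.

```c
/* Hypothetical helper that requests an explicit vector width and
 * interleave count instead of leaving both to the cost model.
 * Compilers other than clang ignore the pragma. */
unsigned sum_plus_five(const int *A, int n) {
  unsigned sum = 0;
#pragma clang loop vectorize_width(4) interleave_count(2)
  for (int i = 0; i < n; ++i)
    sum += A[i] + 5;
  return sum;
}
```

With explicit hints in place, a remark that mentions only interleaving is easier to compare against what was requested.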

Thanks,
Adam

Thanks for confirming. Here's the ticket:
<https://llvm.org/bugs/show_bug.cgi?id=28477>

Cheers,

Sean