RFC: Removal of noduplicate attribute

Hi,

Can the noduplicate attribute be deleted? It was originally added to satisfy the constraints of OpenCL barriers, but it didn’t really solve this problem. Convergent is strictly better and more accurately captures the real restriction. Is anyone still using noduplicate or really needs the semantics of noduplicate? There are a number of passes that now respect convergent, but do not try to handle noduplicate properly. There is a maintenance cost to keeping it around, and does require work to keep it working as advertised.

A related question is if it’s OK to recycle the attribute encoding bit, or if this will be a problematic bitcode compatibility change.

-Matt

Hi,

Can the noduplicate attribute be deleted? It was originally added to satisfy the constraints of OpenCL barriers, but it didn’t really solve this problem. Convergent is strictly better and more accurately captures the real restriction. Is anyone still using noduplicate or really needs the semantics of noduplicate? There are a number of passes that now respect convergent, but do not try to handle noduplicate properly. There is a maintenance cost to keeping it around, and does require work to keep it working as advertised.

The noduplicate attribute is user-facing and is not tied to OpenCL
(there are no constraints for it, anyway), so I would be suspicious of
removing it just because it doesn't satisfy OpenCL needs -- users may
be using it for other purposes. It may be reasonable to deprecate it
if there is a better user-facing option for users to use and
eventually remove the attribute when we think it's safe to do so, but
I think it's a bit early to talk about removal.

~Aaron

>
> Hi,
>
> Can the noduplicate attribute be deleted? It was originally added to
> satisfy the constraints of OpenCL barriers, but it didn’t really
> solve this problem. Convergent is strictly better and more
> accurately captures the real restriction. Is anyone still using
> noduplicate or really needs the semantics of noduplicate? There are
> a number of passes that now respect convergent, but do not try to
> handle noduplicate properly. There is a maintenance cost to keeping
> it around, and does require work to keep it working as advertised.

The noduplicate attribute is user-facing and is not tied to OpenCL
(there are no constraints for it, anyway), so I would be suspicious of
removing it just because it doesn't satisfy OpenCL needs -- users may
be using it for other purposes. It may be reasonable to deprecate it
if there is a better user-facing option for users to use and
eventually remove the attribute when we think it's safe to do so, but
I think it's a bit early to talk about removal.

These are good points. I think the first question should be: Do we know
of any active users of this attribute right now? If not, deprecation
seems like something we could do, e.g., through a warning in clang and
in the middle-end to ensure other front-ends are aware of it as well.

> A related question is if it’s OK to recycle the attribute encoding
> bit, or if this will be a problematic bitcode compatibility change.

I doubt it is all that helpful to do so but potentially problematic. We
can support a variable number now and we already crossed the critical
threshold so I would, for now, argue against reusing old bits.

Cheers,
  Johannes

The QoI isn’t great for it, and we aren’t seeing any bug reports involving it so it doesn’t seem like it’s really in use. The only mention of the attribute I found on bugzilla is a comment mentioning that it exists as possibly being relevant to solving another problem.

Similar to how clang tries to force convergent on all functions, noduplicate would have to be speculatively added to all functions if we were serious about preserving the semantics. As-is calls to noduplicate functions could be effectively duplicated by duplicating a call to a function that calls a noduplicate function, or a call to a noduplicate through an indirect call.

-Matt

Indeed, the lack of clear use cases and even of clear semantics of
noduplicate is a concern. I would be in favor of removing it outright,
but if there are concerns then deprecating it in order to shake out
any users it may have is the least we should do.

Cheers,
Nicolai

> >
> > Hi,
> >
> > Can the noduplicate attribute be deleted? It was originally added to
> > satisfy the constraints of OpenCL barriers, but it didn’t really
> > solve this problem. Convergent is strictly better and more
> > accurately captures the real restriction. Is anyone still using
> > noduplicate or really needs the semantics of noduplicate? There are
> > a number of passes that now respect convergent, but do not try to
> > handle noduplicate properly. There is a maintenance cost to keeping
> > it around, and does require work to keep it working as advertised.
>
> The noduplicate attribute is user-facing and is not tied to OpenCL
> (there are no constraints for it, anyway), so I would be suspicious of
> removing it just because it doesn't satisfy OpenCL needs -- users may
> be using it for other purposes. It may be reasonable to deprecate it
> if there is a better user-facing option for users to use and
> eventually remove the attribute when we think it's safe to do so, but
> I think it's a bit early to talk about removal.

These are good points. I think the first question should be: Do we know
of any active users of this attribute right now? If not, deprecation
seems like something we could do, e.g., through a warning in clang and
in the middle-end to ensure other front-ends are aware of it as well.

Noduplicate attribute is still used by the Intel OpenCL Compiler for CPU.
The main use case is to prevent loop unroll when an OpenCL barrier is called
within a loop. Although such loop can be unrolled and keep its semantic intact, but
this introduces a lot of distinct barrier calls, and each of them has to
be handled separately.

In other words, "noduplicate" serves as a hint to not unroll a loop if a
certain function is called in a loop body.

Do you properly annotate all functions in the call chain with `noduplicate` or only barriers as a "hint"?
I mean what the loop below, would it be unrolled or is `noduplicate` also applied to `foo`?

void foo() { barrier(); }

for (...) {
  // some code
  foo() ;
}

Cheers,
  Johannes

Do you properly annotate all functions in the call chain with `noduplicate` or only barriers as a "hint"?
I mean what the loop below, would it be unrolled or is `noduplicate` also applied to `foo`?

void foo() { barrier(); }

for (...) {
  // some code
  foo() ;
}

Cheers,
  Johannes

Yes, `noduplicate' is propagated up the call chain, so `foo' should also
be marked as `noduplicate'.

I don't quite understand the reasoning behind this. Is it because your
backend turns each individual barrier call into a large chunk of code?

If so, would it be a long-term viable alternative to inform the
various code size heuristics about this instead of using
`noduplicate`?

Cheers,
Nicolai

> > These are good points. I think the first question should be: Do we know
> > of any active users of this attribute right now? If not, deprecation
> > seems like something we could do, e.g., through a warning in clang and
> > in the middle-end to ensure other front-ends are aware of it as well.
>
> Noduplicate attribute is still used by the Intel OpenCL Compiler for CPU.
> The main use case is to prevent loop unroll when an OpenCL barrier is called
> within a loop. Although such loop can be unrolled and keep its semantic intact, but
> this introduces a lot of distinct barrier calls, and each of them has to
> be handled separately.
>
> In other words, "noduplicate" serves as a hint to not unroll a loop if a
> certain function is called in a loop body.

I don't quite understand the reasoning behind this. Is it because your
backend turns each individual barrier call into a large chunk of code?

It is even worse than just a large chunk of code: in order to support OpenCL
barrier on CPU (at least in our implementation) we have to significantly change
control flow across the entire call chain. Evgeniy Tyurin gave a talk about this
at the last year LLVM'Dev[1][2], and a short summary is: OpenCL barriers on CPU
are complicated, and they are *very* expensive for performance and compile time.

[1]: 2018 LLVM Developers’ Meeting: E. Tyurin “Implementing an OpenCL compiler for CPU in LLVM” - YouTube
[2]: https://llvm.org/devmtg/2018-10/slides/Tyurin-ImplementingOpenCLCompiler.pdf

If so, would it be a long-term viable alternative to inform the
various code size heuristics about this instead of using
`noduplicate`?

I think so. If we can tell standard LLVM optimizations to not make several
calls out of one call, that should be good enough. Although it is exactly the
meaning of the current `noduplicate' attribute, so I'm not sure what will be
the difference.

Another related problem is the fact that the OpenCL barrier is not an LLVM
intrinsic - it is a regular function (declaration) that has the attribute.
If we want to inform standard LLVM optimizations about it, this function
should be changed to an intrinsic, right?

These are good points. I think the first question should be: Do we know
of any active users of this attribute right now? If not, deprecation
seems like something we could do, e.g., through a warning in clang and
in the middle-end to ensure other front-ends are aware of it as well.

Noduplicate attribute is still used by the Intel OpenCL Compiler for CPU.
The main use case is to prevent loop unroll when an OpenCL barrier is called
within a loop. Although such loop can be unrolled and keep its semantic intact, but
this introduces a lot of distinct barrier calls, and each of them has to
be handled separately.

In other words, "noduplicate" serves as a hint to not unroll a loop if a
certain function is called in a loop body.

I don't quite understand the reasoning behind this. Is it because your
backend turns each individual barrier call into a large chunk of code?

It is even worse than just a large chunk of code: in order to support OpenCL
barrier on CPU (at least in our implementation) we have to significantly change
control flow across the entire call chain. Evgeniy Tyurin gave a talk about this
at the last year LLVM'Dev[1][2], and a short summary is: OpenCL barriers on CPU
are complicated, and they are *very* expensive for performance and compile time.

[1]: 2018 LLVM Developers’ Meeting: E. Tyurin “Implementing an OpenCL compiler for CPU in LLVM” - YouTube
[2]: https://llvm.org/devmtg/2018-10/slides/Tyurin-ImplementingOpenCLCompiler.pdf

If so, would it be a long-term viable alternative to inform the
various code size heuristics about this instead of using
`noduplicate`?

I think so. If we can tell standard LLVM optimizations to not make several
calls out of one call, that should be good enough. Although it is exactly the
meaning of the current `noduplicate' attribute, so I'm not sure what will be
the difference.

Another related problem is the fact that the OpenCL barrier is not an LLVM
intrinsic - it is a regular function (declaration) that has the attribute.
If we want to inform standard LLVM optimizations about it, this function
should be changed to an intrinsic, right?

That's an option. It's also possible to teach TargetLibraryInfo about
it. The optimizer knows about malloc(), but that's not an intrinsic.

-Hal