[RFC] Generalize llvm.memcpy / llvm.memmove intrinsics.

Hi All,

I’d like to float two changes to the llvm.memcpy / llvm.memmove intrinsics.

(1) Add an i1 <mayPerfectlyAlias> argument to the llvm.memcpy intrinsic.

When set to ‘1’ (the auto-upgrade default), this argument would indicate that the source and destination arguments may perfectly alias (otherwise they must not alias at all - memcpy prohibits partial overlap). While the C standard says that memcpy’s arguments can’t alias at all, perfect aliasing works in practice, and clang currently relies on this behavior: it emits llvm.memcpys for aggregate copies, despite the possibility of self-assignment.

Going forward, llvm.memcpy calls emitted for aggregate copies would have mayPerfectlyAlias set to ‘1’. Other uses of llvm.memcpy (including lowerings from memcpy calls) would have mayPerfectlyAlias set to ‘0’.
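
To make the self-assignment case concrete, here is a minimal C sketch (the names `S` and `assign` are illustrative, not taken from clang). When both pointers are equal, a frontend that lowers the aggregate copy to a memcpy-style call ends up passing identical source and destination pointers:

```c
typedef struct {
    unsigned X[8];
} S;

/* Aggregate assignment. If dst == src this is a self-assignment, which C
   permits; a frontend that lowers the copy to a memcpy-style call would
   then pass the same pointer as both source and destination. */
void assign(S *dst, const S *src) {
    *dst = *src;
}
```

Calling `assign(&a, &a)` is the "perfect alias" case; `assign(&b, &a)` with distinct objects is the non-aliasing case.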

This change is motivated by poor optimization for small memcpys on targets with strict alignment requirements. When a user writes a small, unaligned memcpy we may transform it into an unaligned load/store pair in instcombine (See InstCombine::SimplifyMemTransfer), which is then broken up into an unwieldy series of smaller loads and stores during legalization. I have a fix for this issue which tags the pointers for unaligned load/store pairs with noalias metadata allowing CodeGen to produce better code during legalization, but it’s not safe to apply while clang is emitting memcpys with pointers that may perfectly alias. If the ‘mayPerfectlyAlias’ flag were introduced, I could inspect that and add the noalias tag only if mayPerfectlyAlias is ‘0’.
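
As a rough illustration only (this is hand-written C, not actual legalizer output, and `copy4` / `copy4_bytewise` are made-up names): a 4-byte copy between possibly unaligned addresses, followed by the kind of byte-wise expansion a strict-alignment target might need. The expansion loads every byte before storing any, which stays correct even when the two pointers are identical.

```c
#include <string.h>

/* Small copy between addresses that may not be 4-byte aligned. */
void copy4(void *dst, const void *src) {
    memcpy(dst, src, 4);
}

/* Byte-wise equivalent that never performs a wide unaligned access. All
   loads happen before any store, so the result is the same whether or not
   dst and src refer to the same bytes. */
void copy4_bytewise(unsigned char *dst, const unsigned char *src) {
    unsigned char b0 = src[0], b1 = src[1], b2 = src[2], b3 = src[3];
    dst[0] = b0;
    dst[1] = b1;
    dst[2] = b2;
    dst[3] = b3;
}
```

If the pointers were known not to alias, the loads and stores could be interleaved freely, which is the extra freedom the noalias tag is meant to provide.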

Note: We could also achieve the desired effect by adding a new intrinsic (llvm.structcpy?) with semantics that match the current llvm.memcpy ones (i.e. perfect-aliasing or non-aliasing, but no partial), and then reclaim llvm.memcpy for non-aliasing pointers only. I floated this idea with David Majnemer on IRC and he suggested that adding a flag to llvm.memcpy might be less disruptive and easier to maintain - thanks for the suggestion David!

(2) Allow different source and destination alignments on both llvm.memcpy / llvm.memmove.

Since I’m talking about changes to llvm.memcpy anyway, a few people asked me to float this one. Having separate alignments for the source and destination pointers may allow us to generate better code when one of the pointers has a higher alignment.

The auto-upgrade for this would be to set both source and destination alignment to the original ‘align’ value.

Any thoughts?

Cheers,
Lang.

Hey Lang

(2) Allow different source and destination alignments on both llvm.memcpy / llvm.memmove.

Since I'm talking about changes to llvm.memcpy anyway, a few people asked me to float this one. Having separate alignments for the source and destination pointers may allow us to generate better code when one of the pointers has a higher alignment.

The auto-upgrade for this would be to set both source and destination alignment to the original 'align' value.

FWIW, I have a patch for this lying around. I can dig it up. I use alignment attributes to do it as there’s no need for alignment to be its own argument any more.

Cheers,
Pete

Hey Lang

Hi All,

I'd like to float two changes to the llvm.memcpy / llvm.memmove intrinsics.

(1) Add an i1 <mayPerfectlyAlias> argument to the llvm.memcpy intrinsic.

When set to '1' (the auto-upgrade default), this argument would indicate that the source and destination arguments may perfectly alias (otherwise they must not alias at all - memcpy prohibits partial overlap). While the C standard says that memcpy's arguments can't alias at all, perfect aliasing works in practice, and clang currently relies on this behavior: it emits llvm.memcpys for aggregate copies, despite the possibility of self-assignment.

Going forward, llvm.memcpy calls emitted for aggregate copies would have mayPerfectlyAlias set to '1'. Other uses of llvm.memcpy (including lowerings from memcpy calls) would have mayPerfectlyAlias set to '0'.

This change is motivated by poor optimization for small memcpys on targets with strict alignment requirements. When a user writes a small, unaligned memcpy we may transform it into an unaligned load/store pair in instcombine (See InstCombine::SimplifyMemTransfer), which is then broken up into an unwieldy series of smaller loads and stores during legalization. I have a fix for this issue which tags the pointers for unaligned load/store pairs with noalias metadata allowing CodeGen to produce better code during legalization, but it's not safe to apply while clang is emitting memcpys with pointers that may perfectly alias. If the 'mayPerfectlyAlias' flag were introduced, I could inspect that and add the noalias tag only if mayPerfectlyAlias is '0'.

Note: We could also achieve the desired effect by adding a new intrinsic (llvm.structcpy?) with semantics that match the current llvm.memcpy ones (i.e. perfect-aliasing or non-aliasing, but no partial), and then reclaim llvm.memcpy for non-aliasing pointers only. I floated this idea with David Majnemer on IRC and he suggested that adding a flag to llvm.memcpy might be less disruptive and easier to maintain - thanks for the suggestion David!

Given there's a semantically conservative interpretation and a more optimistic one, this really sounds like a case for metadata, not another argument to the function. Our memcpy could keep its current semantics, and we could add a piece of metadata which says none of the arguments to the call alias.

Actually, can't we already get this interpretation by marking both argument pointers as noalias? Doesn't that require that they don't overlap at all? I think we just need the ability to specify noalias at the callsite for each argument. I don't know if that's been tried, but it should work in theory. There are some issues with control dependence of call site attributes though that we'd need to watch out for/fix.

(2) Allow different source and destination alignments on both llvm.memcpy / llvm.memmove.

Since I'm talking about changes to llvm.memcpy anyway, a few people asked me to float this one. Having separate alignments for the source and destination pointers may allow us to generate better code when one of the pointers has a higher alignment.

The auto-upgrade for this would be to set both source and destination alignment to the original 'align' value.

FWIW, I have a patch for this lying around. I can dig it up. I use alignment attributes to do it as there’s no need for alignment to be its own argument any more.

This would be a nice cleanup in general. +1

From: "Lang Hames" <lhames@gmail.com>
To: "LLVM Developers Mailing List" <llvm-dev@lists.llvm.org>
Cc: "Chandler Carruth" <chandlerc@gmail.com>, "Hal Finkel" <hfinkel@anl.gov>, "David Majnemer"
<david.majnemer@gmail.com>, "John McCall" <rjmccall@apple.com>, "Jim Grosbach" <grosbach@apple.com>
Sent: Tuesday, August 18, 2015 8:04:48 PM
Subject: [RFC] Generalize llvm.memcpy / llvm.memmove intrinsics.

Hi All,

I'd like to float two changes to the llvm.memcpy / llvm.memmove
intrinsics.

(1) Add an i1 <mayPerfectlyAlias> argument to the llvm.memcpy
intrinsic.

When set to '1' (the auto-upgrade default), this argument would
indicate that the source and destination arguments may perfectly
alias (otherwise they must not alias at all - memcpy prohibits
partial overlap). While the C standard says that memcpy's arguments
can't alias at all, perfect aliasing works in practice, and clang
currently relies on this behavior: it emits llvm.memcpys for
aggregate copies, despite the possibility of self-assignment.

Going forward, llvm.memcpy calls emitted for aggregate copies would
have mayPerfectlyAlias set to '1'. Other uses of llvm.memcpy
(including lowerings from memcpy calls) would have mayPerfectlyAlias
set to '0'.

This change is motivated by poor optimization for small memcpys on
targets with strict alignment requirements. When a user writes a
small, unaligned memcpy we may transform it into an unaligned
load/store pair in instcombine (See
InstCombine::SimplifyMemTransfer), which is then broken up into an
unwieldy series of smaller loads and stores during legalization. I
have a fix for this issue which tags the pointers for unaligned
load/store pairs with noalias metadata allowing CodeGen to produce
better code during legalization, but it's not safe to apply while
clang is emitting memcpys with pointers that may perfectly alias. If
the 'mayPerfectlyAlias' flag were introduced, I could inspect that
and add the noalias tag only if mayPerfectlyAlias is '0'.

Note: We could also achieve the desired effect by adding a new
intrinsic (llvm.structcpy?) with semantics that match the current
llvm.memcpy ones (i.e. perfect-aliasing or non-aliasing, but no
partial), and then reclaim llvm.memcpy for non-aliasing pointers
only. I floated this idea with David Majnemer on IRC and he
suggested that adding a flag to llvm.memcpy might be less disruptive
and easier to maintain - thanks for the suggestion David!

(2) Allow different source and destination alignments on both
llvm.memcpy / llvm.memmove.

Since I'm talking about changes to llvm.memcpy anyway, a few people
asked me to float this one. Having separate alignments for the
source and destination pointers may allow us to generate better code
when one of the pointers has a higher alignment.

The auto-upgrade for this would be to set both source and destination
alignment to the original 'align' value.

As one of the people who asked for this, let me add: We currently have code which upgrades the alignment on memcpy intrinsics (because of alignment attributes, assumptions, etc.), and this is useful for making memcpy expand into vector instructions when the source/destination are suitably aligned. It would be useful for this to happen on some targets even if only the source or destination could be upgraded (aligned stores but underaligned loads might still be a win, for example). Currently we can't do this because we can only represent a single alignment. Because we aggressively form memcpy as part of idiom recognition, and emit them in frontends, this comes up more than it would from source-level memcpy calls alone.
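
A small C sketch of that situation, assuming C11 `_Alignas` and using made-up names (`Block16`, `load_block`): the destination is known to be 16-byte aligned while the source may start at any byte offset, so a single shared alignment value would have to be the weaker of the two.

```c
#include <string.h>

/* Destination type with a known 16-byte alignment (C11). */
typedef struct {
    _Alignas(16) unsigned char bytes[16];
} Block16;

/* dst is 16-byte aligned; src carries no alignment guarantee. With one
   alignment for both pointers, only the source's alignment of 1 could be
   recorded for this copy. */
void load_block(Block16 *dst, const unsigned char *src) {
    memcpy(dst->bytes, src, sizeof dst->bytes);
}
```

With separate alignments, the copy could be described as "destination align 16, source align 1", which is the aligned-stores / underaligned-loads case mentioned above.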

Thus, I agree with John (and Lang): so long as we're fooling with the memcpy intrinsic's signature, we should do this too.

Any thoughts?

I'm strongly in favor of both pieces.

-Hal

From: "Philip Reames via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Pete Cooper" <peter_cooper@apple.com>, "Lang Hames" <lhames@gmail.com>
Cc: "LLVM Developers Mailing List" <llvm-dev@lists.llvm.org>
Sent: Wednesday, August 19, 2015 12:14:19 PM
Subject: Re: [llvm-dev] [RFC] Generalize llvm.memcpy / llvm.memmove intrinsics.

> Hey Lang
Given there's a semantically conservative interpretation and a more
optimistic one, this really sounds like a case for metadata, not
another argument to the function. Our memcpy could keep its current
semantics, and we could add a piece of metadata which says none of the
arguments to the call alias.

We could add some "memcpy-allows-self-copies" metadata, and have Clang tag its associated aggregate copies with it. That would also work.

Actually, can't we already get this interpretation by marking both
argument pointers as noalias? Doesn't that require that they don't
overlap at all? I think we just need the ability to specify noalias at
the callsite for each argument. I don't know if that's been tried, but
it should work in theory. There are some issues with control dependence
of call site attributes though that we'd need to watch out for/fix.

But that's not quite what we want. We want to say: These can't alias, unless they're exactly equal. noalias means that the pointer does not alias at all, nor do any pointers derived from it, and obviously the lack of it says nothing.

This way we can still make aliasing assumptions if we can prove that src != destination, which is often easier than proving things accounting for overlaps.
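
The property being described can be written down as a small C predicate. This is only a sketch with a hypothetical name (`equal_or_disjoint`); it compares addresses as integers, which assumes a flat address space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the n-byte ranges at dst and src are either the very same range
   or do not overlap at all; false for a partial overlap. */
bool equal_or_disjoint(const void *dst, const void *src, size_t n) {
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    if (d == s || n == 0) {
        return true;
    }
    return (d < s) ? (s - d >= n) : (d - s >= n);
}
```

Once src != dst is known, the first case is ruled out and only the fully disjoint case remains.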

>>
>>
>>
>> (2) Allow different source and destination alignments on both
>> llvm.memcpy / llvm.memmove.
>>
>> Since I'm talking about changes to llvm.memcpy anyway, a few
>> people asked me to float this one. Having separate alignments for
>> the source and destination pointers may allow us to generate
>> better code when one of the pointers has a higher alignment.
>>
>> The auto-upgrade for this would be to set both source and
>> destination alignment to the original 'align' value.
> FWIW, I have a patch for this lying around. I can dig it up. I
> use alignment attributes to do it as there’s no need for alignment
> to be its own argument any more.
This would be a nice cleanup in general. +1

I agree, this sounds useful.

-Hal

From: "Philip Reames via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Pete Cooper" <peter_cooper@apple.com>, "Lang Hames" <lhames@gmail.com>
Cc: "LLVM Developers Mailing List" <llvm-dev@lists.llvm.org>
Sent: Wednesday, August 19, 2015 12:14:19 PM
Subject: Re: [llvm-dev] [RFC] Generalize llvm.memcpy / llvm.memmove intrinsics.

Hey Lang

Given there's a semantically conservative interpretation and a more
optimistic one, this really sounds like a case for metadata, not
another argument to the function. Our memcpy could keep its current
semantics, and we could add a piece of metadata which says none of the
arguments to the call alias.

We could add some "memcpy-allows-self-copies" metadata, and have Clang tag its associated aggregate copies with it. That would also work.

Isn’t this introducing per-instruction metadata that correctness depends on?
Shouldn’t it be the opposite for correctness, i.e. “memcpy-disallows-self-copies”?
(Correctness in the sense that dropping the metadata does not break anything.)

Actually, can't we already get this interpretation by marking both
argument pointers as noalias? Doesn't that require that they don't
overlap at all? I think we just need the ability to specify noalias at
the callsite for each argument. I don't know if that's been tried, but
it should work in theory. There are some issues with control dependence
of call site attributes though that we'd need to watch out for/fix.

But that's not quite what we want. We want to say: These can't alias, unless they're exactly equal. noalias means that the pointer does not alias at all, nor do any pointers derived from it, and obviously the lack of it says nothing.

This way we can still make aliasing assumptions if we can prove that src != destination, which is often easier than proving things accounting for overlaps.

Is this limited to the memcpy case, or are there other use-cases that would make it worth having an attribute other than noalias to carry these semantics (“nooverlap”)?

From: "Mehdi Amini" <mehdi.amini@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Philip Reames" <listmail@philipreames.com>, "LLVM Developers Mailing List" <llvm-dev@lists.llvm.org>
Sent: Wednesday, August 19, 2015 2:54:56 PM
Subject: Re: [llvm-dev] [RFC] Generalize llvm.memcpy / llvm.memmove intrinsics.

>
>> From: "Philip Reames via llvm-dev" <llvm-dev@lists.llvm.org>
>> To: "Pete Cooper" <peter_cooper@apple.com>, "Lang Hames"
>> <lhames@gmail.com>
>> Cc: "LLVM Developers Mailing List" <llvm-dev@lists.llvm.org>
>> Sent: Wednesday, August 19, 2015 12:14:19 PM
>> Subject: Re: [llvm-dev] [RFC] Generalize llvm.memcpy /
>> llvm.memmove intrinsics.
>>
>> Given there's a semantically conservative interpretation and a more
>> optimistic one, this really sounds like a case for metadata, not
>> another argument to the function. Our memcpy could keep its current
>> semantics, and we could add a piece of metadata which says none of
>> the arguments to the call alias.
>
> We could add some "memcpy-allows-self-copies" metadata, and have
> Clang tag its associated aggregate copies with it. That would also
> work.

Isn’t this introducing per-instruction metadata that correctness depends on?
Shouldn’t it be the opposite for correctness, i.e.
“memcpy-disallows-self-copies”?
(Correctness in the sense that dropping the metadata does not break
anything.)

Indeed, you're correct.

-Hal

From: “Philip Reames via llvm-dev” <llvm-dev@lists.llvm.org>
To: “Pete Cooper” <peter_cooper@apple.com>, “Lang Hames” <lhames@gmail.com>
Cc: “LLVM Developers Mailing List” <llvm-dev@lists.llvm.org>
Sent: Wednesday, August 19, 2015 12:14:19 PM
Subject: Re: [llvm-dev] [RFC] Generalize llvm.memcpy / llvm.memmove intrinsics.

Hey Lang

Given there’s a semantically conservative interpretation and a more
optimistic one, this really sounds like a case for metadata, not
another argument to the function. Our memcpy could keep its current
semantics, and we could add a piece of metadata which says none of the
arguments to the call alias.

We could add some “memcpy-allows-self-copies” metadata, and have Clang tag its associated aggregate copies with it. That would also work.

Isn’t this introducing per-instruction metadata that correctness depends on?
Shouldn’t it be the opposite for correctness, i.e. “memcpy-disallows-self-copies”?
(Correctness in the sense that dropping the metadata does not break anything.)

Actually, can’t we already get this interpretation by marking both
argument pointers as noalias? Doesn’t that require that they don’t
overlap at all? I think we just need the ability to specify noalias at
the callsite for each argument. I don’t know if that’s been tried, but
it should work in theory. There are some issues with control dependence
of call site attributes though that we’d need to watch out for/fix.

But that’s not quite what we want. We want to say: These can’t alias, unless they’re exactly equal. noalias means that the pointer does not alias at all, nor do any pointers derived from it, and obviously the lack of it says nothing.

This way we can still make aliasing assumptions if we can prove that src != destination, which is often easier than proving things accounting for overlaps.

Is this limited to the memcpy case, or are there other use-cases that would make it worth having an attribute other than noalias to carry these semantics (“nooverlap”)?

I was wondering about that, too. This looks like information either the user has or the compiler could derive. Would it be best to condense the properties into alias and align attributes that are also user visible?

Pete - That patch sounds great!

Philip, Hal, Mehdi, Gerolf - Thanks very much for the feedback.

So how about this:
(1) We drop llvm.memcpy’s alignment argument and use Pete’s alignment-via-metadata patch (whatever version of it passes review).
(2) llvm.memcpy retains its current semantics, but we teach clang, SimplifyLibCalls, etc. to add noalias metadata where we know it’s safe.

Dropping the alignment argument will still change the signature of llvm.memcpy / llvm.memmove, so I guess there’s one other issue worth discussing: Should we also split ‘isVolatile’ into ‘isSrcVolatile’ and ‘isDstVolatile’ ? Nobody has asked for this as far as I know, but I believe it would improve codegen in some cases. E.g.:

typedef struct {
  unsigned X[8];
} S;

unsigned foo(volatile S* s) {
  S t = *s;
  return t.X[4];
}

If the frontend lowers the struct copy to a volatile memcpy we’ll have to copy the whole struct before reading part of ‘t’. If we could mark only the source as volatile then we could discard the stores to ‘t’.
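
Roughly, a source-only volatile copy would let the function above behave like the following hand-written C (an illustration with a made-up name, not compiler output): every element of `*s` is still read through the volatile pointer, but only the element that is actually used gets kept, and no stores to a temporary are needed.

```c
typedef struct {
    unsigned X[8];
} S;

/* Reads all eight elements through the volatile source, since volatile
   loads may not be dropped, but keeps only X[4]. */
unsigned foo_src_only_volatile(volatile S *s) {
    unsigned kept = 0;
    for (int i = 0; i < 8; ++i) {
        unsigned v = s->X[i]; /* volatile read, must still happen */
        if (i == 4) {
            kept = v;
        }
    }
    return kept;
}
```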

Again - nobody has asked for this, but if there’s interest now would be a good time to look at it.

Cheers,
Lang.

Pete - That patch sounds great!

Philip, Hal, Mehdi, Gerolf - Thanks very much for the feedback.

So how about this:
(1) We drop llvm.memcpy’s alignment argument and use Pete’s alignment-via-metadata patch (whatever version of it passes review).

alignment-via-attributes, but yeah, hopefully I can get something everyone is happy with.

(2) llvm.memcpy retains its current semantics, but we teach clang, SimplifyLibCalls, etc. to add noalias metadata where we know it’s safe.

Dropping the alignment argument will still change the signature of llvm.memcpy / llvm.memmove, so I guess there’s one other issue worth discussing: Should we also split ‘isVolatile’ into ‘isSrcVolatile’ and ‘isDstVolatile’ ? Nobody has asked for this as far as I know, but I believe it would improve codegen in some cases. E.g.:

typedef struct {
  unsigned X[8];
} S;

unsigned foo(volatile S* s) {
  S t = *s;
  return t.X[4];
}

If the frontend lowers the struct copy to a volatile memcpy we’ll have to copy the whole struct before reading part of ‘t’. If we could mark only the source as volatile then we could discard the stores to ‘t’.

Again - nobody has asked for this, but if there’s interest now would be a good time to look at it.

I like the idea. As an alternative implementation, how about adding flags arguments to intrinsics?

What I’m imagining is (in Intrinsics.td) giving a list of allowed flags for an argument of an intrinsic, then letting the tablegen emitter handle outputting them in a way that the C++ compiler code and LLParser/reader/writer can all consume the flags from tablegen. It seems better and more readable to have a bunch of flags than a bunch of i1’s whose order you have to look up in the docs. Just my 2c though, I’m happy to see the volatile change in whatever form suits.

Cheers
Pete

From: "Lang Hames" <lhames@gmail.com>
To: "Gerolf Hoflehner" <ghoflehner@apple.com>
Cc: "Mehdi Amini" <mehdi.amini@apple.com>, "LLVM Developers Mailing List" <llvm-dev@lists.llvm.org>, "Hal Finkel"
<hfinkel@anl.gov>, "Philip Reames" <listmail@philipreames.com>, "Peter Cooper" <peter_cooper@apple.com>
Sent: Thursday, August 20, 2015 4:26:20 PM
Subject: Re: [llvm-dev] [RFC] Generalize llvm.memcpy / llvm.memmove intrinsics.

Pete - That patch sounds great!

Philip, Hal, Mehdi, Gerolf - Thanks very much for the feedback.

So how about this:
(1) We drop llvm.memcpy's alignment argument and use Pete's
alignment-via-metadata patch (whatever version of it passes review).
(2) llvm.memcpy retains its current semantics, but we teach clang,
SimplifyLibCalls, etc. to add noalias metadata where we know it's
safe.

By this I assume you mean some new 'nooverlap' metadata? I don't think we have any existing metadata with the correct semantics.

Dropping the alignment argument will still change the signature of
llvm.memcpy / llvm.memmove, so I guess there's one other issue worth
discussing: Should we also split 'isVolatile' into 'isSrcVolatile'
and 'isDstVolatile' ?

Yes. We should be able to specify all relevant properties of the source and destination separately. I see no reason not to do this.

-Hal

Hi Hal

By this I assume you mean some new ‘nooverlap’ metadata? I don’t think we have any existing metadata with the correct semantics.

I was thinking we could just use the existing noalias metadata. Implicitly, the current llvm.memcpy semantics are “src and dst overlap perfectly or not at all” (perhaps we should update the docs to reflect this if we plan to rely on it?). Attaching noalias metadata to the source and destination would capture the extra information that the pointers really do not overlap, when we can figure that out (e.g. when lowering a libc memcpy call).

It does seem odd that we would rely on the documented behaviour of libc memcpy (dst/src should not alias at all) to attach the noalias metadata, while simultaneously relying on the undocumented behaviour of libc memcpy (perfect aliasing works in practice) to lower to llvm.memcpy for struct copies. The clang struct copy code should probably carry a warning: Do what we say, not what we do.
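For reference, the three cases under discussion (no overlap, perfect overlap, partial overlap) can be spelled out as a small classifier. This is only a sketch: the names overlap_kind and classify_overlap are invented here, and the comparison goes through uintptr_t on the assumption of a flat address space.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { OVERLAP_NONE, OVERLAP_PERFECT, OVERLAP_PARTIAL } overlap_kind;

/* Classify how [dst, dst+n) and [src, src+n) relate.
 * Assumes a flat address space so that uintptr_t values are comparable. */
overlap_kind classify_overlap(const void *dst, const void *src, size_t n) {
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    if (n == 0) return OVERLAP_NONE;     /* empty ranges never overlap */
    if (d == s) return OVERLAP_PERFECT;  /* the self-assignment case */
    uintptr_t lo = d < s ? d : s;
    uintptr_t hi = d < s ? s : d;
    return (hi - lo < n) ? OVERLAP_PARTIAL : OVERLAP_NONE;
}
```

Under the implicit semantics described above, llvm.memcpy tolerates OVERLAP_NONE and OVERLAP_PERFECT; the extra information we would like to attach is that a given call is known to be OVERLAP_NONE.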

Cheers,
Lang.

From: "Lang Hames" <lhames@gmail.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Mehdi Amini" <mehdi.amini@apple.com>, "LLVM Developers Mailing List" <llvm-dev@lists.llvm.org>, "Philip Reames"
<listmail@philipreames.com>, "Peter Cooper" <peter_cooper@apple.com>, "Gerolf Hoflehner" <ghoflehner@apple.com>
Sent: Friday, August 21, 2015 1:02:18 AM
Subject: Re: [llvm-dev] [RFC] Generalize llvm.memcpy / llvm.memmove intrinsics.

Hi Hal

> By this I assume you mean some new 'nooverlap' metadata? I don't
> think we have any existing metadata with the correct semantics.

I was thinking we could just use the existing noalias metadata.
Implicitly, the current llvm.memcpy semantics are "src and dst
overlap perfectly or not at all" (perhaps we should update the docs
to reflect this if we plan to rely on it?

If we're going to do that, we certainly should.

). Attaching noalias
metadata to the source and destination would capture the extra
information that the pointers really do not overlap, when we can
figure that out (e.g. when lowering a libc memcpy call).

If you attach noalias metadata to the memcpy call, it will apply to both the source and destination; we don't have a way to differentiate. It might be true that if you attach both noalias and alias.scope metadata to the call, then querying the call against itself will return NoModRef, but that's really hacky (and, in part, wrong, because the destination still aliases with itself).

It does seem odd that we would rely on the documented behaviour of
libc memcpy (dst/src should not alias at all) to attach the noalias
metadata, while simultaneously relying on the undocumented behaviour
of libc memcpy (perfect aliasing works in practice) to lower to
llvm.memcpy for struct copies. The clang struct copy code should
probably carry a warning: Do what we say, not what we do.

I agree. Chatting with Chandler offline, he suggested that it might be better to have Clang emit a pointer-equality check and branch around the memcpy when the pointers are equal. This might be faster than the self-copies anyway, and we might often be able to statically prove the result of the comparison. I think this is worth experimenting with.
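At the source level, the check-and-branch idea corresponds to something like the following. This is a sketch of the shape of the code only, not what Clang emits; assign_struct is a made-up name.

```c
#include <string.h>

typedef struct {
    unsigned X[8];
} S;

/* Hypothetical lowering of '*dst = *src': skip the copy on self-assignment,
 * so the memcpy that remains only ever sees non-overlapping operands. */
void assign_struct(S *dst, const S *src) {
    if (dst != src) {
        memcpy(dst, src, sizeof *dst);
    }
}
```

When the two pointers are distinct locals or otherwise provably different, the comparison folds away and only the plain memcpy remains.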

Thanks again,
Hal

+1. Also, if we know we're targeting a library which supports self-copies, we can late pattern match this as well.

Hi Hal,

If you attach noalias metadata to the memcpy call, it will apply to both the source and destination; we don’t have a way to differentiate. It might be true that if you attach both noalias and alias.scope metadata to the call, then querying the call against itself will return NoModRef, but that’s really hacky (and, in part, wrong, because the destination still aliases with itself).

Sorry it took me a while to get back to this, and thanks for the explanation. I had misremembered how noalias metadata worked, and was imagining we could tag the pointers themselves as non-aliasing (along the lines of the noalias parameter attribute).

FWIW, we could auto-upgrade old IR with the same test and branch that Clang would emit. Not 100% certain of the best way to detect the old IR, but we do have options to ensure old bitcode continues to function.

TBH, inserting a branch worries me in terms of how well the optimizer can handle it.

The reason is that the memcpy will now be seen as conditional, which blocks any optimization that needs to assume the memcpy always happens. In particular, suppose we have (prior to control flow insertion):

memset(a, 0, sizeof(a))
memcpy(b, a, sizeof(b))
b[0]

b[0] is provably 0 in the optimizer without any control flow. But once we insert control flow, we can’t reason about the memcpy at all.
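Spelled out in C, the two shapes look roughly like this. The function names and the buffer size N are invented for the example. In the guarded version b[0] is still 0 on both paths (when b == a the memset already zeroed it), but establishing that now requires reasoning across the branch instead of simply forwarding through an unconditional memcpy.

```c
#include <string.h>

enum { N = 8 };

/* Straight-line version: the memcpy always runs, so b[0] is known to be 0. */
unsigned char first_byte_unconditional(void) {
    unsigned char a[N], b[N];
    memset(a, 0, sizeof a);
    memcpy(b, a, sizeof b);
    return b[0];
}

/* Guarded version: the memcpy is now conditional on 'b != a'. */
unsigned char first_byte_guarded(unsigned char *b, unsigned char *a) {
    memset(a, 0, N);
    if (b != a) {
        memcpy(b, a, N);
    }
    return b[0];
}
```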

At this point I’m tempted to go with a new intrinsic for memcpy which represents this. We could have MemCpyInst hide which one we are calling, and it can have a method to tell us whether we can assume non-overlapping memory or not. Just a thought though.

Cheers,
Pete