Alias analysis issue with structs on PPC

Hi,

I have the following C loop to vectorize:

struct box {
double* source;
};

void test(double* restrict result, struct box my_struct, int len)
{
for (int i=0 ; i<len; i++) {
result[i] = my_struct.source[i] * my_struct.source[i];
}
}

There are two references in the loop, result[i] (restrict) and my_struct.source[i] (readonly). The compiler should easily figure out that they do not alias.

Compiling for x86, the loop alias analysis works just fine:
AST: Alias Set Tracker: 2 alias sets for 2 pointer values.
AliasSet[0x7fd8e2f32290, 1] must alias, No access Pointers: (double* %arrayidx5, 18446744073709551615)
AliasSet[0x7fd8e2f322e0, 1] must alias, No access Pointers: (double* %arrayidx, 18446744073709551615)

Compiling for PPC with -target powerpc64le-ibm-linux-gnu, the two addresses now alias:
AST: Alias Set Tracker: 1 alias sets for 2 pointer values.
AliasSet[0x7f931bd5bdc0, 2] may alias, No access Pointers: (double* %arrayidx5, 18446744073709551615), (double* %arrayidx, 18446744073709551615)

BasicAA is used for both targets by default. The difference is that in PPC, the IR obtained from Clang takes an i64 as parameter instead of a double* for my_struct. This parameter is then coerced into double* using an inttoptr instruction. The code in BasicAliasAnalysis.cpp which is triggered for x86 is the following:

// Function arguments can’t alias with things that are known to be
// unambigously identified at the function level.
if ((isa(O1) && isIdentifiedFunctionLocal(O2)) ||
(isa(O2) && isIdentifiedFunctionLocal(O1)))
return NoAlias;

isIdentifiedFunctionLocal(V) returns true for a noalias argument (such as result), but the other address (my_struct) must be a function argument in order to return NoAlias, which is not the case anymore for PPC (since my_struct is now the result from an inttoptr instruction). If I understand, the problem is that we cannot trust the fact that locals do not alias with restrict parameters (because the compiler could generate some locals which alias)? If someone has suggestions about this, that would help a lot.

Thanks,
Olivier

Hi,

I have the following C loop to vectorize:

struct box {
double* source;
};

void test(double* restrict result, struct box my_struct, int len)
{
for (int i=0 ; i<len; i++) {
result[i] = my_struct.source[i] * my_struct.source[i];
}
}

There are two references in the loop, result[i] (restrict) and my_struct.source[i] (readonly). The compiler should easily figure out that they do not alias.

Compiling for x86, the loop alias analysis works just fine:
AST: Alias Set Tracker: 2 alias sets for 2 pointer values.
AliasSet[0x7fd8e2f32290, 1] must alias, No access Pointers: (double* %arrayidx5, 18446744073709551615)
AliasSet[0x7fd8e2f322e0, 1] must alias, No access Pointers: (double* %arrayidx, 18446744073709551615)

Compiling for PPC with -target powerpc64le-ibm-linux-gnu, the two addresses now alias:
AST: Alias Set Tracker: 1 alias sets for 2 pointer values.
AliasSet[0x7f931bd5bdc0, 2] may alias, No access Pointers: (double* %arrayidx5, 18446744073709551615), (double* %arrayidx, 18446744073709551615)

BasicAA is used for both targets by default. The difference is that in PPC, the IR obtained from Clang takes an i64 as parameter instead of a double* for my_struct.

I don’t even want to know why this would be the case :slight_smile:

This parameter is then coerced into double* using an inttoptr instruction. The code in BasicAliasAnalysis.cpp which is triggered for x86 is the following:

// Function arguments can’t alias with things that are known to be
// unambigously identified at the function level.
if ((isa(O1) && isIdentifiedFunctionLocal(O2)) ||
(isa(O2) && isIdentifiedFunctionLocal(O1)))
return NoAlias;

isIdentifiedFunctionLocal(V) returns true for a noalias argument (such as result), but the other address (my_struct) must be a function argument in order to return NoAlias, which is not the case anymore for PPC (since my_struct is now the result from an inttoptr instruction). If I understand, the problem is that we cannot trust the fact that locals do not alias with restrict parameters (because the compiler could generate some locals which alias)?

Yes, because pointers based on the noalias’d argument are legal aliases.

So if you don’t know it’s an argument or an identified local, it could be based on the restricted pointer, and thus, alias it.

If someone has suggestions about this, that would help a lot.

The only way you could prove something in this case would be to walk the chain and prove the value comes directly from an argument with no modification.

That is expensive to do in the general case, and inttoptr is generally not going to be wonderful for performance (for example, outside of BasicAA, more advanced AA’s like CFL-AA will give up on anything that comes from or goes through ptrtoint/inttoptr). So what’s the issue that makes you pass this as i64 in the first place?

Actually, you could do the opposite, too, pretty cheaply.

You could write a new pass or AA.
It traverses chains in the reverse direction (IE it goes from the arguments, and walks down the immediate use chain, marking things as based on arguments or not), and makes a lookup table of things it can prove are also unmolested identified objects.
(which would be the result of inttoptr in your case).

You can then use this simple lookup table to answer the isIdentifiedObject question better.
(You’d have to make isIdentifiedObject part of the AA interface, or take an optional table, blah blah blah)

Hi Daniel,

Thanks for your feedback. I would prefer not to write a new AA. Can’t we directly implement that traversal in BasicAA?

Otherwise, I’ll investigate why this i64 was generated in the first place (but like you, I don’t really want to know why :slight_smile:

Olivier

Hi Daniel,

Thanks for your feedback. I would prefer not to write a new AA. Can’t we directly implement that traversal in BasicAA?

Can I ask why?
Outside of the “well, it’s another pass”, i mean?

BasicAA is stateless, so you can’t cache, and you really don’t want to redo these walks repeatedly (especially when the answer doesn’t change unless AA is invalidated). It would be really expensive.
Doing this in a separate analysis pass, caching the answer, and producing AA results, seems to me exactly the right thing.

Otherwise, I’ll investigate why this i64 was generated in the first place (but like you, I don’t really want to know why :slight_smile:

Reid walked over to my desk and told me all the gory details - the clang ABI lowering of structs like these for anything but x86 basically says "oh, this really should be 2 8 byte GPR’s, and the way it makes this happen is by using i64’s :slight_smile:

If you want to do it at a clang level, the right thing to do is to fixup the ABI lowerings for pointers to keep them pointers in this case.

For something simpler, if you wanted, you could do the exact opposite of happens now and you’d get better results.

This is, pass everything that needs to be 2 8 byte regs as 8 byte pointers, and cast it back to something if it’s not really a pointer, using ptrtoint.

if it’s not really a pointer, we don’t care what AA says about it.
If it is a pointer, now we don’t get bad AA answers, because there is no inttoptr being used like there is now.

Of course, i have no idea if this strategy is going to produce correct ABI results on all platforms, it depends on whether they treat pointers as special.

Reason is that I want this case to be supported by default, without specific flags (should we change the default AA for PPC if we do a separate one?).

Fixing at the Clang level might be the right thing to do… But if there are other cases like this, the traversal would definitely help in general to get a proper handling of restrict.

Olivier

Reason is that I want this case to be supported by default, without specific flags (should we change the default AA for PPC if we do a separate one?).

Yeah, just change the default AA to also include this in the stack.
It seems like it would be a win in all cases.

Fixing at the Clang level might be the right thing to do… But if there are other cases like this, the traversal would definitely help in general to get a proper handling of restrict.

You mean noalias :wink:
(restrict has weird semantics, and doesn’t exist at the LLVM level.
noalias has sane semantics, and is close to restrict, but better)

In any case, this will help more than just your case, so it seems obvious to me that unless it’s expensive, it should be on by default.

From: "Daniel Berlin" <dberlin@dberlin.org>
To: "Olivier Sallenave" <ol.sall@gmail.com>
Cc: llvmdev@cs.uiuc.edu
Sent: Sunday, March 15, 2015 6:48:41 PM
Subject: Re: [LLVMdev] Alias analysis issue with structs on PPC

> Hi Daniel,

> Thanks for your feedback. I would prefer not to write a new AA.
> Can't
> we directly implement that traversal in BasicAA?

Can I ask why?
Outside of the "well, it's another pass", i mean?

BasicAA is stateless, so you can't cache, and you really don't want
to redo these walks repeatedly (especially when the answer doesn't
change unless AA is invalidated). It would be really expensive.
Doing this in a separate analysis pass, caching the answer, and
producing AA results, seems to me exactly the right thing.

> Otherwise, I'll investigate why this i64 was generated in the first
> place (but like you, I don't really want to know why :slight_smile:

Reid walked over to my desk and told me all the gory details - the
clang ABI lowering of structs like these for anything but x86
basically says "oh, this really should be 2 8 byte GPR's, and the
way it makes this happen is by using i64's :slight_smile:

If you want to do it at a clang level, the right thing to do is to
fixup the ABI lowerings for pointers to keep them pointers in this
case.

So this is an artifact of the way that we pass structures, and constructing a general solution at the ABI level might be tricky. I've cc'd Uli, who did most of the recent work here.

For the single-element struct case, we could fix this by keeping it a pointer type. The relevant code in Clang is in lib/CodeGen/TargetInfo.cpp (look at PPC64_SVR4_ABIInfo::classifyArgumentType and nearby code). But that does not really address the underlying issue:

If I take your example and modify it so that we have:

struct box {
double* source;
double* source2;
};

then the parameter is passed as:

define void @test(double* noalias nocapture %result, [2 x i64] %my_struct.coerce, i32 signext %len) #0

and is extracted the same way:

%my_struct.coerce.fca.0.extract = extractvalue [2 x i64] %my_struct.coerce, 0
%0 = inttoptr i64 %my_struct.coerce.fca.0.extract to double*

but, it is also important to realize that the i64 here in the array type is actually chosen to satisfy alignment requirements. If we have this:

typedef float __attribute__((ext_vector_type(4))) vt;
struct box {
double* source;
double* source2;
vt v;
};

then the struct is passed as:

define void @test(double* noalias nocapture %result, [2 x i128] %my_struct.coerce, i32 signext %len)

and the extraction code looks like:

%my_struct.coerce.fca.0.extract = extractvalue [2 x i128] %my_struct.coerce, 0
%my_struct.sroa.0.sroa.0.0.extract.shift = lshr i128 %my_struct.coerce.fca.0.extract, 64
%my_struct.sroa.0.sroa.0.0.extract.trunc = trunc i128 %my_struct.sroa.0.sroa.0.0.extract.shift to i64
%0 = inttoptr i64 %my_struct.sroa.0.sroa.0.0.extract.trunc to double*

so just using pointer types instead of i64 will help common cases, but will not address the general issue. Now part of this does some down to using array parameters as a substitute for byval/direct parameters. As I recall, this was done because it allowed a natural partial decomposition between GPRs and stack for structures that straddle the number of available parameter-passing GPRs. If we could accomplish that with regular byval parameters and regular direct parameters, then we'd not need any of this array coercion, and the system, including for the purposes of aliasing analysis, would work as intended. There may be some infrastructure work required in the backend (SelectionDAG builder, etc.) -- Uli, if you know please comment -- but I think moving away from the array coercions might be the right solution, even if that requires some infrastructure enhancements.

-Hal

So, every backend interprets 'byval' differently, but it usually means
"pass this whole thing in stack memory". It also requires extra copies
through memory at the IR level, so I don't think we should be moving
towards this construct.

If you want to pass things in registers, it's usually best to use SSA
values. Even though the extra 'extractvalue' instructions look expensive in
the IR, they lower down to simple virtual register copies in the selection
dag. The shift and trunc, on the other hand, don't model the machine code
at all, and it would be good if we could eliminate them.

I wonder if we could solve this parameter alignment problem via the 'align'
parameter attribute. Unfortunately, I think for pointer types it's already
overloaded to describe the alignment of the pointee and not the argument
itself. In fact, I think you did this Hal. :slight_smile:

I think, in the long term, we should probably use a direct FCA. I believe
this is what ARM does. It's also nice to flatten the FCA if we can detect
that we're in a simple case where no interesting alignment is required.

Daniel,

You said “more advanced AA’s like CFL-AA will give up on anything that comes from or goes through ptrtoint/inttoptr”: indeed using CFL-AA doesn’t solve the noalias problem here.

But do you know why it would give up? I would understand if there was any pointer arithmetic, but for trivial inttoptr/ptrtoint it should just propagate the noalias information correctly.

Finally, very naive question here: why isn’t CFL-AA used by default? Is that kind of experimental so far?

Thanks,
Olivier

Daniel,

You said “more advanced AA’s like CFL-AA will give up on anything that comes from or goes through ptrtoint/inttoptr”: indeed using CFL-AA doesn’t solve the noalias problem here.

But do you know why it would give up? I would understand if there was any pointer arithmetic, but for trivial inttoptr/ptrtoint it should just propagate the noalias information correctly.

It could, as long as it was directly intoptr/ptrtoint, not give up.

But here you have a further problem:
You are calling intoptr on an argument.
In general, it has no idea where this argument came from.
It has no idea what this int points to.
It can’t say anything reasonable about it other than “no idea”.

If it sees the call and the callee, interprocedural analysis could try to do something, but you are still going to have bad results in the non-LTO case (and even then, in the non-whole-program case)

This will be true of all pointer analysis, not just CFL-AA.

In any case, George, want to put “update ptrtoint/inttoptr to wait until it sees an operation happen on the the pointer before declaring it bad” on the list of stuff to do?

Finally, very naive question here: why isn’t CFL-AA used by default? Is that kind of experimental so far?

There are some bugs remaining to be worked out, and then tuning likely needs to happen in other passes . When you give better AA answers than you had before, and increase freedom for optimizers/scheduling/regalloc to do things, the initial results tend to be negative because they become over-aggressive :slight_smile:

In any case, George, want to put “update ptrtoint/inttoptr to wait until it sees an operation happen on the the pointer before declaring it bad” on the list of stuff to do?

https://llvm.org/bugs/show_bug.cgi?id=22927 :slight_smile:

George

From: "Reid Kleckner" <rnk@google.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Olivier Sallenave" <ol.sall@gmail.com>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Monday, March 16, 2015 1:24:34 PM
Subject: Re: [LLVMdev] Alias analysis issue with structs on PPC

so just using pointer types instead of i64 will help common cases,
but will not address the general issue. Now part of this does some
down to using array parameters as a substitute for byval/direct
parameters. As I recall, this was done because it allowed a natural
partial decomposition between GPRs and stack for structures that
straddle the number of available parameter-passing GPRs. If we could
accomplish that with regular byval parameters and regular direct
parameters, then we'd not need any of this array coercion, and the
system, including for the purposes of aliasing analysis, would work
as intended. There may be some infrastructure work required in the
backend (SelectionDAG builder, etc.) -- Uli, if you know please
comment -- but I think moving away from the array coercions might be
the right solution, even if that requires some infrastructure
enhancements.

So, every backend interprets 'byval' differently, but it usually
means "pass this whole thing in stack memory". It also requires
extra copies through memory at the IR level, so I don't think we
should be moving towards this construct.

If you want to pass things in registers, it's usually best to use SSA
values. Even though the extra 'extractvalue' instructions look
expensive in the IR, they lower down to simple virtual register
copies in the selection dag. The shift and trunc, on the other hand,
don't model the machine code at all, and it would be good if we
could eliminate them.

I wonder if we could solve this parameter alignment problem via the
'align' parameter attribute. Unfortunately, I think for pointer
types it's already overloaded to describe the alignment of the
pointee and not the argument itself. In fact, I think you did this
Hal. :slight_smile:

Indeed I did :slight_smile:

I think, in the long term, we should probably use a direct FCA. I
believe this is what ARM does. It's also nice to flatten the FCA if
we can detect that we're in a simple case where no interesting
alignment is required.

I agree, this seems to make a lot of sense (I'm assuming FCA == first class aggregate). We could make align on an FCA mean the right thing as necessary.

-Hal

If you want to do it at a clang level, the right thing to do is to
fixup the ABI lowerings for pointers to keep them pointers in this case.
So this is an artifact of the way that we pass structures, and
constructing a general solution at the ABI level might be tricky.
I've cc'd Uli, who did most of the recent work here.

For the single-element struct case, we could fix this by keeping it
a pointer type. The relevant code in Clang is in lib/CodeGen/
TargetInfo.cpp (look at PPC64_SVR4_ABIInfo::classifyArgumentType and
nearby code). But that does not really address the underlying issue:

If I take your example and modify it so that we have:

struct box {
    double* source;
    double* source2;
};

then the parameter is passed as:

define void @test(double* noalias nocapture %result, [2 x i64] %
my_struct.coerce, i32 signext %len) #0

and is extracted the same way:

  %my_struct.coerce.fca.0.extract = extractvalue [2 x i64] %
my_struct.coerce, 0
  %0 = inttoptr i64 %my_struct.coerce.fca.0.extract to double*

but, it is also important to realize that the i64 here in the array
type is actually chosen to satisfy alignment requirements. If we have

this:

typedef float __attribute__((ext_vector_type(4))) vt;
struct box {
    double* source;
    double* source2;
    vt v;
};

then the struct is passed as:

define void @test(double* noalias nocapture %result, [2 x i128] %
my_struct.coerce, i32 signext %len)

and the extraction code looks like:

  %my_struct.coerce.fca.0.extract = extractvalue [2 x i128] %
my_struct.coerce, 0
  %my_struct.sroa.0.sroa.0.0.extract.shift = lshr i128 %
my_struct.coerce.fca.0.extract, 64
  %my_struct.sroa.0.sroa.0.0.extract.trunc = trunc i128 %
my_struct.sroa.0.sroa.0.0.extract.shift to i64
  %0 = inttoptr i64 %my_struct.sroa.0.sroa.0.0.extract.trunc to double*

so just using pointer types instead of i64 will help common cases,
but will not address the general issue. Now part of this does some
down to using array parameters as a substitute for byval/direct
parameters. As I recall, this was done because it allowed a natural
partial decomposition between GPRs and stack for structures that
straddle the number of available parameter-passing GPRs. If we could
accomplish that with regular byval parameters and regular direct
parameters, then we'd not need any of this array coercion, and the
system, including for the purposes of aliasing analysis, would work
as intended. There may be some infrastructure work required in the
backend (SelectionDAG builder, etc.) -- Uli, if you know please
comment -- but I think moving away from the array coercions might be
the right solution, even if that requires some infrastructure

enhancements.

I still think "byval" is the wrong approach here. Using "byval",
the LLVM target-independent codegen layers assume the ABI requires
actually passing a *pointer* to the argument -- but that's not true
on PowerPC64, no pointer is in fact being passed.

This means that the back-end, in those cases where we use byval,
has to fake up an address when common code asks for the value of
the argument. This is not too bad if the argument was in fact
actually passed (fully) on the stack, but if it wasn't, we have
to write the argument (partially) to the stack, and possibly
even allocate stack space, just to satisfy common code ...

Now, passing such arguments as "direct" might make more sense. But
with the current infrastructure, this doesn't really work either.
For one, there is currently no place to attach alignment information
to a "direct" argument, and the back-end is not able in all cases to
reconstruct the required on-stack alignment from the LLVM type info.
In addition, passing a struct type "direct" causes clang codegen
common code in many cases to "flatten" the argument, i.e. pass the
struct elements as separate arguments on the LLVM IR level. This
may in some cases be OK since it results in the same binary ABI;
but in other cases it is definitively wrong.

Still, if we need to move away from coerced array types, fixing
the infrastructure such that "direct" can be used would be my
prefered solution. If I understood the original email thread
correctly, the current situation does not lead to incorrect code
generation, just potentially suboptimal code generation? In this
case, I guess an incremental solution could be employed; we could
start with avoiding the array coercion in those cases where using
the struct type as direct type is already safe even with the
current infrastructure, and then expand the set of types where
this is true by successively improving the infrastructure.

Bye,
Ulrich

From: "Ulrich Weigand" <Ulrich.Weigand@de.ibm.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Daniel Berlin" <dberlin@dberlin.org>, llvmdev@cs.uiuc.edu, "Olivier Sallenave" <ol.sall@gmail.com>
Sent: Tuesday, March 17, 2015 4:56:48 AM
Subject: Re: [LLVMdev] Alias analysis issue with structs on PPC

> If you want to do it at a clang level, the right thing to do is to
> fixup the ABI lowerings for pointers to keep them pointers in this
> case.
> So this is an artifact of the way that we pass structures, and
> constructing a general solution at the ABI level might be tricky.
> I've cc'd Uli, who did most of the recent work here.
>
> For the single-element struct case, we could fix this by keeping it
> a pointer type. The relevant code in Clang is in lib/CodeGen/
> TargetInfo.cpp (look at PPC64_SVR4_ABIInfo::classifyArgumentType
> and
> nearby code). But that does not really address the underlying
> issue:
>
> If I take your example and modify it so that we have:
>
> struct box {
> double* source;
> double* source2;
> };
>
> then the parameter is passed as:
>
> define void @test(double* noalias nocapture %result, [2 x i64] %
> my_struct.coerce, i32 signext %len) #0
>
> and is extracted the same way:
>
> %my_struct.coerce.fca.0.extract = extractvalue [2 x i64] %
> my_struct.coerce, 0
> %0 = inttoptr i64 %my_struct.coerce.fca.0.extract to double*
>
> but, it is also important to realize that the i64 here in the array
> type is actually chosen to satisfy alignment requirements. If we
> have
this:
>
> typedef float __attribute__((ext_vector_type(4))) vt;
> struct box {
> double* source;
> double* source2;
> vt v;
> };
>
> then the struct is passed as:
>
> define void @test(double* noalias nocapture %result, [2 x i128] %
> my_struct.coerce, i32 signext %len)
>
> and the extraction code looks like:
>
> %my_struct.coerce.fca.0.extract = extractvalue [2 x i128] %
> my_struct.coerce, 0
> %my_struct.sroa.0.sroa.0.0.extract.shift = lshr i128 %
> my_struct.coerce.fca.0.extract, 64
> %my_struct.sroa.0.sroa.0.0.extract.trunc = trunc i128 %
> my_struct.sroa.0.sroa.0.0.extract.shift to i64
> %0 = inttoptr i64 %my_struct.sroa.0.sroa.0.0.extract.trunc to
> double*
>
> so just using pointer types instead of i64 will help common cases,
> but will not address the general issue. Now part of this does some
> down to using array parameters as a substitute for byval/direct
> parameters. As I recall, this was done because it allowed a natural
> partial decomposition between GPRs and stack for structures that
> straddle the number of available parameter-passing GPRs. If we
> could
> accomplish that with regular byval parameters and regular direct
> parameters, then we'd not need any of this array coercion, and the
> system, including for the purposes of aliasing analysis, would work
> as intended. There may be some infrastructure work required in the
> backend (SelectionDAG builder, etc.) -- Uli, if you know please
> comment -- but I think moving away from the array coercions might
> be
> the right solution, even if that requires some infrastructure
enhancements.

I still think "byval" is the wrong approach here. Using "byval",
the LLVM target-independent codegen layers assume the ABI requires
actually passing a *pointer* to the argument -- but that's not true
on PowerPC64, no pointer is in fact being passed.

This means that the back-end, in those cases where we use byval,
has to fake up an address when common code asks for the value of
the argument. This is not too bad if the argument was in fact
actually passed (fully) on the stack, but if it wasn't, we have
to write the argument (partially) to the stack, and possibly
even allocate stack space, just to satisfy common code ...

Now, passing such arguments as "direct" might make more sense. But
with the current infrastructure, this doesn't really work either.
For one, there is currently no place to attach alignment information
to a "direct" argument, and the back-end is not able in all cases to
reconstruct the required on-stack alignment from the LLVM type info.
In addition, passing a struct type "direct" causes clang codegen
common code in many cases to "flatten" the argument, i.e. pass the
struct elements as separate arguments on the LLVM IR level. This
may in some cases be OK since it results in the same binary ABI;
but in other cases it is definitively wrong.

Still, if we need to move away from coerced array types, fixing
the infrastructure such that "direct" can be used would be my
prefered solution. If I understood the original email thread
correctly, the current situation does not lead to incorrect code
generation, just potentially suboptimal code generation? In this
case, I guess an incremental solution could be employed; we could
start with avoiding the array coercion in those cases where using
the struct type as direct type is already safe even with the
current infrastructure, and then expand the set of types where
this is true by successively improving the infrastructure.

That's correct, the code is correct, but potentially suboptimal. I agree with your assessment, and an incremental approach certainly makes sense. Regarding the general case, which requires infrastructure enhancement, we just need to decide what improvement is needed. As I mentioned in the e-mail to Reid, we could certainly attach an 'align' attribute to a 'direct' struct argument, and update the infrastructure so that it means the right thing.

I did a quick experiment, and changed the IR from the previous example so that we have:

%struct.box3 = type { double*, double*, <4 x float> }

and the function taking the parameter like this:

define void @test3(double* noalias nocapture %result, %struct.box3 %my_struct, i32 %len)

and then extracting the first member like this:

  %0 = extractvalue %struct.box3 %my_struct, 0

and I ran it though the standard pipeline at -O3, and nothing changed that (so we, at least, don't always break up these FCA arguments into separate parameters). What I mean above is that it might make sense to change things so that we're allowed to use this:

  define void @test3(double* noalias nocapture %result, %struct.box3 %my_struct align 16, i32 %len)

and then I imagine that the SDAGBuilder needs to keep track of which parameter values being lowered are part of a structure (in the same way as it keeps track of this for arrays currently), if it does not do this already.

Thanks again,
Hal

From: "Hal Finkel" <hfinkel@anl.gov>
To: "Ulrich Weigand" <Ulrich.Weigand@de.ibm.com>
Cc: llvmdev@cs.uiuc.edu
Sent: Tuesday, March 17, 2015 6:55:11 AM
Subject: Re: [LLVMdev] Alias analysis issue with structs on PPC

> From: "Ulrich Weigand" <Ulrich.Weigand@de.ibm.com>
> To: "Hal Finkel" <hfinkel@anl.gov>
> Cc: "Daniel Berlin" <dberlin@dberlin.org>, llvmdev@cs.uiuc.edu,
> "Olivier Sallenave" <ol.sall@gmail.com>
> Sent: Tuesday, March 17, 2015 4:56:48 AM
> Subject: Re: [LLVMdev] Alias analysis issue with structs on PPC
>
>
> > If you want to do it at a clang level, the right thing to do is
> > to
> > fixup the ABI lowerings for pointers to keep them pointers in
> > this
> > case.
> > So this is an artifact of the way that we pass structures, and
> > constructing a general solution at the ABI level might be tricky.
> > I've cc'd Uli, who did most of the recent work here.
> >
> > For the single-element struct case, we could fix this by keeping
> > it
> > a pointer type. The relevant code in Clang is in lib/CodeGen/
> > TargetInfo.cpp (look at PPC64_SVR4_ABIInfo::classifyArgumentType
> > and
> > nearby code). But that does not really address the underlying
> > issue:
> >
> > If I take your example and modify it so that we have:
> >
> > struct box {
> > double* source;
> > double* source2;
> > };
> >
> > then the parameter is passed as:
> >
> > define void @test(double* noalias nocapture %result, [2 x i64] %
> > my_struct.coerce, i32 signext %len) #0
> >
> > and is extracted the same way:
> >
> > %my_struct.coerce.fca.0.extract = extractvalue [2 x i64] %
> > my_struct.coerce, 0
> > %0 = inttoptr i64 %my_struct.coerce.fca.0.extract to double*
> >
> > but, it is also important to realize that the i64 here in the
> > array
> > type is actually chosen to satisfy alignment requirements. If we
> > have
> this:
> >
> > typedef float __attribute__((ext_vector_type(4))) vt;
> > struct box {
> > double* source;
> > double* source2;
> > vt v;
> > };
> >
> > then the struct is passed as:
> >
> > define void @test(double* noalias nocapture %result, [2 x i128] %
> > my_struct.coerce, i32 signext %len)
> >
> > and the extraction code looks like:
> >
> > %my_struct.coerce.fca.0.extract = extractvalue [2 x i128] %
> > my_struct.coerce, 0
> > %my_struct.sroa.0.sroa.0.0.extract.shift = lshr i128 %
> > my_struct.coerce.fca.0.extract, 64
> > %my_struct.sroa.0.sroa.0.0.extract.trunc = trunc i128 %
> > my_struct.sroa.0.sroa.0.0.extract.shift to i64
> > %0 = inttoptr i64 %my_struct.sroa.0.sroa.0.0.extract.trunc to
> > double*
> >
> > so just using pointer types instead of i64 will help common
> > cases,
> > but will not address the general issue. Now part of this does
> > some
> > down to using array parameters as a substitute for byval/direct
> > parameters. As I recall, this was done because it allowed a
> > natural
> > partial decomposition between GPRs and stack for structures that
> > straddle the number of available parameter-passing GPRs. If we
> > could
> > accomplish that with regular byval parameters and regular direct
> > parameters, then we'd not need any of this array coercion, and
> > the
> > system, including for the purposes of aliasing analysis, would
> > work
> > as intended. There may be some infrastructure work required in
> > the
> > backend (SelectionDAG builder, etc.) -- Uli, if you know please
> > comment -- but I think moving away from the array coercions might
> > be
> > the right solution, even if that requires some infrastructure
> enhancements.
>
> I still think "byval" is the wrong approach here. Using "byval",
> the LLVM target-independent codegen layers assume the ABI requires
> actually passing a *pointer* to the argument -- but that's not true
> on PowerPC64, no pointer is in fact being passed.
>
> This means that the back-end, in those cases where we use byval,
> has to fake up an address when common code asks for the value of
> the argument. This is not too bad if the argument was in fact
> actually passed (fully) on the stack, but if it wasn't, we have
> to write the argument (partially) to the stack, and possibly
> even allocate stack space, just to satisfy common code ...
>
> Now, passing such arguments as "direct" might make more sense. But
> with the current infrastructure, this doesn't really work either.
> For one, there is currently no place to attach alignment
> information
> to a "direct" argument, and the back-end is not able in all cases
> to
> reconstruct the required on-stack alignment from the LLVM type
> info.
> In addition, passing a struct type "direct" causes clang codegen
> common code in many cases to "flatten" the argument, i.e. pass the
> struct elements as separate arguments on the LLVM IR level. This
> may in some cases be OK since it results in the same binary ABI;
> but in other cases it is definitively wrong.
>
> Still, if we need to move away from coerced array types, fixing
> the infrastructure such that "direct" can be used would be my
> prefered solution. If I understood the original email thread
> correctly, the current situation does not lead to incorrect code
> generation, just potentially suboptimal code generation? In this
> case, I guess an incremental solution could be employed; we could
> start with avoiding the array coercion in those cases where using
> the struct type as direct type is already safe even with the
> current infrastructure, and then expand the set of types where
> this is true by successively improving the infrastructure.

That's correct, the code is correct, but potentially suboptimal. I
agree with your assessment, and an incremental approach certainly
makes sense. Regarding the general case, which requires
infrastructure enhancement, we just need to decide what improvement
is needed. As I mentioned in the e-mail to Reid, we could certainly
attach an 'align' attribute to a 'direct' struct argument, and
update the infrastructure so that it means the right thing.

I did a quick experiment, and changed the IR from the previous
example so that we have:

%struct.box3 = type { double*, double*, <4 x float> }

and the function taking the parameter like this:

define void @test3(double* noalias nocapture %result, %struct.box3
%my_struct, i32 %len)

and then extracting the first member like this:

  %0 = extractvalue %struct.box3 %my_struct, 0

and I ran it though the standard pipeline at -O3, and nothing changed
that (so we, at least, don't always break up these FCA arguments
into separate parameters). What I mean above is that it might make
sense to change things so that we're allowed to use this:

  define void @test3(double* noalias nocapture %result, %struct.box3
  %my_struct align 16, i32 %len)

Minor correction: The syntax should be:

define void @test3(double* noalias nocapture %result, %struct.box3 align 16 %my_struct, i32 %len)

and this is currently allowed by the verifier, but I think that it currently does not mean anything (the alignment is ignored because the argument does not have pointer type).

-Hal