PROPOSAL: IR representation of detailed struct assignment information

Hello,

Currently LLVM expects front-ends to lower struct assignments into either
individual scalar loads and stores, or calls to @llvm.memcpy. For structs
with lots of fields, it can take a lot of scalar loads and stores, so
@llvm.memcpy is used instead. Unfortunately, using @llvm.memcpy does not
permit full TBAA information to be preserved. Also, it unnecessarily copies
any padding bytes between fields, which can lead to unnecessary copying in
the case where the optimizer or codegen decide to split it back up into
individual loads and stores.

Chris wrote up some ideas about the struct padding part of this problem [0];
this proposal extends that proposal and adds the capability to represent
TBAA information for the members of the fields in a struct assignment.

Here's an example showing the basic problem:

struct bar {
  char x;
  float y;
  double z;
};
void copy_bar(struct bar *a, struct bar *b) {
  *a = *b;
}

We get this IR:

  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 8, i1 false)

This works, but it doesn't retain the information that the bytes between fields
x and y don't really need to be copied, and it doesn't inform the optimizer
that there are three fields with TBAA-relevant types being copied.

The solution I propose here is to have front-ends describe the copy using
metadata. For example:

  %struct.foo = type { i8, float, double }
  […]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 8, i1 false), !struct.assignment !4
  […]
  !0 = metadata !{metadata !"Simple C/C++ TBAA"}
  !1 = metadata !{metadata !"omnipotent char", metadata !0}
  !2 = metadata !{metadata !"float", metadata !1}
  !3 = metadata !{metadata !"double", metadata !1}
  !4 = metadata !{ %struct.foo* null, metadata !5 }
  !5 = metadata !{ metadata !1, metadata !2, metadata !3 }

Metadata nodes !0 through !3 are regular TBAA nodes as are already in use.

Metadata node !4 here is a top-level description of the memcpy. Its first
operand is a null pointer, which is there just for its type. It specifies
a (pointer to an) IR-level struct type the memcpy can be thought of as
copying. The second operand is and MDNode which describes the TBAA values
for the fields. The indices of the operands in that MDNode directly
correspond to the indices of the members in the IR-level struct type.

With this information, optimizer and codegen can more aggressively optimize
the memcpy. In particular, it would be possible for the optimizer to expand
the memcpy into a series of loads and stores with complete TBAA information.
Also, the optimize could determine where the padding is by examining the
struct layout of the IR-level struct definition.

Note that this is not a proposal for struct-access-path aware TBAA, or
even full struct value TBAA. This is just a way to preserve basic scalar
TBAA for individual members of structs in a struct assignment.

Comments and questions are welcome.

Dan

[0] http://nondot.org/sabre/LLVMNotes/BetterStructureCopyOptimization.txt

Hello,

Currently LLVM expects front-ends to lower struct assignments into
either individual scalar loads and stores, or calls to @llvm.memcpy.
For structs with lots of fields, it can take a lot of scalar loads
and stores, so @llvm.memcpy is used instead. Unfortunately, using
@llvm.memcpy does not permit full TBAA information to be preserved.
Also, it unnecessarily copies any padding bytes between fields, which
can lead to unnecessary copying in the case where the optimizer or
codegen decide to split it back up into individual loads and stores.

Chris wrote up some ideas about the struct padding part of this
problem [0]; this proposal extends that proposal and adds the
capability to represent TBAA information for the members of the
fields in a struct assignment.

Here's an example showing the basic problem:

struct bar {
  char x;
  float y;
  double z;
};
void copy_bar(struct bar *a, struct bar *b) {
  *a = *b;
}

We get this IR:

  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 8,
i1 false)

This works, but it doesn't retain the information that the bytes
between fields x and y don't really need to be copied, and it doesn't
inform the optimizer that there are three fields with TBAA-relevant
types being copied.

The solution I propose here is to have front-ends describe the copy
using metadata. For example:

  %struct.foo = type { i8, float, double }
  […]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 8,
i1 false), !struct.assignment !4 […]

I think that it would make more sense to name this !struct.tbaa -- it
seems logically similar to existing TBAA metadata (in that it is
attached to the relevant load/store instruction).

-Hal

How about !tbaa.struct, kicking off a prefix-namespace idiom?

On the other hand, TBAA is only half the story here; the other half is
describing struct padding. I don't have strong opinions here; does anyone
else?

Dan

>
>> call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32
>> 8, i1 false), !struct.assignment !4 […]
>
> I think that it would make more sense to name this !struct.tbaa --
> it seems logically similar to existing TBAA metadata (in that it is
> attached to the relevant load/store instruction).

How about !tbaa.struct, kicking off a prefix-namespace idiom?

Fine by me.

On the other hand, TBAA is only half the story here; the other half is
describing struct padding.

By this you mean that it is possible to determine which parts of the
copy are padding because we now have direct access to the pointer type
information, right? Out of curiosity, is there a reason not to change
the memcpy to take non-integer pointers? If memcpy took the
struct pointer without needing the bitcast, then we'd have this
information directly.

Thanks again,
Hal

call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32
8, i1 false), !struct.assignment !4 […]

I think that it would make more sense to name this !struct.tbaa --
it seems logically similar to existing TBAA metadata (in that it is
attached to the relevant load/store instruction).

How about !tbaa.struct, kicking off a prefix-namespace idiom?

Fine by me.

Another alternative would be to separate them into two different MDNodes: one for TBAA info, one for padding info.

On the other hand, TBAA is only half the story here; the other half is
describing struct padding.

By this you mean that it is possible to determine which parts of the
copy are padding because we now have direct access to the pointer type
information, right? Out of curiosity, is there a reason not to change
the memcpy to take non-integer pointers? If memcpy took the
struct pointer without needing the bitcast, then we'd have this
information directly.

This is an interesting idea, particularly given that you're representing struct information with a null pointer. I think that the limiting issue here is that intrinsics cannot encode/mangle general structs in their type signature. What do you think Dan?

-Chris

<moving this to llvmdev now that the lists are back up!>

Interesting approach. The IR type for a struct may or may not be enough to describe holes (think unions and other cases), have you considered a more explicit MDNode that describes the ranges of any holes?

What's the issue with unions? Do you mean unions containing structs
containing holes?

Unions don't lower to a unique or useful IR type. In general, I'm skeptical of anything that uses IR types to reason about source level types (except primitives like integers and floats).

I'm confused. It seems a big difference here between your expectations
and my understanding is that you're expecting to see source level types
here, whereas it hadn't even occurred to me that we should try to represent
source level types.

My point here is that the frontend reasons about two things: 1) a source level construct of a type, and 2) LLVM IR types. The LLVM IR type lowering is not guaranteed cover all fields in the source type (e.g. in the case of unions).

Let me give you a dumb example. Consider:

union x {
  struct { char b; int c; } a;
  short b;
} u;

On my system, Clang codegen's this to:

%union.x = type { %struct.anon }
%struct.anon = type { i8, i32 }

This isn't a safe IR type to use to describe a memcpy (because it wouldn't copy all of "b"), so implementing your proposal would requiring implementing yet-another conversion from AST types to LLVM types that *is* guaranteed to cover all the fields.

Instead of implementing this, it would be a lot easier for clang to walk a type and produce a mask describing all the holes in a type, using a simple recursive algorithm (where union intersects the member "hole sets", finding that byte 3/4 of the union is a hole).

Given this, it makes a lot more sense to explicitly model this hole set in an MDNode (e.g. by using a list of byte ranges?) instead of representing the holes with a null pointer constant of some IR type.

Does this make sense?

-Chris

It's an interesting idea. I can see both sides.

The current (proposed) approach is that @llvm.memcpy does a full memcpy, unless
decorated by metadata, which can specify that some of the bytes are undefined.
There is an appeal in having a base IR which is simple, with metadata for making
it more aggressive in tricky ways.

I can also see the appeal of overloading @llvm.memcpy with different pointee
types. An interesting issue is the semantics in the case where the pointee
types have holes. For example:

  %struct = type { i8, i32 }

  call void @llvm.memcpy.mangledstuff(%struct* %dst, %struct* %src, i64 1, i64 4)

It should always be valid to lower @llvm.memcpy to an actual memcpy call,
but we also want it to be valid to lower this to member-wise loads and
stores. Consequently, bytes corresponding to padding are set to some kind of
undef. That's probably fine, as long as everyone's ok with that.

Another interesting question is how should the size argument work. Should it
be a byte count, or should it be scaled by the size of the pointee type?
Scaling it seems to be intuitive, and it avoids awkwardness in the case where
the size somehow isn't a multiple of the pointee size.

Similarly, how should the alignment argument work? In this case, scaling it
by the alignment of the pointee seems awkward. In the example above, I showed
the size scaled and the alignment left unscaled.

Dan

<moving this to llvmdev now that the lists are back up!>

Interesting approach. The IR type for a struct may or may not be enough to describe holes (think unions and other cases), have you considered a more explicit MDNode that describes the ranges of any holes?

What's the issue with unions? Do you mean unions containing structs
containing holes?

Unions don't lower to a unique or useful IR type. In general, I'm skeptical of anything that uses IR types to reason about source level types (except primitives like integers and floats).

I'm confused. It seems a big difference here between your expectations
and my understanding is that you're expecting to see source level types
here, whereas it hadn't even occurred to me that we should try to represent
source level types.

My point here is that the frontend reasons about two things: 1) a source level construct of a type, and 2) LLVM IR types. The LLVM IR type lowering is not guaranteed cover all fields in the source type (e.g. in the case of unions).

Let me give you a dumb example. Consider:

union x {
struct { char b; int c; } a;
short b;
} u;

Ok, so the answer to my question above is, yes, you are talking about
unions containing structs containing holes.

On my system, Clang codegen's this to:

%union.x = type { %struct.anon }
%struct.anon = type { i8, i32 }

This isn't a safe IR type to use to describe a memcpy (because it wouldn't copy all of "b"), so implementing your proposal would requiring implementing yet-another conversion from AST types to LLVM types that *is* guaranteed to cover all the fields.

Instead of implementing this, it would be a lot easier for clang to walk a type and produce a mask describing all the holes in a type, using a simple recursive algorithm (where union intersects the member "hole sets", finding that byte 3/4 of the union is a hole).

Given this, it makes a lot more sense to explicitly model this hole set in an MDNode (e.g. by using a list of byte ranges?) instead of representing the holes with a null pointer constant of some IR type.

I'll send out a new proposal according to this design.

Dan

>>>
>>>>
>>>>> call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32
>>>>> 8, i1 false), !struct.assignment !4 […]
>>>>
>>>> I think that it would make more sense to name this !struct.tbaa
>>>> -- it seems logically similar to existing TBAA metadata (in that
>>>> it is attached to the relevant load/store instruction).
>>>
>>> How about !tbaa.struct, kicking off a prefix-namespace idiom?
>>
>> Fine by me.
>
> Another alternative would be to separate them into two different
> MDNodes: one for TBAA info, one for padding info.
>
>>> On the other hand, TBAA is only half the story here; the other
>>> half is describing struct padding.
>>
>> By this you mean that it is possible to determine which parts of
>> the copy are padding because we now have direct access to the
>> pointer type information, right? Out of curiosity, is there a
>> reason not to change the memcpy to take non-integer pointers? If
>> memcpy took the struct pointer without needing the bitcast, then
>> we'd have this information directly.
>
> This is an interesting idea, particularly given that you're
> representing struct information with a null pointer. I think that
> the limiting issue here is that intrinsics cannot encode/mangle
> general structs in their type signature. What do you think Dan?

It's an interesting idea. I can see both sides.

The current (proposed) approach is that @llvm.memcpy does a full
memcpy, unless decorated by metadata, which can specify that some of
the bytes are undefined. There is an appeal in having a base IR which
is simple, with metadata for making it more aggressive in tricky ways.

I can also see the appeal of overloading @llvm.memcpy with different
pointee types. An interesting issue is the semantics in the case
where the pointee types have holes. For example:

  %struct = type { i8, i32 }

  call void @llvm.memcpy.mangledstuff(%struct* %dst, %struct* %src,
i64 1, i64 4)

It should always be valid to lower @llvm.memcpy to an actual memcpy
call, but we also want it to be valid to lower this to member-wise
loads and stores. Consequently, bytes corresponding to padding are
set to some kind of undef. That's probably fine, as long as
everyone's ok with that.

Another interesting question is how should the size argument work.
Should it be a byte count, or should it be scaled by the size of the
pointee type? Scaling it seems to be intuitive, and it avoids
awkwardness in the case where the size somehow isn't a multiple of
the pointee size.

I agree that it seems intuitive, but what would we do with the current
intrinsic? Having a rule like, "the size is specified in bytes if the
pointer type is an integer, and is specified in elements otherwise"
seems sure to yield many bugs.

-Hal

I guess I'm late to the party, but another possibility would be to model structure types as lists of members with their offsets from the beginning of the parent aggregate. This would require extensive changes to LLVM, so I'm not sure if it's an option.

Why not simply have something like this?

%1 = load bar* %b
store %1, bar* %a

-K

This has been proposed already, and could also be used by bitfields,
but the changes were too many and was not accepted.

I think the biggest reason against was that it was strongly based on
C++ semantics and not generic enough to be considered IR material.

That preserves an IR type, but not source level types. IR types are insufficient for TBAA or "hole" information.

-Chris

Here's an example showing the basic problem:

struct bar {
char x;
float y;
double z;
};
void copy_bar(struct bar *a, struct bar *b) {
*a = *b;
}

We get this IR:

call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 8, i1 false)

This works, but it doesn't retain the information that the bytes between fields
x and y don't really need to be copied, and it doesn't inform the optimizer
that there are three fields with TBAA-relevant types being copied.

The solution I propose here is to have front-ends describe the copy using
metadata. For example:

Why not simply have something like this?

%1 = load bar* %b
store %1, bar* %a

That preserves an IR type, but not source level types. IR types are insufficient for TBAA or "hole" information.

But do IR types have to remain insufficient? Couldn't we ensure the front-end adds fields to the IR types for all bytes which really exist (in any one of the unioned types). Then holes in the IR type really are holes and we wouldn't need new metadata for holes.

In your example from earlier:

union x {
struct { char b; int c; } a;
short b;
} u;

currently generates

%union.x = type { %struct.anon }
%struct.anon = type { i8, i32 }

but would fully describe all the holes if it was

%union.x = type { %struct.anon }
%struct.anon = type { i8, i8, i32 }

Pete

AFAIK, there's no way to forbid access to an IR type, or a sub-type.
In that case, the holes would be addressable, which is not
semantically valid in any language.

It is possible to use IR types for hole information, but this doesn't solve the TBAA need. However, even if we wanted to use an IR type for hole information, it would add more complexity all around (compared to representing holes with a list of "hole ranges").

I described some of this here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-August/052900.html

It also wouldn't be possible to specify something like "an i16 type where the first byte is defined and the second byte is defined but has alignment of 8 bits", and certainly wouldn't be able to do "the first byte of an i16 memory object is undefined and the second byte is defined". Hole ranges handle these sorts of cases just fine.

-Chris