RFC: Enforcing pointer type alignment in Clang

C 6.3.2.3p7 (N1548) says:
  A pointer to an object type may be converted to a pointer to a
  different object type. If the resulting pointer is not correctly
  aligned for the referenced type, the behavior is undefined.

C++ [expr.reinterpret.cast]p7 (N4527) defines pointer conversions in terms
of conversions from void*:
  An object pointer can be explicitly converted to an object pointer
  of a different type. When a prvalue v of object pointer type is
  converted to the object pointer type “pointer to cv T”, the result
  is static_cast<cv T*>(static_cast<cv void*>(v)).

C++ [expr.static.cast]p13 says of conversions from void*:
  A prvalue of type “pointer to cv1 void” can be converted to a
  prvalue of type “pointer to cv2 T” .... If the original pointer value
  represents the address A of a byte in memory and A satisfies the alignment
  requirement of T, then the resulting pointer value represents the same
  address as the original pointer value, that is, A. The result of any
  other such pointer conversion is unspecified.

The clear intent of these rules is that the implementation may assume
that any pointer is adequately aligned for its pointee type, because any
attempt to actually create such a pointer has undefined behavior. It is
very likely that, if we found a hole in those rules that seemed to permit
the creation of unaligned pointers, we could go to the committees and
have that hole closed. The language policy here is clear.

There are architectures where this policy is mandatory. The classic
example is (I believe) the Cray C90, which provides word-addressed 32-bit
pointers. Pointers to most types can use this native representation directly.
char*, however, requires sub-word addressing, which means void* and char*
are actually 64 bits in order to permit the storage of the sub-word offset.
An int* therefore literally cannot express an arbitrary void*.

Less dramatically, there are architectural features that clearly depend
on alignment. It's unreasonable to expect processors to support atomic
accesses that straddle the basic unit of their cache coherence implementations.
Supporting small unaligned accesses has a fairly marginal cost in extra
hardware, but as accesses grow to 128 bits or larger, those costs can spiral
out of control. These restrictions are fairly widely understood by compiler
users.

Everything below is mushier. It's clearly advantageous for the compiler to
be able to make stronger assumptions about alignment when accessing memory.
ISAs often allow more efficient accesses to properly-aligned memory; for
example, 32-bit ARM can perform a 64-bit memory access in a single
instruction, but the address is required to be 8-byte-aligned. Alignment
also affects compiler decisions even when the architecture doesn't enforce
it; for example, it can be profitable to combine two adjacent loads into
a single, wider load, but this will often slow down code if the wider load is
no longer properly aligned.

As is the case with most forms of undefined behavior, programmers have at
best an abstract appreciation for the positive effects of these optimizations,
but they have a very concrete understanding of the disruptive life effects
of being forced to fix crashes from mis-alignment.

Our standard response in LLVM/Clang is to explain the undefined behavior
rule, explain the benefits it provides, and politely ask users to, well,
deal with it. And that's appropriate; most forms of undefined behavior
under the standard(s) are reasonable requests with reasonable code
workarounds. However, we have also occasionally looked at a particular
undefined behavior rule and decided that there's a real usability problem
with enforcing it as written. In these cases, we use our power as
implementors to make some subset of that behavior well-defined in order to
fix that problem. For example, we did this for TBAA, because we recognized
that certain "obvious" aliasing violations were idiomatic and only had
awkward workarounds under the standard.

There's a similar problem here. Much like TBAA, fixing it doesn't require
completely abandoning the idea of enforcing type-based alignment assumptions.
It does, however, require a significant adjustment to the language rule.

The problem is this: the standards make it undefined behavior to even
create an unaligned pointer. Therefore, as soon as I've got such a
pointer, I'm basically doomed; it is no longer possible to locally
work around the problem. I have to change the type of the pointer to
something that requires less alignment, and not just where I'm using
it, or even just within my function, but all the way up to wherever it
came from.

For example, suppose I've got this function:

  void processBuffer(const int32_t *buffer, size_t length) {
    ...
  }

I get a bug report saying that my function is crashing, and I decide
that the right fix is to make the function handle unaligned buffers
correctly. Maybe that's a binary-compatibility requirement, or maybe the
buffer is usually coming from a serialized format that doesn't guarantee
alignment, and it's clearly unreasonable to copy the buffer just to satisfy
my function.

So how can I make this function handle unaligned buffers? The type of the
argument itself means that being passed an unaligned buffer has undefined
behavior. Now, I can change that parameter to use an unaligned typedef:

  typedef int32_t unaligned_int32_t __attribute__((aligned(1)));
  void processBuffer(const unaligned_int32_t *buffer, size_t length) {
    ...
  }

But this has severe problems. First off, this is a GCC/Clang extension; a lot
of programmers feel uncomfortable adopting that, especially to fix a problem
that's in principle common across compilers. Second, alignment attributes
are not really part of the type system, which means that they can be
silently dropped by any number of things, including both major features
like templates and just day-to-day quality-of-implementation stuff like the
common-type logic of the conditional operator. And finally, my callers
still have undefined behavior, and I really need to go audit all of them
to make sure they're using the same sort of typedef. This is not a reliable
solution to the bug.

Furthermore, the compiler doesn't really care whether the pointer is
abstractly aligned independent of any access to memory. There aren't very
many interesting alignment-based optimizations on pointer values as mere
values. In principle, we could optimize operations that cast the pointer
to an integral type and examine the low bits, but those operations are not
very common, and when they're there, it's probably for a good reason;
that's the kind of optimization that is very likely to just create miscompiles
without really showing any benefit.

Therefore, I would like to propose that Clang formally adopt a significantly
weaker language rule for enforcing the alignment of pointers. The basic
idea is this:

  It is not undefined behavior to create a pointer that is less aligned
  than its pointee type. Instead, it is only undefined behavior to
  access memory through a pointer that is less aligned than its pointee
  type.

That is, the only thing that matters is the type when you actually perform
the access, not any type the pointer might have had at some earlier point
during execution.

Notably, I believe that this rule doesn't require any changes in our
current behavior, so adopting it is just a restriction on future compiler
optimization. For the most part, LLVM IR only attaches alignment to loads,
stores, and specific intrinsics like llvm.memcpy; there is no way to say
that a pointer value is expected to have a particular alignment. The
one exception that I'm aware of is that an indirect parameter can have
an expected alignment. However, Clang currently only sets this for
by-value arguments that the calling convention says to pass indirectly,
and that remains acceptable under this new rule because it's an ABI rule
rather than a constraint on programmer behavior (other than assembly
programmers). The rule just means that we can't start setting it on
arbitrary pointer parameters.

It is also a very portable rule; I'm not aware of any compilers that do
try to take advantage of the formal alignment of pointer values independent
of access.

The key question in this new rule is what counts as an "access". I'll spell
this out in more detail, but it's mostly intuitive: anything that ultimately
requires a load or store. The only thing that's perhaps questionable is that
we'd like to treat calls to library functions that access memory as if they
were direct accesses to their arguments. For example, we'd like to assume
that the pointer arguments to memcpy are properly aligned for their types
(that is, their explicit types, before the implicit conversion to void*) so
that we can generate a more efficient copy operation. This analysis
currently relies on the language rule that pointers may not be misaligned;
preserving it requires us to treat calls to library functions as special,
which of course we already do. Programmers can still suppress this
assumption by explicitly casting the arguments to void*.
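
As a rough illustration of that last point, here is a minimal sketch of a local workaround under the proposed rule. The function name sumUnalignedBuffer, its int64_t return type, and the summing loop are hypothetical stand-ins for the elided body of processBuffer. The explicit cast to const void * is what keeps memcpy from seeing an int32_t-typed argument.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for processBuffer: sums `length` int32_t values
   from a buffer that may be misaligned for int32_t. */
int64_t sumUnalignedBuffer(const int32_t *buffer, size_t length) {
  /* Step in bytes so that no int32_t-typed pointer arithmetic is involved. */
  const unsigned char *bytes = (const unsigned char *)buffer;
  int64_t sum = 0;
  for (size_t i = 0; i < length; ++i) {
    int32_t value;
    /* Explicit cast to const void *: memcpy sees no int32_t-typed argument,
       so no alignment can be inferred from the pointer's type. */
    memcpy(&value, (const void *)(bytes + i * sizeof value), sizeof value);
    sum += value;
  }
  return sum;
}
```

Under ISO C as written, a caller that forms a misaligned const int32_t * to pass in already has undefined behavior. The sketch only shows what the callee could look like if the proposed rule were adopted.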

Here's the proposed new rule, expressed more formally:

Hi John

Thanks for the very detailed explanation. I’m not a language expert, but I felt like I could understand what you want to do here.

In particular, as a user, I’m surprised it doesn’t already work (in the standard) the way you said it should. That is:

  It is not undefined behavior to create a pointer that is less aligned
  than its pointee type. Instead, it is only undefined behavior to
  access memory through a pointer that is less aligned than its pointee
  type.

I thought the above was how it should work now. It’s the behaviour I expect when I write code already, and as a user I’d hope that this continues to be what we implement.

So yeah, +1 from me.

Thanks,
Pete

Sounds like a good idea.

For example, on Hexagon, the long vectors (64 and 128 bytes long) normally need to be aligned to a boundary that is a multiple of their size. There exist, however, instructions to load/store vectors at an unaligned address, although they have some restrictions that the aligned instructions don't. Treating vectors as if they had to be aligned, while allowing unaligned pointers, makes perfect sense.

-Krzysztof

From: "John McCall via cfe-dev" <cfe-dev@lists.llvm.org>
To: cfe-dev@lists.llvm.org, llvm-dev@lists.llvm.org
Sent: Thursday, January 14, 2016 2:56:37 PM
Subject: [cfe-dev] RFC: Enforcing pointer type alignment in Clang

Notably, I believe that this rule doesn't require any changes in our
current behavior, so adopting it is just a restriction on future
compiler
optimization.

For the sake of completeness, I'll mention one exception. If the pointer (or its type via a typedef) has the __attribute__((align_value(N))) attribute, then we do emit alignment attributes on the pointer values themselves and use that information in later optimizations. This is by design, but given that it is explicitly opt-in, I feel this falls into a different category than the situations you've described.

Realistically, if we ever were to implement optimizations based on default type alignments, we'd need a flag to turn off those assumptions (just like we have a flag to turn off strict aliasing assumptions).
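
For readers unfamiliar with the attribute, here is a minimal sketch of the opt-in form. The typedef name aligned_double_ptr follows the example in the Clang attribute documentation; the function sumAligned and the 64-byte figure are hypothetical. The __has_attribute guard is an assumption added so the sketch also compiles on compilers that do not implement align_value, where the typedef degrades to a plain pointer.

```c
#include <stddef.h>

#ifndef __has_attribute
#define __has_attribute(x) 0 /* fallback for compilers without __has_attribute */
#endif

#if __has_attribute(align_value)
/* Opt-in promise: values of this pointer type point to 64-byte-aligned memory. */
typedef double *aligned_double_ptr __attribute__((align_value(64)));
#else
/* No align_value support: an ordinary pointer with no extra promise. */
typedef double *aligned_double_ptr;
#endif

/* Hypothetical user: callers must pass storage that is really 64-byte aligned. */
double sumAligned(aligned_double_ptr values, size_t count) {
  double total = 0.0;
  for (size_t i = 0; i < count; ++i)
    total += values[i];
  return total;
}
```

Because the promise is attached to the pointer value rather than to an access, passing an under-aligned pointer here would be an error on the caller's side even under the proposed rule.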

-Hal

Are those rules the same as the rules for what you’re allowed to do with null pointers? I think it would be pretty nice if the rules for what’s allowable to do with a null pointer, and what’s allowable to do with an unaligned pointer were the same rules. (Even if basically nobody can actually understand what the rules are…)

E.g. apparently this is considered okay in C++, despite the apparent dereference of “foo”. (from chat on #llvm, apparently it arguably isn’t valid per the standard, but CWG232 calls that a defect, and it is explicitly valid in C, per C11 6.5.3.2 paragraph 3):
  int *foo = 0;
  int *bar = &*foo;

So, in your rules, is this okay, or not:

  int foo;
  int *unaligned = (int *)(1 + (char *)(&foo));
  int *bar = &*unaligned;

Or, for a more straightforward example, why should this be allowed on a value of “highlyAlignedStruct” which is not aligned properly for the struct’s type:

  char *member = &highlyAlignedStruct->charMember;

while this is not allowed in either C or C++:

  char *member = &((someStruct *)0)->charMember;

Note that no memory is actually loaded in either case.

Here's the proposed new rule, expressed more formally:

---

It is well-defined behavior to construct a pointer to memory that
is less aligned than the alignment of the pointee type (if a complete
type). However, it is undefined behavior to “access” an expression that
is an r-value of type T* or an l-value of type T if T is a complete type
and the memory is less aligned than T.

An r-value expression of pointer type is accessed if:
- it is dereferenced (with *) and the resulting l-value is accessed,
- it is implicitly converted to another pointer type and the
   result is accessed,
- it undergoes pointer addition and the result is accessed,
- it is passed to a function in the C standard library that is known
   to access the memory,
- in C++, it is converted to a pointer to a virtual base, or
- in C++, it is explicitly cast (other than by a reinterpret_cast) to
   a related class pointer type and the result is accessed.

An l-value expression is accessed if:
- it undergoes an lvalue-to-rvalue conversion (i.e. it is loaded),
- it is the LHS of an assignment operator (including the
   compound assignments),
- it is the base of a member access (with .) and the resulting l-value
   is accessed (recall that x->y is defined as (*x).y),

I think this is overreaching. The member access has undefined behavior if
there isn't an object of the right type designated by the lvalue, which is
a much stronger requirement than that the alignment holds (if the base is
misaligned, you still get immediate UB, even with your proposed change,
because there is no object present at that misaligned location).

- its address is taken (with &) and the resulting pointer is accessed,

- in C++, it is implicitly converted to be an l-value to a base type
   and the result is accessed,

Again, you get UB in the conversion if there's not an actual object present.

- in C++, it is converted to be an l-value of a virtual base type,

- in C++, it is used as the "this" argument of a call to a
   non-static member function, or
- in C++, a reference is bound to it (which includes explicit
   casts to reference type).

These are the cases covered by the language standard. There is a
very long tail of other kinds of expression that obviously access memory,
like the atomic and overflow builtins, which I can't reasonably enumerate.
The intent should be obvious, but I'm willing to spell it out in other
cases where necessary.

Note that this definition is *syntactic*, meaning that it is expressed
in terms of the components of a single statement. This means that an
access that might be undefined behavior if written as a single statement:
  highlyAlignedStruct->charMember = 0;
may not be undefined behavior if split across two statements:
  char *member = &highlyAlignedStruct->charMember;
  *member = 0;
In effect, the compiler promises to never propagate alignment assumptions
between statements through its knowledge of how a pointer was constructed.
This is necessary in order to allow local workarounds to be reliable.
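
A minimal sketch of the two-statement form follows. The struct HighlyAligned, its 16-byte alignment, and the function clearCharMember are hypothetical. Under ISO C as written, merely forming a misaligned struct pointer is already undefined, so this only illustrates the behavior the proposed rule would define.

```c
#include <stddef.h>

/* Hypothetical over-aligned type standing in for the struct in the text. */
struct HighlyAligned {
  _Alignas(16) int header;
  char charMember;
};

/* Clears charMember through a char pointer formed in its own statement. */
void clearCharMember(struct HighlyAligned *highlyAlignedStruct) {
  /* Statement 1: forms an address only; no load or store happens here. */
  char *member = &highlyAlignedStruct->charMember;
  /* Statement 2: the only access, a char store needing 1-byte alignment. */
  *member = 0;
}
```

Written as the single statement highlyAlignedStruct->charMember = 0, the same store would count as an access through the struct l-value, and the compiler could assume the struct's alignment.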

Note also that this definition does not propagate through explicit casts,
other than class-hierarchy casts in C++. Again, this is a deliberate
choice to make misalignment workarounds more straightforward.

But note that this rule does still allow the compiler to make stronger
abstract assumptions about the alignment of C++ references and the
"this" pointer.

It seems like this is a subset of your proposed rule:

The result of casting a pointer of one type to a pointer of another type is
a pointer to the original object, and (after some series of pointer casts)
may be used to access that object (or things that are permitted to alias
it) even if some intermediate pointer type has an alignment requirement
that the object does not satisfy.

I certainly support that subset. I think it's reasonable for us to
guarantee that object pointer values will round-trip through any series of
casts through object pointer types (and intptr_t) so long as you end up
with the right type (a type that is alias-compatible with the original
type) before you perform the access.
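
As a rough illustration of that round-trip guarantee, the following C sketch passes a char pointer through more strictly aligned pointer types and intptr_t without ever accessing memory through them. The function name round_trip is hypothetical. The standards do not promise this result for a misaligned address; it is the behavior the subset above would guarantee.

```c
#include <stdint.h>

/* Hypothetical helper: casts p through stricter pointer types and an
   integer, then back. Nothing is loaded or stored through the
   intermediate pointers. */
char *round_trip(char *p) {
  int *as_int = (int *)p;             /* possibly misaligned, never accessed */
  intptr_t bits = (intptr_t)as_int;   /* pointer to integer */
  double *as_double = (double *)bits; /* integer back to a pointer */
  return (char *)as_double;           /* alias-compatible type again */
}
```

Any access would then go through the returned char pointer, which is the "right type" in the sense described above.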

Your above proposal seems to also make a collection of other cases valid,
where the problem is something other than misalignment. For instance, it
sounds like you also want to make pointer arithmetic valid when there is no
underlying array, and perhaps when the arithmetic leaves the underlying
complete object entirely?

I’m not sure where you got the idea that my proposal supersedes all the other rules about pointer validity or aliasing. It removes the specific clauses from the quotes above about alignment and instead re-imposes alignment requirements based on the immediate form of the expression in several other places that touch memory. I intentionally did not layer this onto the existing definition of access because non-static data members are formally accessed even if you don’t touch the underlying memory, and I don’t feel that the compiler should be allowed to assume alignment in those situations. If you feel that there’s a better formalization that still captures that, I’m open to it.

John.

From: "John McCall via cfe-dev" <cfe-dev@lists.llvm.org>
To: cfe-dev@lists.llvm.org, llvm-dev@lists.llvm.org
Sent: Thursday, January 14, 2016 2:56:37 PM
Subject: [cfe-dev] RFC: Enforcing pointer type alignment in Clang

Notably, I believe that this rule doesn't require any changes in our
current behavior, so adopting it is just a restriction on future compiler
optimization.

For the sake of completeness, I'll mention one exception. If the pointer (or its type via a typedef) has the __attribute__((align_value(N))) attribute, then we do emit alignment attributes on the pointer values themselves and use that information in later optimizations. This is by design, but given that it is explicitly opt-in, I feel this falls into a different category than the situations you've described.

Sure, that seems reasonable. It’s the default language rule I’m concerned about.

John.

C 6.3.2.3p7 (N1548) says:
  A pointer to an object type may be converted to a pointer to a
  different object type. If the resulting pointer is not correctly
  aligned for the referenced type, the behavior is undefined.

C++ [expr.reinterpret.cast]p7 (N4527) defines pointer conversions in terms
of conversions from void*:
  An object pointer can be explicitly converted to an object pointer
  of a different type. When a prvalue v of object pointer type is
  converted to the object pointer type “pointer to cv T”, the result
  is static_cast<cv T*>(static_cast<cv void*>(v)).

C++ [expr.static.cast]p13 says of conversions from void*:
  A prvalue of type “pointer to cv1 void” can be converted to a
  prvalue of type “pointer to cv2 T” .... If the original pointer value
  represents the address A of a byte in memory and A satisfies the alignment
  requirement of T, then the resulting pointer value represents the same
  address as the original pointer value, that is, A. The result of any
  other such pointer conversion is unspecified.

The clear intent of these rules is that the implementation may assume
that any pointer is adequately aligned for its pointee type, because any
attempt to actually create such a pointer has undefined behavior. It is
very likely that, if we found a hole in those rules that seemed to permit
the creation of unaligned pointers, we could go to the committees and
have that hole closed. The language policy here is clear.

There are architectures where this policy is mandatory. The classic
example is (I believe) the Cray C90, which provides word-addressed 32-bit
pointers. Pointers to most types can use this native representation directly.
char*, however, requires sub-word addressing, which means void* and char*
are actually 64 bits in order to permit the storage of the sub-word offset.
An int* therefore literally cannot express an arbitrary void*.

Less dramatically, there are architectural features that clearly depend
on alignment. It's unreasonable to expect processors to support atomic
accesses that straddle the basic unit of their cache coherence implementations.
Supporting small unaligned accesses has a fairly marginal cost in extra
hardware, but as accesses grow to 128 bits or larger, those costs can spiral
out of control. These restrictions are fairly widely understood by compiler
users.

Everything below is mushier. It's clearly advantageous for the compiler to
be able to make stronger assumptions about alignment when accessing memory.
ISAs often allow more efficient accesses to properly-aligned memory; for
example, 32-bit ARM can perform a 64-bit memory access in a single
instruction, but the address is required to be 8-byte-aligned. Alignment
also affects compiler decisions even when the architecture doesn't enforce
it; for example, it can be profitable to combine two adjacent loads into
a single, wider load, but this will often slow down code if the wider load is
no longer properly aligned.

As is the case with most forms of undefined behavior, programmers have at
best an abstract appreciation for the positive effects of these optimizations,
but they have a very concrete understanding of the disruptive life effects
of being forced to fix crashes from mis-alignment.

Our standard response in LLVM/Clang is to explain the undefined behavior
rule, explain the benefits it provides, and politely ask users to, well,
deal with it. And that's appropriate; most forms of undefined behavior
under the standard(s) are reasonable requests with reasonable code
workarounds. However, we have also occasionally looked at a particular
undefined behavior rule and decided that there's a real usability problem
with enforcing it as written. In these cases, we use our power as
implementors to make some subset of that behavior well-defined in order to
fix that problem. For example, we did this for TBAA, because we recognized
that certain "obvious" aliasing violations were idiomatic and only had
awkward workarounds under the standard.

There's a similar problem here. Much like TBAA, fixing it doesn't require
completely abandoning the idea of enforcing type-based alignment assumptions.
It does, however, require a significant adjustment to the language rule.

The problem is this: the standards make it undefined behavior to even
create an unaligned pointer. Therefore, as soon as I've got such a
pointer, I'm basically doomed; it is no longer possible to locally
work around the problem. I have to change the type of the pointer to
something that requires less alignment, and not just where I'm using
it, or even just within my function, but all the way up to wherever it
came from.

For example, suppose I've got this function:

  void processBuffer(const int32_t *buffer, size_t length) {
    ...
  }

I get a bug report saying that my function is crashing, and I decide
that the right fix is to make the function handle unaligned buffers
correctly. Maybe that's a binary-compatibility requirement, or maybe the
buffer is usually coming from a serialized format that doesn't guarantee
alignment, and it's clearly unreasonable to copy the buffer just to satisfy
my function.

So how can I make this function handle unaligned buffers? The type of the
argument itself means that being passed an unaligned buffer has undefined
behavior. Now, I can change that parameter to use an unaligned typedef:

  typedef int32_t unaligned_int32_t __attribute__((aligned(1)));
  void processBuffer(const unaligned_int32_t *buffer, size_t length) {
    ...
  }

But this has severe problems. First off, this is a GCC/Clang extension; a lot
of programmers feel uncomfortable adopting that, especially to fix a problem
that's in principle common across compilers. Second, alignment attributes
are not really part of the type system, which means that they can be
silently dropped by any number of things, including both major features
like templates and just day-to-day quality-of-implementation stuff like the
common-type logic of the conditional operator. And finally, my callers
still have undefined behavior, and I really need to go audit all of them
to make sure they're using the same sort of typedef. This is not a reliable
solution to the bug.

Furthermore, the compiler doesn't really care whether the pointer is
abstractly aligned independent of any access to memory. There aren't very
many interesting alignment-based optimizations on pointer values as mere
values. In principle, we could optimize operations that cast the pointer
to an integral type and examine the low bits, but those operations are not
very common, and when they're there, it's probably for a good reason;
that kind of optimization is very likely to just create miscompiles
without really showing any benefit.

Therefore, I would like to propose that Clang formally adopt a significantly
weaker language rule for enforcing the alignment of pointers. The basic
idea is this:

  It is not undefined behavior to create a pointer that is less aligned
  than its pointee type. Instead, it is only undefined behavior to
  access memory through a pointer that is less aligned than its pointee
  type.

That is, the only thing that matters is the type when you actually perform
the access, not any type the pointer might have had at some earlier point
during execution.
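
As a sketch of the local workaround this rule is meant to enable: the function below keeps the const int32_t * signature from the processBuffer example but never accesses memory as int32_t. The name sumBuffer and the summing body are hypothetical stand-ins for the elided function body. Under the standards, passing a misaligned buffer is already undefined at the call site; under the proposed rule, only the type used at the access matters.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for processBuffer: sums the elements of a
   buffer that may be misaligned for int32_t. */
int64_t sumBuffer(const int32_t *buffer, size_t length) {
  /* Re-type the pointer locally; all accesses below are byte-wise. */
  const unsigned char *bytes = (const unsigned char *)buffer;
  int64_t total = 0;
  for (size_t i = 0; i < length; ++i) {
    int32_t value;
    /* memcpy on unsigned char* makes no int32_t alignment assumption. */
    memcpy(&value, bytes + i * sizeof(value), sizeof(value));
    total += value;
  }
  return total;
}
```

The fix stays inside the function: no caller needs to change its pointer type for this to be reliable under the proposal.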

Notably, I believe that this rule doesn't require any changes in our
current behavior, so adopting it is just a restriction on future compiler
optimization. For the most part, LLVM IR only attaches alignment to loads,
stores, and specific intrinsics like llvm.memcpy; there is no way to say
that a pointer value is expected to have a particular alignment. The
one exception that I'm aware of is that an indirect parameter can have
an expected alignment. However, Clang currently only sets this for
by-value arguments that the calling convention says to pass indirectly,
and that remains acceptable under this new rule because it's an ABI rule
rather than a constraint on programmer behavior (other than assembly
programmers). The rule just means that we can't start setting it on
arbitrary pointer parameters.

It is also a very portable rule; I'm not aware of any compilers that do
try to take advantage of the formal alignment of pointer values independent
of access.

The key question in this new rule is what counts as an "access". I'll spell
this out in more detail, but it's mostly intuitive: anything that ultimately
requires a load or store. The only thing that's perhaps questionable is that
we'd like to treat calls to library functions that access memory as if they
were direct accesses to their arguments. For example, we'd like to assume
that the pointer arguments to memcpy are properly aligned for their types
(that is, their explicit types, before the implicit conversion to void*) so
that we can generate a more efficient copy operation. This analysis
currently relies on the language rule that pointers may not be misaligned;
preserving it requires us to treat calls to library functions as special,
which of course we already do. Programmers can still suppress this
assumption by explicitly casting the arguments to void*.
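
The memcpy point can be sketched in C as follows. Both function names are hypothetical. The comments describe the assumptions the proposed rule would allow the compiler to make; they are not a statement about what any particular compiler version actually does.

```c
#include <stdint.h>
#include <string.h>

/* The explicit argument types are uint32_t*, so under the proposed rule
   the compiler may assume both pointers are 4-byte-aligned and pick a
   wider, aligned copy sequence. */
void copy_assuming_alignment(uint32_t *dst, const uint32_t *src,
                             size_t count) {
  memcpy(dst, src, count * sizeof(*src));
}

/* Explicit casts to void* before the call suppress that assumption, so
   this form is intended to be safe for misaligned pointers. */
void copy_without_assumption(uint32_t *dst, const uint32_t *src,
                             size_t count) {
  memcpy((void *)dst, (const void *)src, count * sizeof(*src));
}
```

The two bodies differ only in the explicit casts, which is what makes this a local, one-line workaround.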

Here's the proposed new rule, expressed more formally:

---

It is well-defined behavior to construct a pointer to memory that
is less aligned than the alignment of the pointee type (if a complete
type). However, it is undefined behavior to “access” an expression that
is an r-value of type T* or an l-value of type T if T is a complete type
and the memory is less aligned than T.

An r-value expression of pointer type is accessed if:
- it is dereferenced (with *) and the resulting l-value is accessed,
- it is implicitly converted to another pointer type and the
   result is accessed,
- it undergoes pointer addition and the result is accessed,
- it is passed to a function in the C standard library that is known
   to access the memory,
- in C++, it is converted to a pointer to a virtual base, or
- in C++, it is explicitly cast (other than by a reinterpret_cast) to
   a related class pointer type and the result is accessed.

An l-value expression is accessed if:
- it undergoes an lvalue-to-rvalue conversion (i.e. it is loaded),
- it is the LHS of an assignment operator (including the
   compound assignments),
- it is the base of a member access (with .) and the resulting l-value
   is accessed (recall that x->y is defined as (*x).y),

I think this is overreaching. The member access has undefined behavior if
there isn't an object of the right type designated by the lvalue, which is
a much stronger requirement than that the alignment holds (if the base is
misaligned, you still get immediate UB, even with your proposed change,
because there is no object present at that misaligned location).

- it undergoes indirection (with &) and the resulting pointer is accessed,

- in C++, it is implicitly converted to be an l-value to a base type
   and the result is accessed,

Again, you get UB in the conversion if there's not an actual object present.

- in C++, it is converted to be an l-value of a virtual base type,

- in C++, it is used as the "this" argument of a call to a
   non-static member function, or
- in C++, a reference is bound to it (which includes explicit
   casts to reference type).

These are the cases covered by the language standard. There is a
very long tail of other kinds of expression that obviously access memory,
like the atomic and overflow builtins, which I can't reasonably enumerate.
The intent should be obvious, but I'm willing to spell it out in other
cases where necessary.

Note that this definition is *syntactic*, meaning that it is expressed
in terms of the components of a single statement. This means that an
access that might be undefined behavior if written as a single statement:
  highlyAlignedStruct->charMember = 0;
may not be undefined behavior if split across two statements:
  char *member = &highlyAlignedStruct->charMember;
  *member = 0;
In effect, the compiler promises to never propagate alignment assumptions
between statements through its knowledge of how a pointer was constructed.
This is necessary in order to allow local workarounds to be reliable.

Note also that this definition does not propagate through explicit casts,
other than class-hierarchy casts in C++. Again, this is a deliberate
choice to make misalignment workarounds more straightforward.

But note that this rule does still allow the compiler to make stronger
abstract assumptions about the alignment of C++ references and the
"this" pointer.

It seems like this is a subset of your proposed rule:

The result of casting a pointer of one type to a pointer of another type
is a pointer to the original object, and (after some series of pointer
casts) may be used to access that object (or things that are permitted to
alias it) even if some intermediate pointer type has an alignment
requirement that the object does not satisfy.

I certainly support that subset. I think it's reasonable for us to
guarantee that object pointer values will round-trip through any series of
casts through object pointer types (and intptr_t) so long as you end up
with the right type (a type that is alias-compatible with the original
type) before you perform the access.

Your above proposal seems to also make a collection of other cases valid,
where the problem is something other than misalignment. For instance, it
sounds like you also want to make pointer arithmetic valid when there is no
underlying array, and perhaps when the arithmetic leaves the underlying
complete object entirely?

I’m not sure where you got the idea that my proposal supersedes all the
other rules about pointer validity or aliasing.

Your list of what constitutes an access distinguishes between the case of a
misaligned member access where the result is accessed and a misaligned
member access where the result is not accessed, suggesting that there is a
difference between those two cases (where both cases actually remain UB).
See James Knight's reply for an example of confusion stemming from this.

(Note that sometimes the only way we detect the UB stemming from member
access on a non-object -- for instance, with UBSan -- is because the
pointer is misaligned. Your list can be read as suggesting that the UBSan
alignment check for member access would violate our guarantees.)

It removes the specific clauses from the quotes above about alignment and
instead re-imposes alignment requirements based on the immediate form of
the expression in several other places that touch memory. I intentionally
did not layer this onto the existing definition of access because
non-static data members are formally accessed even if you don’t touch the
underlying memory,

C++ confusingly has two different notions: "access" (meaning read or
modify), and "member access" (meaning forming a glvalue referring to a
member from a glvalue referring to a class object); C has a similar wording
confusion. Imposing this on the former notion of "access" would seem
appropriate, but the aliasing rules already cover it by requiring that an
object of the right type exist at the accessed location (and therefore that
the location is correctly aligned[1]).

and I don’t feel that the compiler should be allowed to assume alignment in
those situations. If you feel that there’s a better formalization that
still captures that, I’m open to it.

How about we say that we implement this delta to the C and C++ standards:

C 6.3.2.3p7 (N1548):
  A pointer to an object type may be converted to a pointer to a
  different object type. <del>If the resulting pointer is not correctly
  aligned for the referenced type, the behavior is undefined.</del>

C++ [expr.static.cast]p13:
  A prvalue of type “pointer to cv1 void” can be converted to a
  prvalue of type “pointer to cv2 T” .... If the original pointer value
  represents the address A of a byte in memory <del>and A satisfies the alignment
  requirement of T</del>, then the resulting pointer value represents the same
  address as the original pointer value, that is, A.

[1]: That's not completely true, as it's possible to create an object that
is underaligned for its type:

struct __attribute__((packed)) A {
  char k;
  struct B { int n; } b;
} a;
int main() {
  A::B *p = &a.b; // a.b sits at offset 1, so p is underaligned for A::B
  int *q = &p->n; // UB? UBSan diagnoses this member access
  return *q; // Obviously UB
}

It seems that we do need to have some syntactic rules for how far the known
alignment propagates to handle this case; your proposed rules don't do the
right thing here.

Part of my point is indeed an acknowledgement that valid objects can exist at misaligned addresses, and it should not be UB to perform a member access into them as long as the memory isn’t accessed. Consider, say, a pointer to a serialized data structure held in an unaligned buffer. I am trying to say that code which drills into that data structure via that pointer and then works around the lack of alignment on the resulting address is not buggy; you seem to be suggesting that it is, and that the user has a responsibility to ensure that all of their pointer arithmetic is done on properly-aligned pointers. I don’t think that’s a defensible model.

That’s true; I do not think the UBSan alignment check should be kicking in when we’re not accessing memory.

John.

Arithmetic (e.g. +) and casts are one thing. But you're actually
*dereferencing* a pointer which isn't valid (granted, the dereference is
syntactic and doesn't result in an actual load). I don't understand why
you think it is not a defensible model to (continue to) forbid this?

Furthermore, if that is to be valid, why should it only be the case for
misalignment, and not null?

That is, I'd argue that either "&X->foo" should be considered always valid,
regardless of what the value of the X pointer is, OR it must (as it does
now) require an X pointing to an actually valid object. Going halfway and
saying it's only valid for some kinds of invalid X pointers does not seem
like a good way to go.

(Sorry for the duplicate mail, Richard, I accidentally sent a copy only to you before.)

The question at hand is whether we should require the user to write this:
  misaligned_A_B *p = &a.b;
instead of, say:
  A::B *p = &a.b;
  int x = *(misaligned_int*) &p->n;
because we want to reserve the right to invoke undefined behavior and propagate our “knowledge” that p is 4-byte-aligned to “improve” the 1-byte-aligned access on the next line.

My contention is that this is a clean and elegantly simple formal model that is disastrously bad for actual users because it is no longer possible to locally work around a mis-alignment bug without tracking the entire history of the pointer. It is the sort of compiler policy that gets people to roll their eyes and ask for new options called things like -fno-strict-type-alignment which gradually get adopted by 90% of projects.

John.

Hi John,

The question at hand is whether we should require the user to write this:
  misaligned_A_B *p = &a.b;
instead of, say:
  A::B *p = &a.b;
  int x = *(misaligned_int*) &p->n;
because we want to reserve the right to invoke undefined behavior and propagate our “knowledge” that p is 4-byte-aligned to “improve” the 1-byte-aligned access on the next line.

My contention is that this is a clean and elegantly simple formal model that is disastrously bad for actual users because it is no longer possible to locally work around a mis-alignment bug without tracking the entire history of the pointer. It is the sort of compiler policy that gets people to roll their eyes and ask for new options called things like -fno-strict-type-alignment which gradually get adopted by 90% of projects.

I’ve had the misfortune to look at a lot of code that does unaligned access over the last few years. By far the most common reason for it that I’ve seen is networking code that uses packed structures to represent packets. For example:

struct __attribute__((packed)) somePacket
{
  uint8_t a;
  uint32_t b;
  // ...
};

In your model, what happens when:

- I use field b directly?
- I take the address of field b and store it in an int* variable?

David

Hi John,

The question at hand is whether we should require the user to write this:
misaligned_A_B *p = &a.b;
instead of, say:
A::B *p = &a.b;
int x = *(misaligned_int*) &p->n;
because we want to reserve the right to invoke undefined behavior and propagate our “knowledge” that p is 4-byte-aligned to “improve” the 1-byte-aligned access on the next line.

My contention is that this is a clean and elegantly simple formal model that is disastrously bad for actual users because it is no longer possible to locally work around a mis-alignment bug without tracking the entire history of the pointer. It is the sort of compiler policy that gets people to roll their eyes and ask for new options called things like -fno-strict-type-alignment which gradually get adopted by 90% of projects.

I’ve had the misfortune to look at a lot of code that does unaligned access over the last few years. By far the most common reason for it that I’ve seen is networking code that uses packed structures to represent packets. For example:

struct __attribute__((packed)) somePacket
{
  uint8_t a;
  uint32_t b;
  // ...
};

In your model, what happens when:

- I use field b directly?

The compiler recognizes that this access is to a valid but underaligned uint32_t object and generates code assuming a lower alignment. This doesn’t change, except inasmuch as we gain a formal model that accepts the existence of valid-but-underaligned objects.

- I take the address of field b and store it in an int* variable?

It’s not undefined behavior to form that pointer. It is, however, still undefined behavior to access the object through that int*, because that type assumes a higher alignment. (The undefined behavior buys us a lot here: otherwise, LLVM would have to assume that all pointers are unaligned unless it could prove that they point to aligned memory. That’s prohibitive.) However, if you don’t access the object as an int*, and instead access it in a less-aligned way, there’s no undefined behavior and the code is guaranteed to work.

For example, given this:
  uint32_t *pb = &packet->b;

Under my model, this code would still have undefined behavior and might trap on an alignment-enforcing system:
  uint32_t b = *pb;

This code would still have undefined behavior, because the formal type of the access is still uint32_t here:
  uint32_t b;
  memcpy(&b, pb, sizeof(b));

This code is fine:
  uint32_t b;
  memcpy(&b, (const char*) pb, sizeof(b));

As is this code:
  __attribute__((aligned(1))) typedef uint32_t unaligned_uint32_t;
  uint32_t b = *(unaligned_uint32_t*) pb;

Note that, under the language standards, both of the last two examples have undefined behavior: there’s no concept of a valid unaligned object at all, and if you shoe-horned one in, it would probably be undefined behavior to take its address. Clang would be allowed to say “okay, you took the address of this, and we can assume it was actually properly aligned despite being the address of a less-aligned object” and then propagate that alignment assumption to the later accesses to promote the alignment assumption. The goal of my model (and perhaps I’ve mis-formalized it, but I think the goal is quite clear) is just to forswear this capability in the compiler.
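
The two accepted forms can be put side by side in a small sketch. It assumes the GCC/Clang aligned attribute and relies on the model described here, not on anything the language standards guarantee; load_u32_memcpy and load_u32_typedef are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Alignment-1 view of uint32_t (GCC/Clang extension, as in the thread). */
typedef uint32_t unaligned_uint32_t __attribute__((aligned(1)));

/* Copy through a char pointer: the formal type of the access is char,
   which carries no alignment requirement beyond 1. */
static uint32_t load_u32_memcpy(const uint32_t *pb) {
  uint32_t b;
  assert(pb != NULL);
  memcpy(&b, (const char *)pb, sizeof(b));
  return b;
}

/* Access through the alignment-1 typedef: the formal type of the access
   only promises 1-byte alignment. */
static uint32_t load_u32_typedef(const uint32_t *pb) {
  assert(pb != NULL);
  return *(const unaligned_uint32_t *)pb;
}
```

Either helper can be handed the address of a packed member such as &packet->b. Forming that uint32_t* is itself only acceptable under the model above, which is exactly the guarantee being asked for.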

John.

I think everything you said above is uncontroversial. I believe everyone here is in agreement that reinterpret_casts back and forth to arbitrary pointer types are currently allowed by clang, and must continue to be allowed. Richard suggested a spec-diff to formalize that. As an implication of that being valid, the cast itself does not and must not in the future impart any knowledge of alignment (even though doing so would be theoretically okay per the current language spec).

That is, for clarity, clang does now and should always continue to allow the following, despite the spec saying it doesn’t need to:

char *x = malloc(10);
int *y = (int*)&x[1]; // assign a misaligned int*
...
char c = *(char*)y; // but accessed only as char*, so no problem.

However, one part of your suggested rules that both I and Richard questioned was the requirement that the expression “&p->n” be valid, even if “p” is misaligned for its type. I still don’t think that it’s necessary or even particularly useful to start allowing that. And, note, that would be an actual change in behavior, not a clarification/formalization of existing behavior.

That is:

  1. Is it valid to do “p->n”, when p is not a valid object which is properly aligned for its type?
  2. Assuming that’s not valid, does adding a & cause it to then be valid, via some special case? E.g. the rule in C that states that “&a[n]” translates to “(a + n)”, and “&*a” translates to “a”, regardless of the value or validity of “a”. (Without that rule, &*a would be invalid, too, if “a” was null or misaligned.)

Just to reiterate, I think the issue here is not about whether clang can make alignment assumptions in later code, it’s about whether the member access expression itself is valid. (If it’s not valid, then what happens in later code is irrelevant.)
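
Whichever answer is adopted, code can sidestep the question by never writing the member access on the possibly misaligned pointer and doing the offset arithmetic on a char pointer instead. A minimal sketch, assuming a hypothetical struct B with an int member n (only the names p and n come from the discussion) and a hypothetical helper read_n_unaligned:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the pointee type of p. */
struct B {
  char tag;
  int n;
};

/* Reads the n member of a struct B stored at an arbitrary address.
   No expression of type struct B* is ever formed or dereferenced, so this
   does not depend on whether "&p->n" is valid for a misaligned p. */
static int read_n_unaligned(const void *raw) {
  int n;
  assert(raw != NULL);
  memcpy(&n, (const char *)raw + offsetof(struct B, n), sizeof(n));
  return n;
}
```

This is more verbose than &p->n, which is part of why the validity of the shorter spelling matters to users.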

Hi John,

I think everything you said above is uncontroversial. I believe everyone here is in agreement that reinterpret_casts back and forth to arbitrary pointer types are currently allowed by clang, and must continue to be allowed. Richard suggested a spec-diff to formalize that. As an implication of that being valid, the cast itself does not and must not in the future impart any knowledge of alignment (even though doing so would be theoretically okay per the current language spec).

No. A model where reinterpret_casts just “cut” alignment assumptions doesn’t work for me. I am trying to reach a specific result: I want to make it possible for C-level programs to work around alignment problems at the exact point where they blow up / regress if that’s how they choose to fix the issue. Anything that allows the non-local propagation of alignment assumptions without a memory access is a problem.

Furthermore, implementing a model where reinterpret_casts cut alignment assumptions would require huge and invasive changes to the compiler, which naturally wants to propagate alignment assumptions based on value identity. Since cutting techniques require changing value identity, they also regress every single memory and redundancy analysis in the compiler.

That is, for clarity, clang does now and should always continue to allow the following, despite the spec saying it doesn’t need to:

char *x = malloc(10);
int *y = (int*)&x[1]; // assign a misaligned int*
...
char c = *(char*)y; // but accessed only as char*, so no problem.

However, one part of your suggested rules that both I and Richard questioned was the requirement that the expression “&p->n” be valid, even if “p” is misaligned for its type. I still don’t think that it’s necessary or even particularly useful to start allowing that. And, note, that would be an actual change in behavior, not a clarification/formalization of existing behavior.

As far as I’m aware, it’s a change in behavior only for a particular sanitizer tool which gets far, far less coverage than the main compiler. And it’s only considered interesting there because it’s trying to diagnose an issue that’s impossible to diagnose reliably, and alignment at least provides a heuristic to catch some of the problems; although I suspect that it could be restricted to only “accesses” in my sense without adding too many additional false negatives. That is, if you see int x = p->n, you should still be able to assert that p has its required alignment even if p is at an offset, and that probably catches almost all of the interesting cases.

That is:

  1. Is it valid to do “p->n”, when p is not a valid object which is properly aligned for its type?

I reject the assumption that “valid object” necessarily means “properly aligned for its type” independent of an access to memory.

  2. Assuming that’s not valid, does adding a & cause it to then be valid, via some special case? E.g. the rule in C that states that “&a[n]” translates to “(a + n)”, and “&*a” translates to “a”, regardless of the value or validity of “a”. (Without that rule, &*a would be invalid, too, if “a” was null or misaligned.)

Just to reiterate, I think the issue here is not about whether clang can make alignment assumptions in later code, it’s about whether the member access expression itself is valid. (If it’s not valid, then what happens in later code is irrelevant.)

If you can’t make alignment assumptions in later code based on having seen this, I don’t see what flexibility you’re trying to reserve beyond the ability of UBSan to assert.

John.

uint32_t *pb = &packet->b;

Under my model, this code would still have undefined behavior and might trap on an alignment-enforcing system:
uint32_t b = *pb;

We already have a warning for almost this cast, but it’s flawed in two respects:

- It’s noisy, so people normally turn it off.
- It’s silenced by an explicit cast.

The latter is problematic again in your model, because programmers now get no warning when they do the dangerous thing.

This code would still have undefined behavior, because the formal type of the access is still uint32_t here:
uint32_t b;
memcpy(&b, pb, sizeof(b));

This code is fine:
uint32_t b;
memcpy(&b, (const char*) pb, sizeof(b));

This makes me nervous, because presumably void* has an alignment of 1 and now we have different behaviour depending on whether we perform an implicit or explicit cast.

I’m also somewhat uncomfortable with the idea that assigning a uint32_t* temporary to a uint32_t* variable increases its alignment. I’d be tempted to propose modelling the alignment more explicitly in the C type system, so that &packet->b is not a uint32_t*, it’s an __alignment__(1) uint32_t* and can’t be implicitly cast to a uint32_t*. That would mean that we could explicitly warn on casts that increased the alignment, and provide a type for pb that would preserve the (lack of) alignment. For example:

typedef __attribute__((aligned(1))) uint32_t unaligned_uint32_t;

struct foo
{
    char a;
    uint32_t b;
}
__attribute__((packed));

struct foo packet;
...
uint32_t *pb = &packet.b;
unaligned_uint32_t *upb = &packet.b;

Currently, this code is accepted by clang. I would propose that:

// This should not be allowed (or, if it is, with a warning that can be turned into an error)
uint32_t *pb = &packet.b;
// This should be permitted, but the static analyser should complain
uint32_t *pb = (uint32_t*)&packet.b;
// This should be permitted to silently work
unaligned_uint32_t *upb = &packet.b;

I believe that you would get most of this from clang by implicitly providing the alignment information to members of packed structs. This would mean that the type of packet.b would implicitly be unaligned_uint32_t, not uint32_t.
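
What holding the address in an alignment-preserving pointer type looks like today can be sketched as follows. It assumes the GCC/Clang attribute spelling; read_b_via_unaligned_ptr is a hypothetical helper name, and nothing here implements the proposed diagnostics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t unaligned_uint32_t __attribute__((aligned(1)));

struct foo {
  char a;
  uint32_t b;
} __attribute__((packed));

/* Hypothetical helper: the pointer type keeps the reduced alignment, so
   the dereference is formally a 1-byte-aligned access. */
static uint32_t read_b_via_unaligned_ptr(const struct foo *packet) {
  const unaligned_uint32_t *upb;
  assert(packet != NULL);
  upb = &packet->b; /* the assignment that should stay silent */
  return *upb;
}
```

Under the proposal, replacing unaligned_uint32_t with plain uint32_t in the declaration of upb would draw a diagnostic unless an explicit cast were added.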

As is this code:
__attribute__((aligned(1))) typedef uint32_t unaligned_uint32_t;
uint32_t b = *(unaligned_uint32_t*) pb;

Note that, under the language standards, both of the last two examples have undefined behavior: there’s no concept of a valid unaligned object at all, and if you shoe-horned one in, it would probably be undefined behavior to take its address. Clang would be allowed to say “okay, you took the address of this, and we can assume it was actually properly aligned despite being the address of a less-aligned object” and then propagate that alignment assumption to the later accesses to promote the alignment assumption. The goal of my model (and perhaps I’ve mis-formalized it, but I think the goal is quite clear) is just to forswear this capability in the compiler.

This is, as you say, a language extension, but it’s one that’s been supported by GCC since at least the 2.x days and existing code relies on it.

David