Alignments in LLVM IR

Hello,

I am currently writing my Master's Thesis on a topic regarding the analysis of memory safety and termination of LLVM programs. This includes alignments in LLVM IR, but I am not sure if I understand their semantics correctly. I have written a program (see attachment) which uses the instruction

  store i32 1, i32* %7, align 4

to store an integer at an address that I forced to be odd (i.e. not 4-byte aligned), and compiled it with clang. The result is that the integer is stored exactly there, which I expected for alignment 1 but not for alignment 4. Changing the alignment to any other value does not have any effect.
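The following is a hypothetical C reconstruction of this kind of experiment. The attached test1.c is not reproduced here, so this is not that file. Writing through a cast such as `*(int32_t *)p = 1` is the usual way to get clang to emit `store i32 1, ..., align 4`, and if `p` is odd that is already undefined behavior at the C level. Copying the bytes with `memcpy` makes no alignment promise. With optimization enabled, clang typically turns that copy into a plain store with `align 1`. The helper names are illustrative, and the code assumes C11.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p is a multiple of align,
   where align must be a power of two (as LLVM align values are). */
int is_aligned(const void *p, uintptr_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Write v to a possibly misaligned address. memcpy makes no alignment
   promise, so this is defined for any dst. By contrast,
   *(int32_t *)dst = v claims 4-byte alignment and is undefined if
   dst is odd. */
void store_i32_any(unsigned char *dst, int32_t v) {
    memcpy(dst, &v, sizeof v);
}

/* Matching read from a possibly misaligned address. */
int32_t load_i32_any(const unsigned char *src) {
    int32_t v;
    memcpy(&v, src, sizeof v);
    return v;
}
```

Suppose `buf` is declared as `_Alignas(4) unsigned char buf[8]`. Then `store_i32_any(buf + 1, 1)` writes bytes 1 to 4 of `buf`, and `load_i32_any(buf + 1)` reads back 1.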

This leads to my questions:

- Do alignments provide additional semantics to be obeyed by the compiler or are they just hints that can be ignored?
- What is the semantics of alignments from the perspective of an IR analyzer as opposed to an IR emitter?
- Can you give me an example where wrong alignments lead to undefined behavior?

Kind regards,
Jera

test1.c (206 Bytes)

test1.ll (2.03 KB)

Hello,

I am currently writing my Master's Thesis on a topic regarding the analysis of memory safety and termination of LLVM programs.

Cool. Have you checked out the Memory Safety Menagerie? http://sva.cs.illinois.edu/menagerie/

- Do alignments provide additional semantics to be obeyed by the compiler or are they just hints that can be ignored?

According to the language reference manual, having a dynamic pointer value that isn't aligned at the specified alignment is undefined behavior. The alignment is designed as a hint to the code generator that it can assume that the address will be at the specified alignment so that it can generate more efficient code. This is useful on processors that have different memory access instructions for different word sizes (I think ARM is an example; I am sure there are others).

- What is the semantics of alignments from the perspective of an IR analyzer as opposed to an IR emitter?

From your perspective, there's undefined behavior if the address in the load or store isn't aligned at the proper boundary. That said, you can probably cheat a little bit and look at what the code generator will do. On x86, for example, the alignment probably doesn't matter as nearly all x86 memory access instructions can access any alignment; the memory access may simply be slower than necessary.
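Read as IR semantics, this gives an analyzer a simple per-access rule that does not depend on the target. Below is a minimal C sketch. It assumes a hypothetical `mem_access` record, produced by the analysis, that holds the concrete address an access reaches plus its `align` operand. Accesses with an omitted `align` operand are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record for one load/store reached during analysis. */
struct mem_access {
    uintptr_t addr;  /* concrete address being accessed */
    uintptr_t align; /* value of the `align` operand (a power of two) */
};

/* IR-level rule: the access is defined only if addr is a multiple of
   align. Whether the CPU would tolerate the misalignment (as x86
   usually does) plays no part in this check. */
bool access_alignment_ok(struct mem_access a) {
    assert(a.align != 0 && (a.align & (a.align - 1)) == 0);
    return (a.addr & (a.align - 1)) == 0;
}
```

The x86 shortcut described above would amount to replacing this check with a target-specific one. The cost is that the analysis would no longer match what the optimizer is allowed to assume.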

- Can you give me an example where wrong alignments lead to undefined behavior?

On some processors, there are different memory access instructions for accessing memory of different sizes and alignments. Using an address that isn't aligned properly would cause a fault. I think ARM does this.

Regards,

John Criswell

- Do alignments provide additional semantics to be obeyed by the compiler or are they just hints that can be ignored?

As John said, hints. You should be able to replace all alignments by
"align 1" without changing the output (assuming the code still
compiles).

- What is the semantics of alignments from the perspective of an IR analyzer as opposed to an IR emitter?
- Can you give me an example where wrong alignments lead to undefined behavior?

Well, at runtime some CPUs can be configured to raise an exception if
the access is unaligned. That's pretty undefined.

At compile time, I think we use alignments to convert "add %addr, 3"
into "or %addr, 3" (and so on) in some cases. That wouldn't go well if
the alignment was incorrect.
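The arithmetic behind that rewrite can be checked directly. In the C sketch below (function names are illustrative), a 4-byte-aligned address has its low two bits clear. Adding a constant below 4 therefore cannot carry, and the result equals OR-ing the constant in.

```c
#include <assert.h>
#include <stdint.h>

/* Two ways of forming addr + k. They agree whenever the low bits that k
   occupies are known to be zero in addr (e.g. addr 4-byte aligned and
   0 <= k < 4), because the addition then cannot carry. This is the
   property that makes rewriting add into or valid. */
uintptr_t offset_by_add(uintptr_t addr, uintptr_t k) { return addr + k; }
uintptr_t offset_by_or(uintptr_t addr, uintptr_t k) { return addr | k; }
```

Now take `addr = 0x1001`, which a wrong `align 4` would claim is aligned. There the two forms give different results (0x1004 versus 0x1003), so the rewritten code would access the wrong address.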

There are probably other transformations too, any optimisation making
use of that information is a candidate.

Cheers.

Tim.

Depends on where the alignment is used. For stores [1] it's a hint, as
John and Tim said. For global variables [2] it carries additional semantics.

[1]: http://llvm.org/docs/LangRef.html#store-instruction
[2]: http://llvm.org/docs/LangRef.html#global-variables
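Here is a C11 sketch of the global-variable case. `_Alignas` on a global is one way to get an `align` on an IR global. There the alignment is something the compiler and linker must make true, not a promise the code makes about some pointer. The variable name is made up, and the exact IR clang produces may differ between versions.

```c
#include <assert.h>
#include <stdint.h>

/* Requested alignment on a global: the toolchain must place g_buf on a
   16-byte boundary. Clang typically emits an IR global carrying
   `align 16` for this. */
_Alignas(16) static char g_buf[3];

int g_buf_is_16_aligned(void) {
    return ((uintptr_t)g_buf & 15) == 0;
}
```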

'align' on stores is not a hint: "Overestimating the alignment results in
undefined behavior."
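Put differently, `align N` is a claim that whoever emits the IR must be able to justify. Lowering the claim is always safe, which is why replacing every alignment with "align 1" works. The C sketch below computes, for a known address, the largest claim that is still not an overestimate. The function name and the `max_align` cap are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power-of-two alignment that is true of a known address, capped
   at max_align (itself a power of two). Claiming more in an `align`
   operand is an overestimate (undefined behavior); claiming any power of
   two below it, down to 1, is safe. */
uint64_t largest_valid_align(uint64_t addr, uint64_t max_align) {
    assert(max_align != 0 && (max_align & (max_align - 1)) == 0);
    if (addr == 0)
        return max_align;                     /* 0 is a multiple of everything */
    uint64_t lowest_bit = addr & (~addr + 1); /* isolate lowest set bit */
    return lowest_bit < max_align ? lowest_bit : max_align;
}
```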

Unaligned stores can trap on some ISAs (not many that matter), but I
believe we also try to use this information to optimize %b to zero here:
  store i32 0, i32* %p, align 4
  %i = ptrtoint i32* %p to i64
  %b = and i64 %i, 3    ; low two bits of a 4-aligned %p are zero
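Folding `%b` to zero here is ordinary known-bits reasoning. The following is a minimal C model of that rule with a hypothetical function name. It models the reasoning only and does not establish which folds LLVM actually performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Known-bits model for `%b = and i64 %i, MASK`, where %i = ptrtoint %p
   and %p was accessed with `align ALIGN`. If that access is defined,
   the low log2(ALIGN) bits of %i are zero, so the AND is zero whenever
   MASK has no bits set at or above ALIGN. */
bool and_folds_to_zero(uint64_t mask, uint64_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uint64_t known_zero = align - 1;   /* low bits guaranteed clear */
    return (mask & ~known_zero) == 0;
}
```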