proposed new rule for getelementptr

Hello,

I'm working on refining the definition of getelementptr (GEP) to
clarify its semantics and to allow front-ends to provide additional
information to optimization passes.

To help support this, I'd like to propose the following rule:

Any memory access must be done through a pointer value associated
with the address range of the memory access; otherwise the behavior
is undefined.

"associated with" is defined as follows:

  - A pointer value formed from a getelementptr instruction is
    associated with the addresses associated with the first operand of
    the getelementptr.
  - The address of a global variable is associated with the address
    range of the variable's storage.
  - The result value of an allocation instruction is associated with
    the address range of the allocated storage.
  - A null pointer is associated with no addresses.
  - A pointer value formed by a ptrtoint is associated with all address
    ranges of all pointer values that contribute (directly or
    indirectly) to the computation of the pointer's value.
  - An integer value other than zero may be associated with address
    ranges allocated through mechanisms other than those provided by
    LLVM; such ranges shall not overlap with any ranges of addresses
    allocated by mechanisms provided by LLVM.

For example, in an instruction like this:

  %p = getelementptr [4 x i8]* @A, i32 0, i32 %huge

if %huge is beyond the size of @A, %p could potentially point into
some object other than @A. This rule says that it's not valid to use
%p to access that other object. %p would only be usable for memory
accesses when it points within @A.

C and C-like front-ends already obey this rule, since in C pointer
arithmetic results must remain within an allocated object (or one
past the end), which is more strict.

Front-ends that do crazy things to pointers may need to use
ptrtoint+arithmetic+inttoptr instead of getelementptr. If the
target-independent properties of getelementptr are needed, the
"getelementptr null" trick may be useful.

With this rule, BasicAliasAnalysis and similar analyses that depend
on being able to track a pointer value up to its base value will be
able to safely climb through getelementptr definitions without
needing any further guarantees.

That means that this rule will eliminate the most severe need for
having getelementptr overflow produce undefined results. This will
make it practical to have an undefined-on-overflow flag for
getelementptr that can be meaningfully set to either true or false.

A non-overflowing option for getelementptr would be useful to allow
SCEVExpander and other passes to convert code like this:
   %a = mul %x, 4
   %b = add %y, %a
   %c = getelementptr [4 x i8]* @Object, 0, %b

into this:
   %c = getelementptr [4 x i8]* @Object, %x, %y

which wouldn't otherwise be safe because (getelementptr @Object, %x)
by itself might overflow. (From a low-level perspective this isn't
much of an optimization, but from a high-level perspective it's
easier for non-SCEV-based passes to analyze.)

Comments and suggestions are welcome,

Dan

  - A pointer value formed by a ptrtoint is associated with all address
    ranges of all pointer values that contribute (directly or
    indirectly) to the computation of the pointer's value.

Do you mean inttoptr here? And that "all pointer values that contribute
(directly or indirectly) to the computation of the pointer's value" could
equally mean "all pointer values that contribute (directly or indirectly) to
the computation of the integer's value?"

  - An integer value other than zero may be associated with address
    ranges allocated through mechanisms other than those provided by
    LLVM; such ranges shall not overlap with any ranges of addresses
    allocated by mechanisms provided by LLVM.

What's the intent here? Are you basically saying that memory regions accessed
through random integer arithmetic can't overlap memory regions allocated by
alloca, malloc, etc.? If so, my sense is that's too demanding. In general
one does not know the aliasing properties of addresses computed entirely by
integer arithmetic. It could be some misguided user trying to "help" the
compiler by explicitly linearizing addressing or it could be an embedded
developer computing a memory-mapped I/O address. There's just no way to
know.

Front-ends that do crazy things to pointers may need to use
ptrtoint+arithmetic+inttoptr instead of getelementptr. If the
target-independent properties of getelementptr are needed, the
"getelementptr null" trick may be useful.

To clarify, the ptrtoint+arithmetic+inttoptr would still be legal?

                           -Dave

- The result value of an allocation instruction is associated with
the address range of the allocated storage.

Might want to clarify this to include calloc/realloc.

- A null pointer is associated with no addresses.

A null pointer in address space 0.

- A pointer value formed by a ptrtoint is associated with all address
ranges of all pointer values that contribute (directly or
indirectly) to the computation of the pointer's value.

I assume you mean inttoptr?

A non-overflowing option for getelementptr would be useful to allow
SCEVExpander and other passes to convert code like this:
%a = mul %x, 4
%b = add %y, %a
%c = getelementptr [4 x i8]* @Object, 0, %b

into this:
%c = getelementptr [4 x i8]* @Object, %x, %y

So a "defined-overflow" gep would act like wrapping integer arithmetic
in the width of the pointer? Then what exactly is the definition of
an "undefined-overflow" gep? Here's a first shot at a definition:
"Considering a GEP as a series of calculations, one for each index, if
the result that would be obtained by performing exact arithmetic
(treating the index as a signed integer) is outside the bounds of the
address range of the base pointer, the result is undefined."

-Eli

- A pointer value formed by a ptrtoint is associated with all address
   ranges of all pointer values that contribute (directly or
   indirectly) to the computation of the pointer's value.

Do you mean inttoptr here? And that "all pointer values that contribute
(directly or indirectly) to the computation of the pointer's value" could
equally mean "all pointer values that contribute (directly or indirectly) to
the computation of the integer's value?"

Yes, I meant inttoptr instead of ptrtoint here.

- An integer value other than zero may be associated with address
   ranges allocated through mechanisms other than those provided by
   LLVM; such ranges shall not overlap with any ranges of addresses
   allocated by mechanisms provided by LLVM.

What's the intent here? Are you basically saying that memory regions accessed
through random integer arithmetic can't overlap memory regions allocated by
alloca, malloc, etc.? If so, my sense is that's too demanding. In general
one does not know the aliasing properties of addresses computed entirely by
integer arithmetic. It could be some misguided user trying to "help" the
compiler by explicitly linearizing addressing or it could be an embedded
developer computing a memory-mapped I/O address. There's just no way to
know.

The intent is to cover code like this:

     %p = alloca i32
     %q = inttoptr i64 0123456789 to i32*

Here, %p and %q are assumed to be non-aliasing. This kind of thing
is used by some JITs; they allocate objects via custom mechanisms
and then tell LLVM IR about the address of the objects via magic
integer constants like this. The optimizer is already making this
assumption today, though implicitly.

If the user hand-linearizes addressing using casts to integers,
it will still be supported -- that's what the broad language for
inttoptr above is intended to cover.

Front-ends that do crazy things to pointers may need to use
ptrtoint+arithmetic+inttoptr instead of getelementptr. If the
target-independent properties of getelementptr are needed, the
"getelementptr null" trick may be useful.

To clarify, the ptrtoint+arithmetic+inttoptr would still be legal?

Yes, subject to the constraints in the bullet point quoted at the
top of this email, which effectively just spell out rules that
are already assumed today.

Dan

- An integer value other than zero may be associated with address
ranges allocated through mechanisms other than those provided by
LLVM; such ranges shall not overlap with any ranges of addresses
allocated by mechanisms provided by LLVM.

What's the intent here? Are you basically saying that memory regions accessed
through random integer arithmetic can't overlap memory regions allocated by
alloca, malloc, etc.?

Yes, I think that's the intent.

If so, my sense is that's too demanding. In general
one does not know the aliasing properties of addresses computed entirely by
integer arithmetic. It could be some misguided user trying to "help" the
compiler by explicitly linearizing addressing or it could be an embedded
developer computing a memory-mapped I/O address. There's just no way to
know.

If a memory-mapped I/O address aliases the stack, there's something
seriously messed up going on... of course, we wouldn't assume anything
about whether 1000 and 2000 alias.

Front-ends that do crazy things to pointers may need to use
ptrtoint+arithmetic+inttoptr instead of getelementptr. If the
target-independent properties of getelementptr are needed, the
"getelementptr null" trick may be useful.

To clarify, the ptrtoint+arithmetic+inttoptr would still be legal?

Yes, I think that is the intent.

-Eli

The intent is to cover code like this:

     %p = alloca i32
     %q = inttoptr i64 0123456789 to i32*

Here, %p and %q are assumed to be non-aliasing. This kind of thing
is used by some JITs; they allocate objects via custom mechanisms
and then tell LLVM IR about the address of the objects via magic
integer constants like this. The optimizer is already making this
assumption today, though implicitly.

But how do you know 0123456789 doesn't point to the same thing %p does?
I understand it's highly unlikely and that we'd like to define that
there's no aliasing. I just get a little squeamish, especially when
considering embedded devices.

If the user hand-linearizes addressing using casts to integers,
it will still be supported -- that's what the broad language for
inttoptr above is intended to cover.

Right. I was thinking more along the lines of the user "knowing" where
some global is allocated and computing that address directly, without
using pointers at all.

> To clarify, the ptrtoint+arithmetic+inttoptr would still be legal?

Yes, subject to the constraints in the bullet point quoted at the
top of this email, which effectively just spell out rules that
are already assumed today.

Those rules are ok, I think. It's the computed-integer problem that
is gnawing at me.

I *think* the C standard says there can't be an alias and I assume the
Fortran standard does as well. But even so, we all know that users
often ignore standards when convenient.

                               -Dave

Hello,

I'm working on refining the definition of getelementptr (GEP) to
clarify its semantics and to allow front-ends to provide additional
information to optimization passes.

To help support this, I'd like to propose the following rule:

Any memory access must be done through a pointer value associated
with the address range of the memory access; otherwise the behavior
is undefined.
  
If I compute an out-of-bounds GEP and use it in an ICMP, is that
undefined behaviour or wrapping behaviour?

Instcombine currently does this transform ((gep Ptr, OFFSET) cmp Ptr)
---> (OFFSET cmp 0), which is perfectly fine when GEP is assumed to
have undefined behavior on overflow, but is wrong if GEP is allowed
to overflow with well defined behavior (it wraps).

Also, can the behavior of GEP be changed depending on llvm-gcc's
-fno-strict-aliasing/-fwrapv/-fno-strict-pointer-overflow flags?

Front-ends that do crazy things to pointers may need to use
ptrtoint+arithmetic+inttoptr instead of getelementptr. If the
target-independent properties of getelementptr are needed, the
"getelementptr null" trick may be useful.

Will various optimizations make sure they don't introduce undefined
operations in programs with otherwise well defined semantics?
I think instcombine may turn them into GEPs, thus transforming a program
with well defined semantics according to the above rules into one with
undefined semantics. Of course if the optimizer can see that the access
is always well defined it should do this transform.

Best regards,
--Edwin

- The result value of an allocation instruction is associated with
   the address range of the allocated storage.

Might want to clarify this to include calloc/realloc.

These sort of fall under the category of allocations done
outside of LLVM IR, but it'd be better to address this
case explicitly. I'll make a note of that.

- A null pointer is associated with no addresses.

A null pointer in address space 0.

I'm not fond of weird address-space semantics, but for consistency
with what the optimizer is currently doing, you're right.

- A pointer value formed by a ptrtoint is associated with all address
   ranges of all pointer values that contribute (directly or
   indirectly) to the computation of the pointer's value.

I assume you mean inttoptr?

Yes.

A non-overflowing option for getelementptr would be useful to allow
SCEVExpander and other passes to convert code like this:

  %a = mul %x, 4
  %b = add %y, %a
  %c = getelementptr [4 x i8]* @Object, 0, %b

into this:

  %c = getelementptr [4 x i8]* @Object, %x, %y

So a "defined-overflow" gep would act like wrapping integer arithmetic
in the width of the pointer? Then what exactly is the definition of
an "undefined-overflow" gep? Here's a first shot at a definition:
"Considering a GEP as a series of calculations, one for each index, if
the result that would be obtained by performing exact arithmetic
(treating the index as a signed integer) is outside the bounds of the
address range of the base pointer, the result is undefined."

Yes, something like this. I had been thinking of defining gep overflow
in terms of integer overflow, and then just saying that since LLVM IR
doesn't usually know what the address is, the only way to be safe is to
stay within the object, but your suggestion might be simpler.

Also, for completeness, the conversion of indices to pointer-sized
integers shouldn't change their value, the multiplication of the
indices by array element sizes shouldn't overflow, and computing an
address one-past-the-end is ok.

Dan

Hello,

I'm working on refining the definition of getelementptr (GEP) to
clarify its semantics and to allow front-ends to provide additional
information to optimization passes.

To help support this, I'd like to propose the following rule:

Any memory access must be done through a pointer value associated
with the address range of the memory access; otherwise the behavior
is undefined.

If I compute an out-of-bounds GEP and use it in an ICMP, is that
undefined behaviour or wrapping behaviour?

This will be addressed when the wrapping option for GEP is available.
A wrapping GEP will provide wrapping behavior, and a non-wrapping
GEP will have an undefined result on overflow. So it's up to the
front-end to decide which one it wants.

Instcombine currently does this transform ((gep Ptr, OFFSET) cmp Ptr)
---> (OFFSET cmp 0), which is perfectly fine when GEP is assumed to
have undefined behavior on overflow, but is wrong if GEP is allowed
to overflow with well defined behavior (it wraps).

Also, can the behavior of GEP be changed depending on llvm-gcc's
-fno-strict-aliasing/-fwrapv/-fno-strict-pointer-overflow flags?

Yes. To implement -fwrapv, the front-end can omit any of the no-overflow
flags, and choose the wrapping form of GEP, as appropriate.

I'm not familiar with -fno-strict-pointer-overflow. Do you mean
-fno-strict-overflow? I guess for LLVM that would be equivalent
to -fwrapv.

-fno-strict-aliasing is not impacted.

Front-ends that do crazy things to pointers may need to use
ptrtoint+arithmetic+inttoptr instead of getelementptr. If the
target-independent properties of getelementptr are needed, the
"getelementptr null" trick may be useful.

Will various optimizations make sure they don't introduce undefined
operations in programs with otherwise well defined semantics?
I think instcombine may turn them into GEPs, thus transforming a program
with well defined semantics according to the above rules into one with
undefined semantics. Of course if the optimizer can see that the access
is always well defined it should do this transform.

Yes. One of the goals of this effort is to legitimize some of what the
optimizer is already doing, and another is to make some of the things
it's doing dependent on explicit permission in the form of no-overflow
flags.

Dan

Hello,

I'm working on refining the definition of getelementptr (GEP) to
clarify its semantics and to allow front-ends to provide additional
information to optimization passes.

To help support this, I'd like to propose the following rule:

Any memory access must be done through a pointer value associated
with the address range of the memory access; otherwise the behavior
is undefined.

If I compute an out-of-bounds GEP and use it in an ICMP, is that
undefined behaviour or wrapping behaviour?
This will be addressed when the wrapping option for GEP is available.
A wrapping GEP will provide wrapping behavior, and a non-wrapping
GEP will have an undefined result on overflow. So it's up to the
front-end to decide which one it wants.

Instcombine currently does this transform ((gep Ptr, OFFSET) cmp Ptr)
---> (OFFSET cmp 0), which is perfectly fine when GEP is assumed to
have undefined behavior on overflow, but is wrong if GEP is allowed
to overflow with well defined behavior (it wraps).

Also, can the behavior of GEP be changed depending on llvm-gcc's
-fno-strict-aliasing/-fwrapv/-fno-strict-pointer-overflow flags?
    
Yes. To implement -fwrapv, the front-end can omit any of the no-overflow
flags, and choose the wrapping form of GEP, as appropriate.

I'm not familiar with -fno-strict-pointer-overflow. Do you mean
-fno-strict-overflow? I guess for LLVM that would be equivalent
to -fwrapv.

Yes, I mean -fno-strict-overflow.
Since LLVM has defined semantics for signed overflow, -fwrapv doesn't
add anything wrt. -fno-strict-overflow, yes.

-fno-strict-aliasing is not impacted.
  
Will LLVM's default behavior be that of -fstrict-aliasing, or
-fno-strict-aliasing?
We don't have TBAA, but the "computed pointer doesn't alias existing
pointers" rule sounds like something for -fno-strict-aliasing, no?
I'm not sure what gcc does here; does -fno-strict-aliasing have any
effect on pointers computed from ints?

Front-ends that do crazy things to pointers may need to use
ptrtoint+arithmetic+inttoptr instead of getelementptr. If the
target-independent properties of getelementptr are needed, the
"getelementptr null" trick may be useful.

Will various optimizations make sure they don't introduce undefined
operations in programs with otherwise well defined semantics?
I think instcombine may turn them into GEPs, thus transforming a program
with well defined semantics according to the above rules into one with
undefined semantics. Of course if the optimizer can see that the access
is always well defined it should do this transform.
    
Yes. One of the goals of this effort is to legitimize some of what the
optimizer is already doing, and another is to make some of the things
it's doing dependent on explicit permission in the form of no-overflow
flags.
  
I think this is great. Frontends that want always well-defined behavior
can just set the flags appropriately.

Best regards,
--Edwin

What happens when you allocate more room for a variable than its type
says it needs,
and then access the variable beyond the limit of what its type would
allow (but is otherwise valid, because you did allocate enough bytes)?

I assume that the intention is to treat these as well defined accesses
(pointer is within address range), is that right?

I'm thinking of flexible array members in structs, and things like:
struct foo
{
    ...
    char var[1];
};

struct foo *x = malloc(sizeof(*x) + len);
x->var[len-1];

Best regards,
--Edwin

It does sound like it from the name, but no, it's unrelated.

Dan

Yep, this is fine. The types in a getelementptr are just used to
guide offset computation; they don't say anything about how the
underlying memory was allocated.

Dan

The intent is to cover code like this:

    %p = alloca i32
    %q = inttoptr i64 0123456789 to i32*

Here, %p and %q are assumed to be non-aliasing. This kind of thing
is used by some JITs; they allocate objects via custom mechanisms
and then tell LLVM IR about the address of the objects via magic
integer constants like this. The optimizer is already making this
assumption today, though implicitly.

But how do you know 0123456789 doesn't point to the same thing %p does?
I understand it's highly unlikely and that we'd like to define that
there's no aliasing. I just get a little squeamish, especially when
considering embedded devices.

Without this assumption, much of LLVM would be useless. It's
more practical to make LLVM entirely useless to obscure
hypothetical users than to make it mostly useless to most
current real users.

Those rules are ok, I think. It's the computed-integer problem that
is gnawing at me.

I *think* the C standard says there can't be an alias and I assume the
Fortran standard does as well. But even so, we all know that users
often ignore standards when convenient.

If anyone comes forward with a real problem that this is causing,
we'll consider the problem in context, where there may be alternative
solutions.

Dan

FWIW, this will be addressed when we finally get around to adding explicit support for trappability to load/store.

-Chris

This won't free the optimizer to treat null as an invalid pointer though.
Some LLVM hacker thought it would be a good idea to use address
spaces 256 and 257 on x86 in a way that can require dereferences
of address 0 to be valid.

Dan

Hi,

Would it be possible to make the following slight change to ExecutionEngine::create()?

I would like to be able to disable the automatic enumeration of loaded modules, and the subsequent searching for undefined symbols (using SearchForAddressOfSymbol).

I propose adding an extra parameter to the function definition, defaulting to true.

  static ExecutionEngine *create(ModuleProvider *MP,
                                 bool ForceInterpreter = false,
                                 std::string *ErrorStr = 0,
                                 bool Fast = false,
                                 bool EnumModules = true);

And corresponding change to the function.

ExecutionEngine *ExecutionEngine::create(ModuleProvider *MP,
                                         bool ForceInterpreter,
                                         std::string *ErrorStr,
                                         bool Fast,
                                         bool EnumModules) {
  ExecutionEngine *EE = 0;
  if (EnumModules) {
    // Make sure we can resolve symbols in the program as well. The zero arg
    // to the function tells DynamicLibrary to load the program, not a library.
    if (sys::DynamicLibrary::LoadLibraryPermanently(0, ErrorStr))
      return 0;
  }
  ...

Your help would be appreciated.

Rob.

Hi,

Would it be possible to make the following slight change to ExecutionEngine::create()?

I would like to be able to disable the automatic enumeration of loaded modules, and the subsequent searching for undefined symbols (using SearchForAddressOfSymbol).

I propose adding an extra parameter to the function definition, defaulting to true.

I'd rather not. The current API is messy enough already. Is there not another way to solve the problem?

Evan

Hi Rob,

Can you comment on exactly what the problem is you want to solve? Is
it a performance issue with LoadLibraryPermanently, or do you simply
not want the external symbols to be resolved from within the JIT?

- Daniel

Hi Dan,

What you are proposing is a major change in the semantics of llvm.

You are taking certain uses of an instruction that have well defined behaviour and undefining them.

Have you made any estimate of how many peoples' code this may or may not break?

I think this is a *very* bad idea.

Let me make some more detailed comments:

Dan Gohman wrote:

Hello,

I'm working on refining the definition of getelementptr (GEP) to
clarify its semantics and to allow front-ends to provide additional
information to optimization passes.

To help support this, I'd like to propose the following rule:

Any memory access must be done through a pointer value associated
with the address range of the memory access; otherwise the behavior
is undefined.

"associated with" is defined as follows:

  - A pointer value formed from a getelementptr instruction is
    associated with the addresses associated with the first operand of
    the getelementptr.
  - The address of a global variable is associated with the address
    range of the variable's storage.
  - The result value of an allocation instruction is associated with
    the address range of the allocated storage.
  - A null pointer is associated with no addresses.
  - A pointer value formed by a ptrtoint is associated with all address
    ranges of all pointer values that contribute (directly or
    indirectly) to the computation of the pointer's value.
  - An integer value other than zero may be associated with address
    ranges allocated through mechanisms other than those provided by
    LLVM; such ranges shall not overlap with any ranges of addresses
    allocated by mechanisms provided by LLVM.

I notice no mention of bitcasts on pointers here.

Intermediate representations are often full of weird-looking, but correct, pointer arithmetic.

For example, in an instruction like this:

  %p = getelementptr [4 x i8]* @A, i32 0, i32 %huge

if %huge is beyond the size of @A, %p could potentially point into
some object other than @A. This rule says that it's not valid to use
%p to access that other object. %p would only be usable for memory
accesses when it points within @A.

C and C-like front-ends already obey this rule, since in C pointer
arithmetic results must remain within an allocated object (or one
past the end), which is more strict.

Lots of C code passes pointers around which might point to the middle of an array, say for searching. That means that negative indices could still remain within an allocated object.

Front-ends that do crazy things to pointers may need to use
ptrtoint+arithmetic+inttoptr instead of getelementptr. If the
target-independent properties of getelementptr are needed, the
"getelementptr null" trick may be useful.

Who says pointer arithmetic is crazy?
I create llvm code from an IR that has 2 types of pointers, heap-pointers and non-heap pointers, and is safe for GC, but prevents alias analysis.
So, I don't use any alias analysis stuff when doing optimisation, no big deal.
Suddenly, my correct and working code will become "undefined" :-(

With this rule, BasicAliasAnalysis and similar analyses that depend
on being able to track a pointer value up to its base value will be
able to safely climb through getelementptr definitions without
needing any further guarantees.

So all this breakage is for a few optimisation passes, that I don't even use?

That means that this rule will eliminate the most severe need for
having getelementptr overflow produce undefined results. This will
make it practical to have an undefined-on-overflow flag for
getelementptr that can be meaningfully set to either true or false.

A non-overflowing option for getelementptr would be useful to allow
SCEVExpander and other passes to convert code like this:
   %a = mul %x, 4
   %b = add %y, %a
   %c = getelementptr [4 x i8]* @Object, 0, %b

into this:
   %c = getelementptr [4 x i8]* @Object, %x, %y

which wouldn't otherwise be safe because (getelementptr @Object, %x)
by itself might overflow. (From a low-level perspective this isn't
much of an optimization, but from a high-level perspective it's
easier for non-SCEV-based passes to analyze.)

Comments and suggestions are welcome,

Please don't do this!

My suggestion is this:

Add a "strict-GEP", which does what you suggest. This would allow front-ends to tell the back-end about what things cannot be aliased,
and not cause any breakages for code that uses the "old" GEP.

Cheers,
Mark.