Intel Memory Protection Extensions (and types question)

Hi all,

I'm currently adding new instructions and registers to the X86 code
generator for Intel Memory Protection Extensions [1].

Each register in the special-purpose BNDx class holds two 64-bit values.
The components are not individually readable or writable (except by
going through memory) but there are instructions that read only one
of the two elements. The two 64-bit values can be considered opaque,
that is, not useful outside of the specific instructions using this
register class.

After much experimentation, I think it's necessary to model this in
the backend with a new MVT code (ValueTypes.h). Trying to fake it
with an existing type (e.g. v2i64 or i128) leads to these registers
being misused for other values and vice versa.

We want to have intrinsics map to some of these instructions (both
IR and C, in the usual <*intrin.h> form). I'm trying to avoid
having the added MVT escape the code generator by using some other
type representation in IR, but don't have that working yet.

I've put a small patch on Phabricator, recognizing that this is not
committable until there are intrinsics or other means of testing.
http://llvm-reviews.chandlerc.com/D1630

Comments welcomed.

[1] Chapter 9, Intel Architecture Instruction Set Extensions
Programming Reference, July 2013,
http://download-software.intel.com/sites/default/files/319433-015.pdf

Hi Kevin,

Thanks for working on this. We usually try really hard to avoid adding new types such as x86mmx. I don’t know the memory-protection instruction set at all, but I imagine that you are not expecting other LLVM optimizations to interact with them, right? (It looks that way from this example [1].) If you are not accessing the individual components, then you can use i128, or even <2 x i64>.

Thanks,
Nadav

[1] http://software.intel.com/en-us/blogs/2013/07/22/intel-memory-protection-extensions-intel-mpx-support-in-the-gnu-toolchain

Hi,

> Thanks for working on this. We usually try really hard to avoid adding new
> types such as x86mmx. I don't know the memory-protection instruction set at
> all but I imagine that you are not expecting other LLVM optimizations to
> interact with them right ? (it looks that way from this example[1]). If you
> are not accessing the individual components then you can use i128, or even
> <2 x i64>.

We have put some effort into trying to use i128 or v2i64, but it
seems that post instruction selection LLVM is incredibly keen on
putting values of those types in their 'correct' register class
(e.g. XMM) in preference to the BNDx bounds registers. I haven't
found any workaround for that, and adding an MVT code (where there
is already precedent for oddballs like x86mmx and ppc_f128) seems
to be low impact compared to any change to general register
handling.

As well, BNDx register contents do not really match the semantics
of i128 or v2i64 or unfortunately even <2 x i8*>, though the last
is an attractive candidate for an IR representation.

As I mentioned, we do intend to contain this to the backend, and not
introduce a corresponding type to the IR, by ad-hoc handling of the
specific associated intrinsics. I certainly understand that adding
an IR basic type is not something that would be done lightly, but
adding an MVT code amounts to a handful of case labels.

Over time we intend to write bounds checking instrumentation
interoperable with that in gcc and icc; the plan is for this to
be isolated to its own IR pass(es) so that there is no impact on
compilation not using it.

Hi Kevin,

Can you explain what kind of abstraction/support you plan to implement over the MP instructions? I imagine that you plan to add a few intrinsics, right? I imagine that you don’t need the register allocator to allocate the BND registers or anything fancy like that. In that case the register can be an immediate operand of the intrinsic. Maybe you can start by presenting the kind of intrinsics that you want to implement.

Thanks,
Nadav

Hi Kevin,

We're also interested in support for fat pointers in LLVM/clang and it would be nice to have some general infrastructure for them (we currently have a load of hacks). There are a lot of research architectures with fat pointers, and MPX is likely to be just the first of many to start hitting real silicon soon. There are a few properties that we'd ideally want to represent in the IR and back ends:

- Pointers are now not solely integers, they contain other metadata
- Fat and thin pointers may coexist in the same program and have different sizes
- The in-memory size of a pointer is not always log2() of its addressable range
- There are some registers that either only store pointers or only store pointer metadata
- Loads and stores of pointers may need to be treated differently to loads and stores of data

I believe that our case and MPX (which is quite close to HardBounds) are close to being opposite ends of the spectrum, so it would be nice if we could come up with a generic design that can support both, as it would then simplify life for any future architectures that have this support. In our case:

- Fat pointers are 256 bits
- The metadata is stored alongside the data
- There are special registers and instructions for manipulating pointers.

In the MPX case:

- Fat pointers are 320 bits
- The metadata is stored in separate tables
- There are special instructions for loading metadata from the table into registers
- There are special instructions for loading and storing metadata somewhere explicit
- There are special load / store instructions for

In the IR, we are representing fat pointers as pointers in another address space. Most of the pointers-are-all-the-same-size assumptions in the IR are now fixed; however, there are still some pointers-are-integers assumptions. For example, GEPs suddenly find that their indexes are i256, even though the range of the pointer is only 64 bits. This can probably be solved by extending DataLayout to add some extra information about pointers, as was recently done with the work to allow them to be different sizes in different address spaces.

In the back end, you need the register allocator to be able to handle the notion of paired registers. This might be expressed by defining pairs of a GPR + a bounds register as a separate register set that aliases with the GPRs, but I don't think TableGen is quite expressive enough for that.

We also need explicit support for inttoptr and ptrtoint in the back end, as moving between integer and address registers requires explicit conversion for us, and somewhat better matching for pointers in different address spaces in TableGen.

David

> In the MPX case:
>
> - Fat pointers are 320 bits

How did you come up with 320 bits?
320 = 64*4 + 64, which is the size of the metadata table entry plus the
pointer size, but why do you call this a fat pointer?
In MPX, the fat pointer never exists as a single entity.

--kcc

> How did you come with 320 bits?
> 320=64*4+64, which is the size of the metadata table entry plus pointer size,

Sorry, that should have been 192. The specification allows the metadata to be stored either in look-aside tables or in explicitly managed storage. The tables impose a very large storage space penalty, so they are most likely to be used with C or similar languages where it is difficult to modify the data layout. For languages where there is no requirement to maintain an ABI that interoperates with non-MPX code, the metadata can be stored inline when running in bounds-checked mode. I forgot that in this mode you need to store less metadata than when using the bound tables.
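
The two figures under discussion can be reproduced with a little arithmetic. This is only a sketch, assuming 64-bit pointers, two 64-bit bounds per pointer, and the four-word table entry implied by the "64*4" in the question; the constant and function names are invented for illustration.

```python
# Sizes in bits. The four-word table entry is taken from the "64*4" in the
# question quoted above; all names here are illustrative assumptions.
POINTER_BITS = 64
BOUND_BITS = 64             # one lower bound and one upper bound per pointer
TABLE_ENTRY_BITS = 4 * 64   # one look-aside table entry


def table_backed_bits() -> int:
    """Pointer plus a full table entry: the 320-bit figure."""
    return POINTER_BITS + TABLE_ENTRY_BITS


def inline_bits() -> int:
    """Pointer plus just its two bounds: the 192-bit figure."""
    return POINTER_BITS + 2 * BOUND_BITS


print(table_backed_bits(), inline_bits())
```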

> but why do you call this a fat pointer?

Because that's what it is: a pointer + metadata

> In MPX, the fat pointer never exists as a single entity.

The pointer and metadata exist in separate registers, but single instructions (loads and stores) operate on the pointer + metadata.

David

> > How did you come with 320 bits?
> > 320=64*4+64, which is the size of the metadata table entry plus pointer
> > size,
>
> Sorry, that should have been 192. The specification allows the metadata
> to be stored either in look-aside tables or explicitly managed.

Is it? Which specification are you referring to?
http://download-software.intel.com/sites/default/files/319433-015.pdf (chapter
9) doesn't say anything like this. (Or does it?)

> The tables impose a very large storage space penalty, so are most likely
> to be used with C or similar language where it is difficult to modify the
> data layout. For languages where there is no requirement to maintain an
> ABI that interoperates with non-MPX code, the metadata can be stored inline
> when running in bounds-checked mode. I forgot that when using it in this
> mode you needed to store less metadata than when using the bound tables.
>
> > but why do you call this a fat pointer?
>
> Because that's what it is: a pointer + metadata
>
> > In MPX, the fat pointer never exists as a single entity.
>
> The pointer and metadata exist in separate registers, but single
> instructions (loads and stores) operate on the pointer + metadata.

Which MPX instructions do you mean here?

--kcc

> > > How did you come with 320 bits?
> > > 320=64*4+64, which is the size of the metadata table entry plus pointer size,
> >
> > Sorry, that should have been 192. The specification allows the metadata to be stored either in look-aside tables or explicitly managed.
>
> Is it? Which specification are you referring to?
> http://download-software.intel.com/sites/default/files/319433-015.pdf (chapter 9) doesn't say anything like this. (Or does it?)

See the BNDMOV instruction, which allows the bounds to be explicitly loaded and stored to bounds registers. Contrast with BNDLDX / BNDSTX, where the location is implicit. The BNDMOV instruction is also used for stack spills of the bounds registers. This allows MPX to be used for range checking in a similar way to the Thumb-2EE extensions.
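
The difference between the explicit and implicit forms can be sketched in software. This is a simplified model, not the hardware definition: Python dictionaries stand in for memory and for the bound tables, all function names are invented after the instruction mnemonics, and the rule that a mismatched pointer yields "allow everything" bounds reflects my reading of the MPX documentation and should be treated as an assumption.

```python
from dataclasses import dataclass

MASK64 = 2**64 - 1


@dataclass(frozen=True)
class Bounds:
    lower: int
    upper: int


# Bounds that permit every address; returned when a table lookup does not match.
INIT_BOUNDS = Bounds(0, MASK64)


def bndmov_store(memory: dict, addr: int, b: Bounds) -> None:
    """Explicit location: the program chooses where the two words go."""
    memory[addr] = b.lower
    memory[addr + 8] = b.upper


def bndmov_load(memory: dict, addr: int) -> Bounds:
    """Reload bounds from the explicit location used by bndmov_store."""
    return Bounds(memory[addr], memory[addr + 8])


def bndstx(table: dict, slot_addr: int, ptr: int, b: Bounds) -> None:
    """Implicit location: the entry is keyed by the address holding the pointer."""
    table[slot_addr] = (ptr, b)


def bndldx(table: dict, slot_addr: int, ptr: int) -> Bounds:
    """Return the recorded bounds only if the recorded pointer still matches."""
    entry = table.get(slot_addr)
    if entry is None or entry[0] != ptr:
        return INIT_BOUNDS
    return entry[1]
```

In this model a spill is just `bndmov_store` to a stack address, while `bndstx` and `bndldx` never name the storage for the bounds at all.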

> > The pointer and metadata exist in separate registers, but single instructions (loads and stores) operate on the pointer + metadata.
>
> Which MPX instructions do you mean here?

Ah, sorry, I was confusing MPX with one of the other HardBound-like schemes here. In MPX, you must implicitly insert the BNDCU and BNDCL instructions. I would expect that you'd want to model the BNDCU + BNDCL + MOV sequence as a single pseudo for as long as possible to ensure that the bounds checks were performed at the correct time and not elided, but they are separate instructions (although if they don't do micro-op fusion on the sequence I'd be shocked, since you can trivially do both bounds checks in a single cycle and speculatively enqueue the memory operation with enough time to cancel it if it turned out that the bounds checks should trap).

David

> See the BNDMOV instruction, which allows the bounds to be explicitly
> loaded and stored to bounds registers. Contrast with BNDLDX / BNDSTX,
> where the location is implicit. The BNDMOV instruction is also used for
> stack spills of the bounds registers. This allows MPX to be used for range
> checking in a similar way to the Thumb-2EE extensions.

Well, ok, you can treat this as a 192-bit fat pointer, but AFAICT this is
not the real intention of the MPX developers
since a fat pointer will break all ABIs, and MPX tries to preserve them.
I don't think we need fat pointers to support MPX in LLVM -- it will
complicate the implementation beyond necessity. (My 2c)
All we need is to represent a 128-bit type that will live in BNDx registers.

--kcc

> Well, ok, you can treat this as a 192-bit fat pointer, but AFAICT this is not the real intention of the MPX developers
> since a fat pointer will break all ABIs, and MPX tries to preserve them.

MPX is an implementation of the HardBound concept from UPenn, where this was a design goal (see also their 'low-fat pointers' work).

> I don't think we need fat pointers to support MPX in LLVM -- it will complicate the implementation beyond necessity. (My 2c)

Fat pointers, however, are required for other architectures (including ours) and it would be nice to use the same general representation for all implementations of bounds-checked pointers (whether you call them fat pointers or not).

> All we need is to represent a 128-bit type that will live in BNDx registers.

Only if you want to push all of the work into the front end.

David

> > Well, ok, you can treat this as a 192-bit fat pointer, but AFAICT this
> > is not the real intention of the MPX developers
> > since a fat pointer will break all ABIs, and MPX tries to preserve them.
>
> MPX is an implementation of the HardBound concept from UPenn, where this
> was a design goal (see also their 'low-fat pointers' work).

This one? http://acg.cis.upenn.edu/papers/asplos08_hardbound.pdf
I didn't know.

> > I don't think we need fat pointers to support MPX in LLVM -- it will
> > complicate the implementation beyond necessity. (My 2c)
>
> Fat pointers, however, are required for other architectures (including
> ours) and it would be nice to use the same general representation for all
> implementations of bounds-checked pointers (whether you call them fat
> pointers or not).

It may be nice to have fat pointers, but this is unrelated to MPX as an
instruction set extension.
Consider, for example, possible uses of MPX not directly related to bound
checking: e.g. implementing a software sandbox.
In this case you need intrinsics to get/set BNDx registers and to call
BNDCU/BNDCL, but you don't need fat pointers at all.

--kcc

Hi Nadav,

> Can you explain what kind of abstraction/support do you plan to implement
> over the MP instructions ? I imagine that you plan to add a few intrinsics,
> right ? I imagine that you don't need the register allocator to allocate the
> BND registers or anything fancy like that.

We do need register allocation; the bounds registers are a set of 4,
and there is no user-level (C intrinsic) access to specific registers,
and certainly no desire to reinvent the wheel with some kind of
ad-hoc allocation.

At the low level, I envision core LLVM intrinsics something like this
(details TBD; attributes elided for simplicity):

    ; This type represents a pointer bounds pair. (We want to avoid
    ; allowing IR that lets the individual bounds 'escape' without
    ; using an @llvm.mpx intrinsic.)
    %llvm.x86.mpx.bounds = type opaque

    ; Build a set of bounds from a base address and size.
    declare %llvm.x86.mpx.bounds @llvm.x86.mpx.mk(i8* %p, i64 %size)

    ; Verify bounds. These take a plain pointer and return a
    ; (bitwise identical) pointer painted green.
    declare i8* @llvm.x86.mpx.cl(%llvm.x86.mpx.bounds, i8*)
    declare i8* @llvm.x86.mpx.cu(%llvm.x86.mpx.bounds, i8*)

    ; Store and load bounds 'elsewhere'; these allow transferring
    ; bounds across ABI boundaries without affecting existing data
    ; layouts.
    declare void @llvm.x86.mpx.sx(%llvm.x86.mpx.bounds, i8*, i8**)
    declare %llvm.x86.mpx.bounds @llvm.x86.mpx.lx(i8**)

The above correspond closely to MPX instructions (and would also have
obvious non-MPX instruction sequences or software implementations,
should anyone want to apply the model elsewhere).
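
As a rough illustration of such a software implementation, the make and check operations might be modelled as follows. This is a sketch only: the Python names mirror the proposed intrinsics but are otherwise invented, addresses are plain integers, the upper bound is assumed to be inclusive (base + size - 1), and a failed check raises an exception where hardware would fault.

```python
from typing import NamedTuple


class Bounds(NamedTuple):
    lower: int  # lowest address that may be accessed
    upper: int  # highest address that may be accessed (inclusive)


class BoundsViolation(Exception):
    """Stands in for the fault raised by a failed bounds check."""


def mpx_mk(p: int, size: int) -> Bounds:
    """Build bounds covering the `size` bytes starting at address `p`."""
    return Bounds(p, p + size - 1)


def mpx_cl(b: Bounds, p: int) -> int:
    """Check `p` against the lower bound; return it unchanged on success."""
    if p < b.lower:
        raise BoundsViolation(f"{p:#x} is below lower bound {b.lower:#x}")
    return p


def mpx_cu(b: Bounds, p: int) -> int:
    """Check `p` against the upper bound; return it unchanged on success."""
    if p > b.upper:
        raise BoundsViolation(f"{p:#x} is above upper bound {b.upper:#x}")
    return p
```

Returning the pointer from each check gives later code a value that depends on the check, which is one way to read the "painted" pointer idea: a use of the returned pointer cannot be reordered ahead of the check that produced it.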

The normal MPX use case has the above intrinsics inserted at pointer
definitions and uses by an MPX analysis pass (completely optional, but
ideally drawing on existing work wherever practical). We'd expect
generic optimizations to eliminate redundant checks, and improve those
generic optimizations if necessary.

GCC and ICC have a set of C-level intrinsics, described in the GCC
notes[1], that operate on 'painted' pointers (using that terminology
to distinguish them from purely fat pointers, since MPX manages the
'fat' separately). These are intended primarily for use in low-level
memory management libraries, where the compiler can't determine itself
that a specific bounded memory region is being defined. We currently
envisage lowering these to the above LLVM intrinsics at the start of
the optional MPX pass.

[1] http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler#Compiler_intrinsics_and_attributes

Regards,

Hi David,

[...]
> I believe that our case and MPX (which is quite close to HardBounds) are
> close to being opposite end of the spectrum, so it would be nice if we could
> come up with a generic design that can support both [...]

I'm sure we'd be interested in participating in this discussion,
and migrating MPX support to use any infrastructure that comes out
of it. I personally favor the St Exupery view of engineering elegance
("perfection is finally attained not when there is no longer anything
to add, but when there is no longer anything to take away") and would
support any simplifying unification.

In the short term, though, since MPX has its pointer fat liposuctioned
and stored in ziploc bags, we can manage quite well without new
infrastructure, and our short term goals are:

- being able to generate the machine instructions
- supporting the MPX 'standard model' interoperably with gcc and icc

Pragmatically, it seems that this is most likely to be acceptable to
the LLVM community if the impact is essentially zero outside of where
it's absolutely necessary (X86 code generator and optional MPX pass).

[...]
> See the BNDMOV instruction, which allows the bounds to be explicitly loaded
> and stored to bounds registers. Contrast with BNDLDX / BNDSTX, where the
> location is implicit. The BNDMOV instruction is also used for stack spills
> of the bounds registers. This allows MPX to be used for range checking in a
> similar way to the Thumb-2EE extensions.

And similar to the x86 BOUND instruction (80186 forward IIRC) with
the need for the ABI to accommodate passing bounds. Although BNDLDX /
BNDSTX / BNDMOV can be used in this fashion in a system with a new ABI,
that will probably happen just about as often as BOUND actually gets
used, and the meat of the MPX model lies in supporting C/C++-oriented
systems transparently to code using the established ABI.

[...]
> I would expect that you'd want to model the BNDCU + BNDCL + MOV sequence as a
> single pseudo for as long as possible to ensure that the bounds checks were
> performed at the correct time and not elided

Actually, we'd like the checks and loads/stores to be split and elided
as much as possible, subject to data dependencies determining the
'correct time' - the canonical example being a loop whose range is
dynamically known at the start. Maybe I'm misunderstanding you.
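
The loop case can be sketched as follows. This is a simplified software model with invented names, assuming a contiguous, ascending access pattern over a list that stands in for memory: checking the first address against the lower bound and the last address against the upper bound covers every access in between, so the per-iteration checks can be dropped.

```python
class BoundsViolation(Exception):
    """Stands in for the fault raised by a failed bounds check."""


def check_lower(lower: int, addr: int) -> None:
    if addr < lower:
        raise BoundsViolation(hex(addr))


def check_upper(upper: int, addr: int) -> None:
    if addr > upper:
        raise BoundsViolation(hex(addr))


def sum_checked_each_access(memory, lower, upper, base, n):
    """Naive instrumentation: both checks before every load."""
    total = 0
    for i in range(n):
        check_lower(lower, base + i)
        check_upper(upper, base + i)
        total += memory[base + i]
    return total


def sum_checked_once(memory, lower, upper, base, n):
    """Hoisted form: one lower and one upper check before the loop."""
    if n > 0:
        check_lower(lower, base)          # smallest address touched
        check_upper(upper, base + n - 1)  # largest address touched
    total = 0
    for i in range(n):
        total += memory[base + i]
    return total
```

One caveat with this sketch: the hoisted version reports a violation before the first iteration rather than at the offending access, which is harmless for a read-only loop like this one but would matter if earlier iterations had visible side effects.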

[...]
> MPX is an implementation of the HardBound concept from UPenn, where this was
> a design goal (see also their 'low-fat pointers' work).

There has been some interesting discussion on comp.arch relating to
the background of MPX, which (however fascinating I find the history of
capability architectures) I am not willing to join here or elsewhere;
I've only been personally aware of MPX myself for a matter of months,
and besides, the law-talking guys would probably slap me silly.