instruction/intrinsic for segmented addressing

Hi,

I would like to use LLVM as the backend for a compiler. One of the features I would like to implement is segment-based addressing for position-independent data. To some it may sound strange, to others the opposite.

No need for a complex story. Imagine you have a custom allocator that manages an area of 1GB of memory. Your application uses it to allocate memory inside this area, and at the end of your code you save that memory one-to-one to disk. Next time you load that 1GB wherever you can (at whatever addresses are available), and by using segment-based addressing all pointers inside would be valid regardless of where you loaded the 1GB.
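
A minimal sketch of the idea, assuming pointers are stored as 32-bit byte offsets from the start of the area rather than as absolute addresses. The names `Arena`, `Node`, `push_front` and `sum` are hypothetical, and copying the buffer stands in for saving it to disk and loading it at a different address:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A list node that links to its successor by byte offset, not by address.
struct Node {
    int32_t  value;
    uint32_t next;  // offset of the next node from the arena base; 0 means "none"
};

// Hypothetical bump allocator over one contiguous block of bytes.
struct Arena {
    std::vector<uint8_t> bytes;
    uint32_t used;

    // Offset 0 is reserved as "null", so allocation starts one node in.
    explicit Arena(uint32_t size) : bytes(size), used(sizeof(Node)) {}

    uint32_t alloc_node() {
        assert(used + sizeof(Node) <= bytes.size());
        uint32_t off = used;
        used += sizeof(Node);
        return off;
    }
};

// Turn an offset back into an address relative to whatever base is current.
inline Node *at(uint8_t *base, uint32_t off) {
    return reinterpret_cast<Node *>(base + off);
}

// Prepend a value and return the offset of the new head.
uint32_t push_front(Arena &a, uint32_t head, int32_t value) {
    uint32_t off = a.alloc_node();
    Node *n = at(a.bytes.data(), off);
    n->value = value;
    n->next = head;
    return off;
}

// Walk the list starting at `head`, using `base` as the segment address.
int sum(uint8_t *base, uint32_t head) {
    int total = 0;
    for (uint32_t off = head; off != 0; off = at(base, off)->next)
        total += at(base, off)->value;
    return total;
}

// Build a list, copy the raw bytes elsewhere, and read the copy back.
int sum_after_relocation() {
    Arena a(1024);
    uint32_t head = 0;
    for (int v = 1; v <= 4; ++v)
        head = push_front(a, head, v);

    std::vector<uint8_t> copy(a.bytes);  // stands in for save + reload at a new address
    return sum(copy.data(), head);
}
```

Here the base is passed around explicitly; the question below is about letting the hardware add it for free.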

Now on x64 I have the GS/FS registers. It is a pity that their base addresses can be changed only by the OS (not in user space). Not sure what “tools” are available on ARM, hopefully there is something.

Now my question is: what is the best way to tell LLVM to generate the [FS:xxx] and/or [GS:xxxx] class of instructions?

thanks

Hi,

Now on x64 I have GS/FS registers. Pity enough their addresses can be
changed only by the OS (not in user space). Not sure what "tools" are
available on ARM, hopefully there is something.

There are no segment registers on ARM. AArch64 has a couple of thread
pointer registers that might be abused for the purpose (one even
writable from user-space). AArch32 only has one, I believe, which is
usually claimed by the OS for threads.

Now my question is, what is the best way to tell LLVM to generate [FS:xxx]
and/or [GS:xxxx] class of instructions?

On x86, the addrspace(N) property of pointers triggers use of segment
registers (256 => gs, 257 => fs by the looks of it). E.g.

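; A load through an addrspace(256) pointer becomes a gs-relative access on x86.
define i32 @foo(i32 addrspace(256)* %addr) {
  ; Newer LLVM releases spell this: load i32, ptr addrspace(256) %addr
  %val = load i32 addrspace(256)* %addr
  ret i32 %val
}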

But as in the ARM case, this is usually the mechanism used for
thread-local storage, so be careful.

Cheers.

Tim.

Indirectly related: I just discovered that clang has a non-standard attribute for this, but it is x86-64 specific. It would be interesting to find a cross-platform solution and implement it in both clang and LLVM…

The attribute in question is translated to addrspace(256)…

http://clang.llvm.org/docs/LanguageExtensions.html#non-standard-c-11-attributes

The information you are giving me about AArch64 is not so disappointing. You
get the point: I do not care whether they are called segment registers or
MyProposeRegisters; what is important is that I have them for the duration of
a fiber. Do you know whether they are used exhaustively, by Linux for instance?

There are two of them (readable from user space): TPIDR_EL0 and
TPIDRRO_EL0. The first is exclusively claimed by Linux for TLS, the
second is unused (as far as I know) but only writable by the kernel
(RO == read-only).

Incidentally, there's no way to directly control either of these in LLVM.

What about OS X? Does anybody have any idea?

OS X uses a different system for TLS at the moment, but almost
certainly reserves the right to do whatever it pleases with the
segment registers at a future date.

In terms of CPU time, what would be the overhead of using such "segmented" addressing? I assume almost zero myself. CPU-cache-related issues would probably not change, or would they?

Probably fairly minimal in most cases (on x86). On ARM there is
definitely a cost.

Cheers.

Tim.

(Adding llvmdev to CC again)

Probably fairly minimal in most cases (on x86). On ARM there is
definitely a cost.

hm... why? You cannot have indexed addressing?

The code that needs to be emitted is roughly:
    [..."segment"-offset into x1...]
    mrs x0, tpidr_el0
    ldr xD, [x0, x1]

That's a more complex addressing mode and an additional MRS
instruction over the usual sequence. You also lose the ability to fold
the actual address-computation into the LDR.

and now the obvious question: for AArch64, is there a declaration
equivalent to addrspace(256) in LLVM?

Nope. That's what I meant by saying there's no direct control over
these features from LLVM.

I am completely lost as to where and how to start the transformation. One
solution would be to modify clang's code generation... but that seems to be
a more complex and less general solution.

It's a very difficult problem. The main issue is that the stack won't
be in this special address space (at least not without heavy LLVM
modifications), so you need a way to distinguish stack accesses from
heap. Without source annotation that's reducible to the halting
problem. For example:

int load_address(int *addr) {
  return addr;
}

int evil(int *heap_addr) {
  int local_var = 42;
  return load_address(rand() % 2 ? heap_addr : &local_var);
}

Should the code emitted for load_address use gs or not?

Cheers.

Tim.

int load_address(int *addr) {
  return addr;
}

Sorry, that should be "return *addr;".

Tim.

Thanks again for your help!

>>
>> Probably fairly minimal in most cases (on x86). On ARM there is
>> definitely a cost.
>>
> hm... why? You cannot have indexed addressing?
What I need is a way to force
The code that needs to be emitted is roughly:
    [..."segment"-offset into x1...]
    mrs x0, tpidr_el0
    ldr xD, [x0, x1]

That's a more complex addressing mode and an additional MRS
instruction over the usual sequence. You also lose the ability to fold
the actual address-computation into the LDR.

but this is the price you always pay for RISC vs. x86, isn't it? It is probably
difficult to quantify, but I wonder whether it would add more than a 5% slowdown to
an average program, especially a long-running server-class application.

> and now the obvious question: for aarch64, is there an addrspace(256)
> identical declaration for LLVM?

Nope. That's what I meant by saying there's no direct control over
these features from LLVM.

Wouldn't it make sense to add such an addressing instruction at the LLVM IR
level? Have there been no similar requests? I do not know if there is any
interest, but this would help implement a lot of things: pointer size
compression on 64-bit (pointers would be kept as 32-bit), easier data
sharing between processes (mmap with segmented addressing), and
position-independent data (load and save chunks of data with pointers, keeping
pointer semantics valid).

Knowing this, it means that my compiler has to generate platform-dependent
assembler code inside the IR, which means I would not be able to run such
code inside the LLVM virtual machine.

Another solution for my problem would be to carry around the segment
address as extra function parameter to all functions, but that would be a
funny

It's a very difficult problem. The main issue is that the stack won't
be in this special address space (at least not without heavy LLVM
modifications), so you need a way to distinguish stack accesses from
heap. Without source annotation that's reducible to the halting
problem. For example:

int load_address(int *addr) {
  return addr;
}

int evil(int *heap_addr) {
  int local_var = 42;
  return load_address(rand() % 2 ? heap_addr : &local_var);
}

Should the code emitted for load_address use gs or not?

The stack should not be in this address space, and this addressing should
not apply to the stack. The framework would make every C++ constructor
private (accessible only to some friend factory methods), so such objects
could not be created on the stack, only on the heap. So I wonder if it is
possible in an LLVM pass to track back all pointers in the IR that were
initialized with a certain function (factory function) and change the
addressing

I tried to play with a naive approach.

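#include <cstdint>

uint8_t *global_segment;  // base address of the relocatable segment
#define ainline __attribute__((always_inline))
template<class A>
   class CompactPointer
   {
      uint32_t adr;  // byte offset from global_segment
      public:
      // Rebuild the real address as segment base + byte offset.
      ainline A *operator->() { return reinterpret_cast<A*>(global_segment + adr); }
   };

// Placeholder types so that the sketch compiles.
struct Object {};
struct OtherObject { CompactPointer<Object> cpo; };

int main() {
   alignas(4) static uint8_t segment[64] = {};  // stand-in for the mapped segment
   global_segment = segment;
   CompactPointer<OtherObject> cpoo{};          // offset 0: the start of the segment
   CompactPointer<Object> cp = cpoo->cpo;
}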

That's a more complex addressing mode and an additional MRS
instruction over the usual sequence. You also lose the ability to fold
the actual address-computation into the LDR.

but this is the price you pay always for RISC vs. x86, or?

The price is paid in different ways, everywhere. All I can say for
sure is that addressing based on TPIDR is going to be more expensive
than without. Only benchmarks could quantify it.

wouldn't it make sense to add such an addressing instruction at LLVM IR
level? I mean there were no similar requests?

It's not come up before, no. It's not the worst idea I've heard, but
equally I'm not exactly convinced of the benefit yet. Either way, it
won't happen unless someone implements it ("patches welcome" as the
saying goes).

Another solution for my problem would be to carry around the segment address
as extra function parameter to all functions, but that would be a funny

That's not exactly a terrible idea (I believe GHC might do something
morally similar). It allows the compiler to spill it if necessary
unlike reserving a register absolutely (say before a
performance-critical loop), but its omnipresence probably discourages
the spilling.

If nothing else, it sounds like a useful way to get yourself up and
running without backend or OS support.
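
A rough sketch of that approach, with the segment base threaded through every call as an ordinary first argument. `TreeNode`, `tree_sum` and the node layout are made up for illustration; offsets are in bytes from the base, and 0 stands for "no child":

```cpp
#include <cassert>
#include <cstdint>

struct TreeNode {
    int32_t  value;
    uint32_t left;   // byte offset of the left child from the base, 0 = none
    uint32_t right;  // byte offset of the right child from the base, 0 = none
};

// The base is just another argument, so the register allocator is free to
// keep it in a register or spill it, like any other value.
int tree_sum(const uint8_t *base, uint32_t node) {
    if (node == 0)
        return 0;
    assert(node % sizeof(TreeNode) == 0);  // offsets are expected to be node-aligned
    const TreeNode *n = reinterpret_cast<const TreeNode *>(base + node);
    return n->value + tree_sum(base, n->left) + tree_sum(base, n->right);
}

// Lay out a three-node tree in a local buffer and sum it through the base.
int demo_tree_sum() {
    // Slot 0 is unused so that offset 0 can mean "no child".
    TreeNode slots[4] = {
        {0, 0, 0},
        {10, 2 * sizeof(TreeNode), 3 * sizeof(TreeNode)},  // root, at offset sizeof(TreeNode)
        {20, 0, 0},
        {30, 0, 0},
    };
    const uint8_t *base = reinterpret_cast<const uint8_t *>(slots);
    return tree_sum(base, static_cast<uint32_t>(sizeof(TreeNode)));
}
```

Nothing here depends on the OS or on a particular backend, which is the main attraction of this route.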

So I wonder if it is possible in a
LLVM pass to track back all pointers in the IR that were initialized with a
certain function (factory function) and change the addressing

This is the problem I believe is logically impossible without source
help, and if you've got that you'd just as well emit different IR to
begin with.
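
One way to picture "source help" is a distinct pointer type, so that the choice between the two addressing schemes is made by overload resolution at compile time rather than guessed from data flow. This is only an illustration; `SegPtr`, `g_base` and both `load_value` overloads are invented names:

```cpp
#include <cassert>
#include <cstdint>

static uint8_t *g_base = nullptr;  // stand-in for the segment register

// Offset-based pointer: a different type from T*, so the two cannot be mixed up silently.
template <class T>
struct SegPtr {
    uint32_t off;  // byte offset from g_base
    T &operator*() const { return *reinterpret_cast<T *>(g_base + off); }
};

// Plain addresses (stack, globals): ordinary load.
int load_value(const int *addr) { return *addr; }

// Segment-relative addresses: base + offset load.
int load_value(SegPtr<int> p) { return *p; }

int demo_load_value() {
    static int heap_like[2] = {7, 9};
    g_base = reinterpret_cast<uint8_t *>(heap_like);
    int local_var = 42;
    SegPtr<int> second{static_cast<uint32_t>(sizeof(int))};  // offset of heap_like[1]
    return load_value(&local_var) + load_value(second);      // 42 + 9
}
```

With distinct types, a conditional expression that mixes a stack address with a segment-relative one no longer compiles, because its two branches have unrelated types.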

On x86-64, unless I call some library functions I have the guarantee that
nobody would change the values in the gs/fs registers.

You do? I thought both were reserved by Linux. I suppose if you hack
the kernel and/or libc you could fix them.

Is there a way to tell LLVM not to reserve a certain register?

I don't think I follow here. Reserving a register is possible in
certain limited circumstances (though discouraged, at least by me).
Unreserving a register isn't, as far as I'm aware.

Cheers.

Tim.

The price is paid in different ways, everywhere. All I can say for
sure is that addressing based on TPIDR is going to be more expensive
than without. Only benchmarks could quantify it.

Well, that will need some experimenting. First I need a 64-bit ARM board (not sure
if qemu arm64 is good enough).

> wouldn't it make sense to add such an addressing instruction at LLVM IR
> level? I mean there were no similar requests?

It's not come up before, no. It's not the worst idea I've heard, but
equally I'm not exactly convinced of the benefit yet. Either way, it
won't happen unless someone implements it ("patches welcome" as the
saying goes).

I tried to describe at least the projected benefit in my previous emails, though
I have no idea how useful it will be for the whole industry, or precisely how
useful others will find it. Personally I assume that the addressing overhead
is a price worth paying to save a lot of memory, especially in
pointer-intensive applications (is there any class of application that cannot
be called that?). I know that Oracle invested some money in pointer
compression, and JavaScript also has some tricks to embed information in
the redundant part of 64-bit pointers. I am sure that behind the scenes there
are a lot of such hacks to reduce the space used by 64-bit pointers. I already
described this in my previous email, but maybe in other words: such segmented
addressing would use only 32-bit pointers (in the extreme case 16-bit!) and
the virtual memory area would be split into segments (well, yes, back to
the segmented addressing of x86). Once this mechanism exists, segments could be
relocated, saved to and loaded from disk, shared between processes, migrated to
other processes, etc. One could write fast-loading in-memory database
engines on top of it that would suit both 64-bit servers and phones.
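
To make the memory argument concrete, here is a small sketch comparing a node that holds two native pointers with one that holds two 32-bit offsets. Both struct names are hypothetical, and the byte counts in the comments assume a typical 64-bit target (8-byte pointers, natural alignment):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Conventional doubly linked node: two native pointers plus a payload.
// Typically 24 bytes on a 64-bit target (16 for pointers, 4 payload, 4 padding).
struct WideNode {
    WideNode *prev;
    WideNode *next;
    int32_t   value;
};

// Same node using 32-bit offsets from a segment base: 12 bytes.
struct CompactNode {
    uint32_t prev;
    uint32_t next;
    int32_t  value;
};

// Bytes saved for `count` nodes by switching to the compact layout.
constexpr std::size_t bytes_saved(std::size_t count) {
    return count * (sizeof(WideNode) - sizeof(CompactNode));
}

static_assert(sizeof(CompactNode) == 12, "three 32-bit fields, no padding");
```

Under those assumptions each node shrinks by half; whether that outweighs the extra addressing cost is something only measurement can answer.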

> Another solution for my problem would be to carry around the segment address
> as extra function parameter to all functions, but that would be a funny

That's not exactly a terrible idea (I believe GHC might do something
morally similar). It allows the compiler to spill it if necessary
unlike reserving a register absolutely (say before a
performance-critical loop), but its omnipresence probably discourages
the spilling.

If nothing else, it sounds like a useful way to get yourself up and
running without backend or OS support.

It is just that it would be nicer to write C++ code (the runtime for my language)
without exposing this detail... Ideally C++ could be extended (take
this as a utopia for the moment) by introducing a keyword shortp in
front of pointer declarations.

class Class {};

class TheClass {
     shortp char *str;    // "shortp" is the proposed keyword, not valid C++ today
     shortp Class *cls;
};

Any statement that accesses such pointers should be compiled to
"segmented" addressing.
The compiler should forbid the allocation of such objects on the stack,
though this mechanism could be extended to the stack if another register could
be allocated for stack-segment-based addressing.

> So I wonder if it is possible in a
> LLVM pass to track back all pointers in the IR that were initialized with a
> certain function (factory function) and change the addressing

This is the problem I believe is logically impossible without source
help, and if you've got that you'd just as well emit different IR to
begin with.

will study more in depth the code emitter in clang...

> On x86-64, unless I call some library functions I have the guarantee that
> nobody would change the values in the gs/fs registers.

You do? I thought both were reserved by Linux. I suppose if you hack
the kernel and/or libc you could fix them.

Well, I am confused: I thought GS would be the one used for threads, but it seems that
it is FS, and GS is not affected. I tried the following:

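#include <asm/prctl.h>     // ARCH_SET_GS
#include <stdio.h>
#include <sys/syscall.h>   // SYS_arch_prctl
#include <unistd.h>        // syscall

#define GS_RELATIVE __attribute__((address_space(256)))
int GS_RELATIVE *gsr;
int main(){
   int i = 12345;
   // Go through syscall() rather than relying on a libc prototype; x86-64 Linux only.
   syscall(SYS_arch_prctl, ARCH_SET_GS, &i);
   gsr = 0;   // offset 0 from the gs base, i.e. the address of i
   printf("our gs relative ... %d\n", *gsr);
}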

Then I created threads and so on, and the value 12345 at gs:0 is printed
correctly, so the kernel does not seem to change the value of gs.
But of course I should clarify this.

Is there a way to tell LLVM not to reserve a certain register?
I don't think I follow here. Reserving a register is possible in
certain limited circumstances (though discouraged, at least by me).
Unreserving a register isn't, as far as I'm aware.

Well, if FS and GS are compromised by the kernel, I thought of forcing the IR to
reserve one of the general registers to hold my segment value instead of
passing it as a parameter to each function. I understand that some LLVM
passes may observe that the value is used often, but I would still have some
overhead from passing a 64-bit pointer on each function call.
But juggling with thread-local storage (__thread), it comes to my mind that
this could be my best solution: to store my segment address as "__thread"
data. I would not directly use the content of any register (gs/fs),
but the generated machine code would load it into a register any time I need it
and would optimize it for several successive references. The
question then arises whether there is a way to tell the optimizer: hey, don't
worry, nobody will change this value, so you can consider it "valid"
after function calls as well, with no need to reload the
value from thread-local storage.
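
A small sketch of that last idea, assuming the segment base lives in a `thread_local` variable (the standard C++ spelling of `__thread`). Copying it into a local before the loop is one way to tell the optimizer that it will not change across calls; `segment_base`, `set_segment_base` and `sum_values` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Per-thread segment base.
thread_local uint8_t *segment_base = nullptr;

void set_segment_base(uint8_t *base) { segment_base = base; }

// Sum `count` int32_t values whose byte offsets from the base are listed in `offsets`.
int sum_values(const uint32_t *offsets, int count) {
    // Read the thread-local once. A local whose address is never taken
    // cannot be modified by a callee, so no reload is needed after calls.
    uint8_t *const base = segment_base;
    assert(base != nullptr);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += *reinterpret_cast<const int32_t *>(base + offsets[i]);
    return total;
}

int demo_sum_values() {
    static int32_t storage[4] = {5, 6, 7, 8};
    set_segment_base(reinterpret_cast<uint8_t *>(storage));
    const uint32_t offsets[3] = {0, 8, 12};  // elements 0, 2 and 3
    return sum_values(offsets, 3);
}
```

The cost is one thread-local read per function that touches the segment, which is a reasonable starting point before measuring whether a reserved register would help.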

thanks a lot for your help
mph