short Object* object = new Object(); //pointer compression

Hi to one of the most interesting mailing lists in IT :)

As stated in the subject, I would like to implement a new keyword, short, in the context of pointer declarations.

short Object* object = new Object();

The objective is to implement a form of 64-bit pointer compression that also brings some additional benefits. I also started a related thread on the LLVM mailing list.

So what should happen behind the scenes? A custom new operator (the name could change accordingly) would allocate memory from the active segment but return only a 32-bit offset inside a segment of up to 4 GB. The segment address would be kept in thread-local storage, in a register, or passed as a method argument.
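A minimal sketch of that allocation scheme in plain C++, without the proposed keyword. The names `Segment`, `g_segment`, `short_new` and `resolve` are hypothetical, and a simple bump allocator stands in for whatever the real custom new would do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <utility>
#include <vector>

// Hypothetical segment: one contiguous block addressed by 32-bit offsets.
struct Segment {
    std::vector<unsigned char> storage;  // backing memory
    std::uint32_t used = 0;              // bump-allocation cursor

    explicit Segment(std::uint32_t bytes) : storage(bytes) {}

    unsigned char* base() { return storage.data(); }

    // Reserve `size` bytes at alignment `align` and return the offset.
    std::uint32_t allocate(std::uint32_t size, std::uint32_t align) {
        std::uint32_t start = (used + align - 1) / align * align;
        assert(start + size <= storage.size() && "segment exhausted");
        used = start + size;
        return start;
    }
};

// Active segment of the current thread (thread-local storage is one of the
// options named in the post).
thread_local Segment* g_segment = nullptr;

// Stand-in for the custom new: constructs a T inside the active segment
// and hands back only its 32-bit offset.
template <class T, class... Args>
std::uint32_t short_new(Args&&... args) {
    std::uint32_t off = g_segment->allocate(sizeof(T), alignof(T));
    new (g_segment->base() + off) T(std::forward<Args>(args)...);
    return off;
}

// Rebuild the full pointer as segment base + offset.
template <class T>
T* resolve(std::uint32_t off) {
    return reinterpret_cast<T*>(g_segment->base() + off);
}
```

In this sketch offset 0 is a valid object, so a real design would still have to pick a representation for the null pointer.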

class MyObject {
public:
short Object *mainObject;

vector<short Object*> children(10000);
};

This is not necessarily an extreme example, but my MyObject object would suddenly save a significant amount of memory.
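The saving can be put into numbers. The sketch below counts only the bytes needed for the child links and ignores container overhead; the constant names are hypothetical. With 8-byte pointers, as on common 64-bit targets, the links take 80,000 bytes as full pointers and 40,000 bytes as 32-bit offsets.

```cpp
#include <cstddef>
#include <cstdint>

struct Object;  // opaque; only pointer sizes matter here

constexpr std::size_t kChildren = 10000;  // count taken from the example above

// Bytes needed for the child links alone.
constexpr std::size_t full_pointer_bytes = kChildren * sizeof(Object*);
constexpr std::size_t short_offset_bytes = kChildren * sizeof(std::uint32_t);

static_assert(short_offset_bytes == 40000, "10000 offsets of 4 bytes each");
```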

All references to the members should be as with normal pointers:

short MyObject* myObject = new MyObject();
short Object* object = myObject->mainObject;

Of course, some glue code would be needed to hook up the factory method behind new and to manage the segment address.

Besides pointer compression, what other advantages are there? The data would be position independent. This means that:

→ data from segments (segments can be smaller than 4GB) can be saved and loaded at any address. Interesting candidate for some fast in/out memory database.
→ data from segments can be equally easily shared and accessed between processes through mmap (pointer semantics would not change)
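The position independence can be illustrated with a small list whose links are offsets rather than addresses. This is only a sketch: `Node`, `build_list` and `sum_list` are hypothetical names, and offset 0 is reserved here to mean "no next node". Copying the bytes to another address (as loading from disk or mapping into another process would) leaves the links valid.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical node linked by a 32-bit offset from the segment start.
struct Node {
    std::int32_t value;
    std::uint32_t next;  // 0 marks the end of the list
};

// Build the list 1 -> 2 -> 3 inside `seg` and return the head offset.
inline std::uint32_t build_list(std::vector<unsigned char>& seg) {
    const std::uint32_t first = 8;  // skip offset 0, which marks the end
    assert(seg.size() >= first + 3 * sizeof(Node));
    for (std::uint32_t i = 0; i < 3; ++i) {
        std::uint32_t off = first + i * static_cast<std::uint32_t>(sizeof(Node));
        Node n{static_cast<std::int32_t>(i + 1),
               i < 2 ? static_cast<std::uint32_t>(off + sizeof(Node)) : 0u};
        std::memcpy(seg.data() + off, &n, sizeof n);
    }
    return first;
}

// Sum the list, given whichever base address the segment currently has.
inline long sum_list(const unsigned char* base, std::uint32_t head) {
    long total = 0;
    for (std::uint32_t off = head; off != 0;) {
        Node n;
        std::memcpy(&n, base + off, sizeof n);  // no alignment assumptions
        total += n.value;
        off = n.next;
    }
    return total;
}
```

The same head offset works against the original buffer and against any byte-for-byte copy of it, which is the property that makes saving, loading and sharing the segment straightforward.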

After this short story, I wonder what would be the easiest way to implement this as an extension to clang's infrastructure for C++ parsing. I saw some threads from 2011 about plugins, but I could not find anything after that.

Any hints would be more than welcome.

I think your only option is to modify clang itself (someone will correct me if I’m wrong). It’s probably best to choose a different spelling for your keyword, as ‘short’ is already reserved and can appear in so many contexts. Seeing short short* would be so confusing :)

And short int* would be ambiguous. You would at least need to put it on a level with const, making it int *short.
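The ambiguity can be checked with standard C++ alone. The alias names below are hypothetical and only illustrate where pointer-level qualifiers such as const are written.

```cpp
#include <type_traits>

// `short int` is already a complete type name, so `short int*` is a
// pointer to short and leaves no room for a pointer qualifier there.
static_assert(std::is_same<short int*, short*>::value,
              "short int* is pointer-to-short");

// Qualifiers that apply to the pointer itself are written after the `*`.
using ConstPtrToInt = int* const;  // the pointer is const
using PtrToConstInt = const int*;  // the pointee is const
static_assert(!std::is_same<ConstPtrToInt, PtrToConstInt>::value,
              "position relative to * decides what is qualified");
```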

Sebastian

This sounds possibly like __ptr32, which we already have some minor
amount of support for, with the extra benefit of at least one other
vendor implementing such an extension.

http://msdn.microsoft.com/en-us/library/aa985900.aspx

~Aaron

Probably shortptr or shortp would be best, and it would also simplify the parsing.

Based on what the MSDN page says:

" __ptr32 represents a native pointer on a 32-bit system, while
__ptr64 represents a native pointer on a 64-bit system."
"On a 32-bit system, a pointer declared with __ptr64 is truncated to a
32-bit pointer. On a 64-bit system, a pointer declared with __ptr32 is
coerced to a 64-bit pointer."

I don't think that's what the original poster needed (32-bit offset in
a 64bit segment).

Csaba

Let me make a correction. It is indeed about a 32-bit (or smaller) offset
inside a segment addressable with 32 bits (or fewer). The address would
always be calculated as 64-bit pointer + 32-bit offset.

Indeed, as I stated in my previous email, __ptr32 would not make much
sense.

I worked further on this concept and almost came to the conclusion that
implementing a special class of pointers would become too complicated
(in both clang and llvm). It would still be interesting to "teach" the
compiler to allow arithmetic like

shortptr = longptr; // where shortptr = longptr - base
// or
longptr = shortptr; // where longptr = shortptr + base, and base is a
                    // task (thread) specific value

The framework or the "extension" should make sure that such values are
allocated in the correct address space.
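Without compiler support, the two conversions can be written as ordinary functions. This is a sketch under assumptions: `t_base`, `to_short` and `to_long` are hypothetical names, and the base is kept in a thread_local variable as one possible reading of "task (thread) specific value".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-thread segment base address.
thread_local unsigned char* t_base = nullptr;

// shortptr = longptr - base
template <class T>
std::uint32_t to_short(T* longptr) {
    auto addr = reinterpret_cast<std::uintptr_t>(longptr);
    auto base = reinterpret_cast<std::uintptr_t>(t_base);
    // The pointer must lie inside the 32-bit window that starts at base.
    assert(addr >= base && addr - base <= UINT32_MAX);
    return static_cast<std::uint32_t>(addr - base);
}

// longptr = shortptr + base
template <class T>
T* to_long(std::uint32_t shortptr) {
    return reinterpret_cast<T*>(t_base + shortptr);
}
```

The assert in `to_short` is the runtime counterpart of the requirement that values are allocated in the correct address space.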

I will probably go for an "eco pointer template" solution

#include <cstddef>
#include <cstdint>

template<class T>
class ecoPtr {
   private:
      static void * operator new(std::size_t); //forbid allocation on heap
      static void * operator new[](std::size_t);
   private:
      std::uint32_t ptr; //this is the real pointer compressed to 32 bit
   public:
      template<typename... Args>
      __attribute__((always_inline)) inline ecoPtr(Args... args)
      { /* custom allocate T and store the offset in ptr */ }
      T* operator->(); // returns segment base + ptr (definition omitted)
};
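The ecoPtr outline above can be turned into something that compiles and runs, under a few assumptions that are not from the thread: a fixed 4096-byte thread_local `Arena` stands in for the segment, the class is spelled `EcoPtr`, and a static `make` factory replaces the variadic constructor, because a variadic constructor template would also be selected when copying a non-const handle. Objects are never destroyed in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <utility>

// Hypothetical fixed-size arena standing in for the active segment.
struct Arena {
    alignas(std::max_align_t) unsigned char bytes[4096];
    std::uint32_t used = 0;
};
thread_local Arena t_arena;

template <class T>
class EcoPtr {
    std::uint32_t off_;  // 32-bit offset into t_arena.bytes
    explicit EcoPtr(std::uint32_t off) : off_(off) {}

public:
    // Construct a T inside the arena and keep only its offset.
    template <class... Args>
    static EcoPtr make(Args&&... args) {
        const std::uint32_t align = alignof(T);
        const std::uint32_t size = sizeof(T);
        const std::uint32_t start = (t_arena.used + align - 1) / align * align;
        assert(start + size <= sizeof(t_arena.bytes) && "arena exhausted");
        new (t_arena.bytes + start) T(std::forward<Args>(args)...);
        t_arena.used = start + size;
        return EcoPtr(start);
    }

    // Rebuild the full pointer as arena base + offset.
    T* get() const { return reinterpret_cast<T*>(t_arena.bytes + off_); }
    T* operator->() const { return get(); }
    T& operator*() const { return *get(); }
};
```

A handle created with `EcoPtr<std::pair<int, int>>::make(3, 4)` occupies 4 bytes and can be copied freely; every copy resolves to the same object in the arena.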

Later this would also implement ref counting, but the ref count would be
held inside T.
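Keeping the count inside T is usually called intrusive reference counting. A minimal, non-thread-safe sketch follows; `RefCounted`, `retain`, `release` and `Widget` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical base class: the count lives inside the object itself,
// so the handle can stay a bare 32-bit offset.
struct RefCounted {
    std::uint32_t refs = 0;
};

inline void retain(RefCounted& obj) { ++obj.refs; }

// Returns true when the last reference was dropped and the caller
// should destroy the object and release its slot in the segment.
inline bool release(RefCounted& obj) {
    assert(obj.refs > 0 && "release without matching retain");
    return --obj.refs == 0;
}

// Example payload type that carries its own count.
struct Widget : RefCounted {
    int payload = 0;
};
```

A smart-pointer wrapper would call `retain` when it is copied and `release` in its destructor.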