Fat-pointer Transformation pass

Hello Everyone
I am currently working on a fat-pointer scheme for 64-bit RISC-V machines, in which every pointer is converted to a 128-bit pointer laid out as |base|bound|cookie|pointer|, each field 32 bits wide. We support only a 32-bit process address space. The runtime checks are done using custom RISC-V instructions, one for checking spatial safety and the other for checking temporal safety.
Currently, I am implementing an LLVM IR module-level pass to change pointers to fat pointers. Within the same pass, I first create new functions with modified function types and create new call instructions to these newly created functions (but with the same old argument list); later in the pass, for each instruction, I change the types of the pointer operands. I am building LLVM in debug mode, so when I create the call instruction, it fails the assertion that the function type must match the argument types in the call instruction. In LLVM Release mode this error doesn’t pop up.
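For readers following along, here is a minimal software sketch of the layout described above. The exact field semantics are assumptions made for illustration (bound is taken to be one past the last valid address, and cookie an allocation identifier compared against a live value); the two functions only model in plain C what the custom RISC-V instructions might check, and their names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: |base|bound|cookie|pointer|, 32 bits each. */
typedef struct {
    uint32_t base;    /* lowest valid address of the object */
    uint32_t bound;   /* assumed: one past the highest valid address */
    uint32_t cookie;  /* assumed: allocation identifier for temporal checks */
    uint32_t pointer; /* current address in the 32-bit process space */
} fat_ptr_t;

_Static_assert(sizeof(fat_ptr_t) == 16, "fat pointer must be 128 bits");

/* Software model of the spatial check: the access
 * [pointer, pointer + size) must lie inside [base, bound). */
bool spatial_ok(fat_ptr_t p, uint32_t size)
{
    if (p.pointer < p.base || p.pointer > p.bound)
        return false;
    /* Compare against the remaining room so pointer + size cannot overflow. */
    return size <= p.bound - p.pointer;
}

/* Software model of the temporal check: the cookie carried by the pointer
 * must match the cookie currently associated with the allocation. */
bool temporal_ok(fat_ptr_t p, uint32_t live_cookie)
{
    return p.cookie == live_cookie;
}
```

With base 0x1000, bound 0x1010 and pointer 0x1008, an 8-byte access passes the spatial check and a 9-byte access does not.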

Is there any workaround for this? Is it recommended to create the call instruction with the proper argument types in the first place?

Thanking you

That’s far too late in the pipeline; how do you expect sizeof(void *) to work correctly in C, for example?
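A small sketch of why this matters, with hypothetical names (struct node, make_table): sizeof and offsetof are integer constant expressions, so the frontend folds them into plain numbers before any IR exists. An IR-level pass that later widens pointers has no reliable way to tell which constants were once pointer sizes.

```c
#include <stddef.h>
#include <stdlib.h>

/* A list node whose size and field offsets depend on the pointer size. */
struct node {
    struct node *next;
    int value;
};

/* All three are evaluated by the frontend; the IR only ever sees the
 * resulting integers (for example 8, 16 and 8 on a typical LP64 target). */
enum {
    PTR_SIZE     = sizeof(void *),
    NODE_SIZE    = sizeof(struct node),
    VALUE_OFFSET = offsetof(struct node, value)
};

/* The element size passed to calloc is already a constant in the IR. */
void **make_table(size_t n)
{
    return calloc(n, sizeof(void *));
}
```

If pointers grow to 128 bits only after this point, make_table still allocates room for the old pointer size, and every access to value still uses the old offset.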

Hello @jrtc27
Yes, sizeof() is not working in my case, since it is evaluated well before my pass runs. I did not think about this case. How can I handle it? Or, in general, is there a preferred approach for transforming pointers into fat pointers?

Thank you so much

You don’t transform them; you keep the true representation the whole way through, including in the frontend.

Is there any open-source, in-place fat-pointer tool that I can refer to for handling these cases, and that would also help me in hacking on clang/llvm?

Thanking you

We do fat-pointer-like things in CHERI LLVM (CTSRD-CHERI/llvm-project on GitHub, a fork of LLVM adding CHERI support), but we have quite a significant diff, so it’s not really accessible as a teaching aid.

Hello @jrtc27
Thanks for suggesting the CHERI LLVM project. I have always been in awe of the CHERI project :blush: Is there, by chance, an LLVM-internals doc for the CHERI project?

Currently, what I am trying to achieve is to convert every pointer to a fat pointer (with in-place base/bound metadata), under these assumptions: no changes to the source code, and no support for variadic functions. The compiler should automatically convert them into fat pointers. Is this possible in the Clang+LLVM model? Or should I use a source-to-source translator?

There’s no converting in CHERI LLVM; pointers just are capabilities if you compile for the pure-capability ABI, right up to the AST.

Converting to fat pointers without changing the source code: is this possible in the vanilla Clang+LLVM model? Or should I use a source-to-source translator?

You change Clang to create a different AST and emit different IR. There is no converting; void * just is a capability, I don’t know how else to say it. Just as void * just is a 64-bit integer on amd64.

I’m pretty sure you can’t do what you want for a low-level language like C/C++ (i.e. one that exposes object layout and size). The support MUST be done at the frontend level for those. You won’t need to change the source code in many cases, but ultimately some code will simply assume that sizeof(void*) == sizeof(size_t) or something similar.
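As a rough illustration of that kind of assumption, here is a sketch in which fat_ptr_t is a hypothetical 128-bit representation and the helper names are invented for the example. It models code that stashes a pointer's bytes in a size_t-sized slot: this works while the pointer representation is no wider than size_t and cannot work for a 16-byte fat pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit pointer representation, used only to show the
 * size mismatch; it is not taken from any real ABI. */
typedef struct { uint32_t base, bound, cookie, pointer; } fat_ptr_t;

/* True if a pointer representation of repr_size bytes fits in a size_t. */
bool fits_in_size_t(size_t repr_size)
{
    return repr_size <= sizeof(size_t);
}

/* Models the common idiom of storing a pointer in a size_t slot.
 * Returns false when the representation would be truncated. */
bool stash_pointer_bytes(const void *repr, size_t repr_size, size_t *slot)
{
    if (!fits_in_size_t(repr_size))
        return false;
    *slot = 0;
    memcpy(slot, repr, repr_size);
    return true;
}
```

Code written this way compiles unchanged, which is why such assumptions tend to surface only as runtime failures once the pointer width changes.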

Oh okay. So changing the AST and the IR that Clang emits helps deal with cases like sizeof() early in the pipeline. Then, in the LLVM instrumentation pass, I should add instructions that check these fat pointers at runtime.
Am I thinking in the correct direction?

Yes, I agree. Complete compatibility is difficult to achieve. I should check for these kinds of instances in the source code. Thanks for pointing this out.

Hello @jrtc27

In my current scheme, I am not making any hardware changes. So if I change the AST and emit new IR, say by introducing a new type so that every 64-bit data pointer becomes a 128-bit fat pointer, should an alloca instruction then return a 128-bit fat pointer to a position on the stack instead of a vanilla 64-bit pointer? Am I correct?

I think what you want is not to change/translate anything. At that point the semantics of the original program are likely already lost or hard to recover. What you need is to emit the “correct” (128-bit aware) AST/IR from the start. I.e. what you need to do is really similar to adding support for a new architecture (x64 is not supported by generating x86 AST/IR and then changing all pointers to be 64-bit; the situation here is similar).