Converting pointers to fat pointers

Hi,
I am looking at using LLVM/Clang to automatically convert pointer declarations to fat pointers & the corresponding dereferences to something appropriate. I am looking for guidance on doing this. Would an LLVM pass be better suited to this, or would it be better handled in Clang? Any guidance on getting started would be helpful.

Sai Charan,
CSE, UC Riverside.

It would be best handled by modifying Clang, both in semantic analysis (to change the size of a pointer) and IR generation (to generate, propagate, and consume your fat pointer values). I’m afraid that clang’s IR generation widely assumes that pointers are represented as a single llvm::Value, though, and you might be in for a lot of work.

John.

Converting to fat pointers can also be done at the LLVM IR level. In fact, there’s a modern implementation of fat pointers at the LLVM IR level in the SAFECode project. The implementation is SoftBound from the University of Pennsylvania, and it implements what is essentially a fat pointer approach that does not modify data structure layout. You can read about SoftBound at .

One of the problems with implementing fat pointers within Clang is that Clang does not have the entire program, and so you cannot use whole-program analysis to determine whether parts of the program are aware of the data structure layout. An LLVM IR analysis that is part of the link-time optimization framework can, and so a transform at the LLVM IR level could determine when it is safe to modify a data structure layout and when it is not.

All that said, if you’re using a fat pointer method that doesn’t modify data structure layout (SoftBound has this feature; Xu et al.'s work at doesn’t either, IIRC), implementing it in Clang would also work.

As an FYI, I’m advocating for a common infrastructure in LLVM for adding and optimizing memory safety run-time checks; the idea is to have common infrastructure that will work for fat pointer approaches, object metadata approaches, and other approaches. You can find my proposal at . I’d welcome any feedback or comments you may have on it.

– John T.
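To make the "does not modify data structure layout" idea concrete, here is a minimal C++ sketch of keeping bounds in a side table keyed by the address of the memory slot that holds the pointer. It is only an illustration of the general idea, not SoftBound's or SAFECode's actual code; the names `Bounds`, `MetadataTable`, `on_pointer_store`, `on_pointer_load`, and `Node` are all made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical per-pointer metadata: the valid range is [base, bound).
struct Bounds {
    std::uintptr_t base;   // address of the first valid byte
    std::uintptr_t bound;  // address one past the last valid byte
};

// Hypothetical side table. The key is the address of the memory slot that
// stores a pointer, so the program's own structs keep their original layout.
class MetadataTable {
public:
    // Would be called when a pointer value is stored to memory at `slot`.
    void on_pointer_store(const void* slot, Bounds b) {
        assert(b.base <= b.bound);
        table_[key(slot)] = b;
    }

    // Would be called when a pointer value is loaded from memory at `slot`.
    // Returns false if no metadata was ever recorded for that slot.
    bool on_pointer_load(const void* slot, Bounds& out) const {
        auto it = table_.find(key(slot));
        if (it == table_.end()) return false;
        out = it->second;
        return true;
    }

private:
    static std::uintptr_t key(const void* slot) {
        return reinterpret_cast<std::uintptr_t>(slot);
    }
    std::unordered_map<std::uintptr_t, Bounds> table_;
};

// An ordinary struct: its pointer fields stay plain, single-word pointers.
struct Node {
    int*  data;
    Node* next;
};
```

Because the metadata lives outside `Node`, code that was compiled without the transform still sees the layout it expects; the cost is a table lookup whenever a pointer is loaded from or stored to memory.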

In the interest of time & effort, I am leaning on working at the LLVM IR level.

The code listing in section 3.1 of the SoftBound paper is precisely what I am looking to do. However, the listing is at the C source level, while section 6 says that the implementation has been done on the LLVM IR; I don’t see how I can figure out pointer de-references in LLVM IR. Every alloca/load/store is via *.

In summary, how do I figure out pointer de-references in LLVM IR?

Sai Charan,
CSE, UC Riverside.

Ignoring intrinsic functions, the only LLVM IR instructions that dereference pointers are load and store.

The intrinsics that access memory via pointers should be pretty easy to spot when you read through the LLVM Language Reference Manual: things like the atomic intrinsics, the string manipulating intrinsics, etc.
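As a rough illustration of what a check attached to each load and store could look like once it has been inserted, here is a small self-contained C++ sketch. It is written at the source level for readability and is not taken from SAFECode or SoftBound; `Bounds`, `in_bounds`, `checked_load`, `checked_store`, and `bounds_of` are hypothetical names, and returning `false` (rather than aborting) is just a choice that keeps the sketch easy to test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical metadata for one pointer: the valid range is [base, bound).
struct Bounds {
    std::uintptr_t base;   // address of the first valid byte
    std::uintptr_t bound;  // address one past the last valid byte
};

// True if an access of `size` bytes starting at `addr` lies inside [base, bound).
inline bool in_bounds(std::uintptr_t addr, std::size_t size, Bounds b) {
    return addr >= b.base && addr <= b.bound && size <= b.bound - addr;
}

// Stand-in for an instrumented load of an int: check first, then access.
inline bool checked_load(const int* p, Bounds b, int& out) {
    if (!in_bounds(reinterpret_cast<std::uintptr_t>(p), sizeof(int), b)) {
        return false;  // a real runtime would typically report and abort here
    }
    out = *p;
    return true;
}

// Stand-in for an instrumented store of an int.
inline bool checked_store(int* p, Bounds b, int value) {
    if (!in_bounds(reinterpret_cast<std::uintptr_t>(p), sizeof(int), b)) {
        return false;
    }
    *p = value;
    return true;
}

// Helper for the sketch: bounds covering an array of `n` ints.
inline Bounds bounds_of(const int* arr, std::size_t n) {
    assert(arr != nullptr);
    std::uintptr_t base = reinterpret_cast<std::uintptr_t>(arr);
    return Bounds{base, base + n * sizeof(int)};
}
```

For a four-element array `a` with `Bounds b = bounds_of(a, 4)`, `checked_load(a + 3, b, v)` succeeds, while `checked_load(a + 4, b, v)` is rejected because the one-past-the-end address leaves no room for an `int`.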

You can see what SAFECode does by looking at the LoadStoreChecks.cpp source code. You can probably find the equivalent code in the SoftBound code, but I do not know myself where it is.

– John T.

LoadStoreChecks.cpp was a helpful pointer. For the record, in SoftBound v1.2, lib/SoftBoundCETS/SafBoundCETS.cpp seems to contain the relevant portions.

Thank you.

Sai Charan,
CSE, UC Riverside.

While true, this only tells half of the story. You also need to be careful of pointer arithmetic, which can be done either via GEPs or via int-to-pointer casts. If your fat pointers contain, for example, bounds information then you will need to track all of these.
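A small C++ sketch of why arithmetic matters, under the assumption that the fat pointer carries a base and a bound: the GEP-like operation moves the address but copies the bounds unchanged, and the check happens only at dereference. A pointer rebuilt from a bare integer has no bounds to inherit, so this sketch conservatively gives it an empty range; that is just one possible policy. All names here (`FatPtr`, `make_fat`, `gep`, `from_integer`, `load`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fat pointer to int: current address plus the range [base, bound).
// The address is kept as an integer so out-of-range values can be formed safely.
struct FatPtr {
    std::uintptr_t addr;
    std::uintptr_t base;
    std::uintptr_t bound;
};

// Fat pointer to the start of an array of `n` ints.
inline FatPtr make_fat(int* arr, std::size_t n) {
    assert(arr != nullptr);
    std::uintptr_t base = reinterpret_cast<std::uintptr_t>(arr);
    return FatPtr{base, base, base + n * sizeof(int)};
}

// GEP analogue: the address moves by `index` elements, the bounds are inherited.
inline FatPtr gep(FatPtr p, std::ptrdiff_t index) {
    // Unsigned arithmetic wraps, so negative indices move the address backwards.
    p.addr += static_cast<std::uintptr_t>(index) * sizeof(int);
    return p;
}

// inttoptr analogue: no provenance is available, so the range is left empty
// and every dereference through the result fails (a conservative assumption).
inline FatPtr from_integer(std::uintptr_t raw) {
    return FatPtr{raw, 0, 0};
}

// Checked load: only addresses with room for a whole int inside the range pass.
inline bool load(FatPtr p, int& out) {
    bool ok = p.addr >= p.base && p.addr <= p.bound &&
              sizeof(int) <= p.bound - p.addr;
    if (!ok) return false;
    out = *reinterpret_cast<const int*>(p.addr);
    return true;
}
```

Note that `gep(p, 6)` on a four-element array is allowed to exist; only loading through it fails, and stepping back with `gep(q, -3)` yields a usable pointer again because the bounds travelled with it.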

We (cl.cam.ac.uk) are currently in the process of adding LLVM support for a custom MIPS-based processor that has hardware support for capabilities, meaning that we have 64-bit and 256-bit pointers in the system. 64-bit pointers are just numbers, but 256-bit (capability) pointers include start, bounds, and a set of permissions. Both can be dereferenced, although via different instructions (64-bit pointers are implicitly checked against a specific capability register, depending on how they are used).

We're hitting a few issues because LLVM IR assumes that there is only one pointer size. For example, it folds a sequence of 64-bit pointer to int to 256-bit pointer casts into a bitcast. There are also assumptions in the back end that pointers are integers.

David