Eliminating naked pointers from the OCaml bindings to be compatible with OCaml 5

Greetings LLVM developers,

OCaml 5 will have a much-anticipated new, multicore, runtime. This runtime will not support “naked pointers,” or pointers outside the OCaml heap used directly OCaml values. The recommended solution is to wrap such pointers in an OCaml block (heap allocation) tagged with Abstract_tag or Custom_tag.

The LLVM OCaml bindings use naked pointers to expose LLVM objects, and therefore the bindings will not work with OCaml 5. The LLVM package on OPAM has already been tagged as being incompatible with any release of OCaml that disallows naked pointers. I’ve already filed an issue about this on the LLVM GitHub. Over the past week, I’ve been staying up late fixing the OCaml bindings to not use naked pointers, and now I finally have a fixed version of llvm_ocaml.c, the “main” and longest C file of the binding code.

I plan on fixing the rest of the files now. I won’t be able to run tests until I convert all the files, and with the magnitude of the change, I predict that somewhere will have some kind of error on the first test run, but the compiler has caught a lot of my mistakes at compile-time. I just hope that I didn’t make a conceptual error anywhere. I would love to have an informal review of my work so far.

Thanks for working on this! I think that you should definitely sync up with @kit-ty-kate since IIUC removing naked pointers is also on her roadmap. I would be happy to look at what you have done, but can’t guarantee to be fast.

Hi, thanks a lot for doing this work!

On first quick look it looks like it’s going in the right direction so far. The rule of thumb is to make all functions that is called by OCaml something like that:

value <function_name>(value x, value y and so on) {
  CAML<N>param(x, y and so on);
  CAMLlocal(tmp, res, …);
  <SomeLLVMType> x = <TheLLVMFunctionYouAreBinding>(... with all the right <Type>_val() macros);
  res = caml_alloc(1, Abstract_tag); /* caml_alloc_custom can also be used but it's more complicated */
  *((<SomeLLVMType> **) Data_abstract_val(v)) = x;
  CAMLreturn(res);
}

I’ve tried to cherry-pick your work on top of 14.0.6 to test it locally but along the line i noticed there was quite a few numbers of changes in main that I don’t understand.

Are all the typed pointers (e.g. i32*) removed in LLVM 16?

To answer your questions from Removing naked pointers from the LLVM bindings · Issue #22221 · ocaml/opam-repository · GitHub

When unboxing wrapped LLVM pointers into a temporary array to pass to LLVM, should I allocate the temporary array using malloc, bump the array on the OCaml heap (what I currently do), or have a per-domain “dynamic array” that is reused whenever LLVM pointers need to be copied into a temporary array?

I’m not sure what you mean by bump the array on the OCaml heap but in general everything that’s to do in C stay in C so malloc and free should be used. For example if you pass OCaml pointers in C code, the pointer could be stored for whatever reason but some time later the OCaml GC could have moved the array elsewhere so now the value might be random or even out of the allocated memory if the GC triggered compaction.

Yes, typed pointers are removed in LLVM 16. Sorry I was undermining my own rewrite here because I was also making those changes at the same time…

P.S. the changes aren’t nearly ready to be tested because there are still mane files that have yet to be converted…

Sometimes, I need to allocate a temporary array, whose lifetime does not exceed the function call. Is it more efficient to call malloc, then call free at the end, or allocate from the OCaml heap (and leave the memory to be collected)?

Honestly i have no idea. To me it seems like micro-optimization so i would rather encourage you to just use malloc and free

It is likely simpler to use malloc/free since if you allocate the array on the OCaml heap then you will need to be very careful to initialize it with valid values, not allow this array itself to have naked pointers, ensure that LLVM will not have a pointer to it when any OCaml allocation or signal handling might happen, etc.

2 Likes

Throughout the code, I get warnings like

/llvm-project/build/bindings/ocaml/transforms/passmgr_builder/passmgr_builder_ocaml.c:39:13: warning: "alloc_custom" is deprecated: use "caml_alloc_custom" instead
   39 |       alloc_custom(&pmbuilder_ops, sizeof(LLVMPassManagerBuilderRef), 0, 1);

.

I have changed some instances of this to use the caml_ prefix, and have left some other instances alone. Do we need to worry about old versions of OCaml where the caml_ prefixed versions were not defined yet? Should I go back and stop using the caml_ prefixed functions? Or can I go ahead and change all of them to be prefixed with caml_?

I’m making progress! I finished converting all the .c files and removing compile errors, so I built check-llvm-bindings-ocaml and got… six passing tests and ten failing tests! :grin:

I did a backtrace with ocamldebug and found out that a segmentation fault in the core.ml test was coming from const_null. To my horror, this function binded directly to the LLVM C API! There are a whole bunch of functions that do this, and therefore I missed them when going through the C source files. Welp, there is more work to do then, but the good news is that all the code apparently ran fine until that function was called, so my approach seems to be sound, as opposed to me finding a mysterious segfault that I could not explain, or discovering that my entire work converting the code was based on an inherently flawed design.

1 Like

The caml_-prefixed names have been available since OCaml 3.08 released in 2004. They should be used. Furthermore, the unprefixed names are no longer available in OCaml 5.

1 Like

Good news! I’ve converted all of the code, and it passes all tests on OCaml 4.14! The working code is now at GitHub - alan-j-hu/llvm-project at nnp. Now, I need to test the code on OCaml 5.0~beta1.

Good news! The tests pass on both OCaml 4.14 and OCaml 5~beta1! I have a patch up on ⚙ D136400 [ocaml-llvm] Migrate from naked pointers in preparation for OCaml 5.

1 Like

The original binding code on the main branch often skipped using CAMLparamX and CAMLreturn because many of the functions did not manipulate OCaml heap objects. When converting the bindings to eliminate naked pointers, I followed all the rules at OCaml - Interfacing C with OCaml because I assumed that to_val might allocate on the OCaml heap. When I checked the class definitions of all the LLVM types that get wrapped in to_val, I think it is safe to assume that all pointers are at least 2-bit aligned, and therefore don’t need an allocation. Now, I have a stacked diff at ⚙ D136537 [llvm-ocaml] Assume pointers are at least 2-bit aligned that assumes this, and also makes the following optimizations:

  • Skips CAMLparamX and CAMLreturn entirely if the function does not allocate on the OCaml heap
  • Only registers OCaml boxed values with CAMLparamX (i.e. it does not pass an int or a bool to CAMLparamX)

This stacked diff is therefore closer to the original code on the main branch. However, it relies on more assumptions that could possibly change, or I may have gotten wrong, so it is more brittle. Would you think these changes are good, or are they micro-optimizations that take a lot of risk for little benefit?

(I’m posting here for the people who aren’t in the Phabricator discussion to see, but if you have a Phabricator account, I think it is better to reply over there.)