Eliminating naked pointers from the OCaml bindings to be compatible with OCaml 5

Greetings LLVM developers,

OCaml 5 will have a much-anticipated new, multicore, runtime. This runtime will not support “naked pointers,” or pointers outside the OCaml heap used directly OCaml values. The recommended solution is to wrap such pointers in an OCaml block (heap allocation) tagged with Abstract_tag or Custom_tag.

The LLVM OCaml bindings use naked pointers to expose LLVM objects, and therefore the bindings will not work with OCaml 5. The LLVM package on OPAM has already been tagged as being incompatible with any release of OCaml that disallows naked pointers. I’ve already filed an issue about this on the LLVM GitHub. Over the past week, I’ve been staying up late fixing the OCaml bindings to not use naked pointers, and now I finally have a fixed version of llvm_ocaml.c, the “main” and longest C file of the binding code.

I plan on fixing the rest of the files now. I won’t be able to run tests until I convert all the files, and with the magnitude of the change, I predict that somewhere will have some kind of error on the first test run, but the compiler has caught a lot of my mistakes at compile-time. I just hope that I didn’t make a conceptual error anywhere. I would love to have an informal review of my work so far.

Thanks for working on this! I think that you should definitely sync up with @kit-ty-kate since IIUC removing naked pointers is also on her roadmap. I would be happy to look at what you have done, but can’t guarantee to be fast.

Hi, thanks a lot for doing this work!

On first quick look it looks like it’s going in the right direction so far. The rule of thumb is to make all functions that is called by OCaml something like that:

value <function_name>(value x, value y and so on) {
  CAML<N>param(x, y and so on);
  CAMLlocal(tmp, res, …);
  <SomeLLVMType> x = <TheLLVMFunctionYouAreBinding>(... with all the right <Type>_val() macros);
  res = caml_alloc(1, Abstract_tag); /* caml_alloc_custom can also be used but it's more complicated */
  *((<SomeLLVMType> **) Data_abstract_val(v)) = x;

I’ve tried to cherry-pick your work on top of 14.0.6 to test it locally but along the line i noticed there was quite a few numbers of changes in main that I don’t understand.

Are all the typed pointers (e.g. i32*) removed in LLVM 16?

To answer your questions from Removing naked pointers from the LLVM bindings · Issue #22221 · ocaml/opam-repository · GitHub

When unboxing wrapped LLVM pointers into a temporary array to pass to LLVM, should I allocate the temporary array using malloc, bump the array on the OCaml heap (what I currently do), or have a per-domain “dynamic array” that is reused whenever LLVM pointers need to be copied into a temporary array?

I’m not sure what you mean by bump the array on the OCaml heap but in general everything that’s to do in C stay in C so malloc and free should be used. For example if you pass OCaml pointers in C code, the pointer could be stored for whatever reason but some time later the OCaml GC could have moved the array elsewhere so now the value might be random or even out of the allocated memory if the GC triggered compaction.

Yes, typed pointers are removed in LLVM 16. Sorry I was undermining my own rewrite here because I was also making those changes at the same time…

P.S. the changes aren’t nearly ready to be tested because there are still mane files that have yet to be converted…

Sometimes, I need to allocate a temporary array, whose lifetime does not exceed the function call. Is it more efficient to call malloc, then call free at the end, or allocate from the OCaml heap (and leave the memory to be collected)?

Honestly i have no idea. To me it seems like micro-optimization so i would rather encourage you to just use malloc and free

It is likely simpler to use malloc/free since if you allocate the array on the OCaml heap then you will need to be very careful to initialize it with valid values, not allow this array itself to have naked pointers, ensure that LLVM will not have a pointer to it when any OCaml allocation or signal handling might happen, etc.


Throughout the code, I get warnings like

/llvm-project/build/bindings/ocaml/transforms/passmgr_builder/passmgr_builder_ocaml.c:39:13: warning: "alloc_custom" is deprecated: use "caml_alloc_custom" instead
   39 |       alloc_custom(&pmbuilder_ops, sizeof(LLVMPassManagerBuilderRef), 0, 1);


I have changed some instances of this to use the caml_ prefix, and have left some other instances alone. Do we need to worry about old versions of OCaml where the caml_ prefixed versions were not defined yet? Should I go back and stop using the caml_ prefixed functions? Or can I go ahead and change all of them to be prefixed with caml_?

I’m making progress! I finished converting all the .c files and removing compile errors, so I built check-llvm-bindings-ocaml and got… six passing tests and ten failing tests! :grin:

I did a backtrace with ocamldebug and found out that a segmentation fault in the core.ml test was coming from const_null. To my horror, this function binded directly to the LLVM C API! There are a whole bunch of functions that do this, and therefore I missed them when going through the C source files. Welp, there is more work to do then, but the good news is that all the code apparently ran fine until that function was called, so my approach seems to be sound, as opposed to me finding a mysterious segfault that I could not explain, or discovering that my entire work converting the code was based on an inherently flawed design.

1 Like

The caml_-prefixed names have been available since OCaml 3.08 released in 2004. They should be used. Furthermore, the unprefixed names are no longer available in OCaml 5.

1 Like

Good news! I’ve converted all of the code, and it passes all tests on OCaml 4.14! The working code is now at GitHub - alan-j-hu/llvm-project at nnp. Now, I need to test the code on OCaml 5.0~beta1.

Good news! The tests pass on both OCaml 4.14 and OCaml 5~beta1! I have a patch up on ⚙ D136400 [ocaml-llvm] Migrate from naked pointers in preparation for OCaml 5.

1 Like

The original binding code on the main branch often skipped using CAMLparamX and CAMLreturn because many of the functions did not manipulate OCaml heap objects. When converting the bindings to eliminate naked pointers, I followed all the rules at OCaml - Interfacing C with OCaml because I assumed that to_val might allocate on the OCaml heap. When I checked the class definitions of all the LLVM types that get wrapped in to_val, I think it is safe to assume that all pointers are at least 2-bit aligned, and therefore don’t need an allocation. Now, I have a stacked diff at ⚙ D136537 [llvm-ocaml] Assume pointers are at least 2-bit aligned that assumes this, and also makes the following optimizations:

  • Skips CAMLparamX and CAMLreturn entirely if the function does not allocate on the OCaml heap
  • Only registers OCaml boxed values with CAMLparamX (i.e. it does not pass an int or a bool to CAMLparamX)

This stacked diff is therefore closer to the original code on the main branch. However, it relies on more assumptions that could possibly change, or I may have gotten wrong, so it is more brittle. Would you think these changes are good, or are they micro-optimizations that take a lot of risk for little benefit?

(I’m posting here for the people who aren’t in the Phabricator discussion to see, but if you have a Phabricator account, I think it is better to reply over there.)

I’m concerned because my patch has gone a long time and nobody has reviewed my changes yet. @jberdine promised to look over it around week later, but now, a month has passed and nobody has looked at it. OCaml 5 will probably be released before the end of the year. What should I do?

Standard practice is to add a “ping” reply, maybe once a week or so, which may help remind people that the patch is waiting. If someone did promise to review it, you can tag them in the reply as well.

1 Like

I gave two pings to @jberdine (who emailed me in late October to say he would check out my patch in the next week or so). I’m worried because he hasn’t been so active recently. His last activity on Phabricator was on October 21, on my patch: ♟ jberdine He hasn’t been active on GitHub since November 7: jberdine (Josh Berdine) · GitHub On the OCaml forums, his last activity was more recent, on November 28: Profile - jjb - OCaml Is it possible that something unexpected happened to him?

Whitequark was the former code owner of the OCaml bindings, but removed herself: Remove myself from CODE_OWNERS. · llvm/llvm-project@60a2db5 · GitHub I don’t expect that she will be active and taking a look at my patch.

Who else could I add as a reviewer for my patch who would be knowledgeable about the OCaml bindings?

Also, I will have a medical operation in late January which will take a few months to recover from. I can’t guarantee that I will be capable of working on the patch during that time.

If it is difficult to find interested folks about the OCaml bindings, it may be a sign that this component has a very very small community. Perhaps we should consider removing this from the monorepo. LLVM has bindings for many languages and many exist out of the tree. [RFC] Moving OCaml bindings to peripheral tier and disabling them by default

You can also ask on https://discuss.ocaml.org/ who is interested in maintaining the binding.

I’ve seen and posted on that thread, and over there, I pinged some other potential stakeholders in the OCaml bindings. In my patch, I have touched all parts of the OCaml bindings, so I am willing to be the new code owner sometime in the future if other people deem me qualified for the responsibility. Unfortunately, right now, I am recovering from a major surgery, so I am not in a physical state where taking on this responsibility would be prudent. Maybe in three months, I can reevaluate how I feel.

In the meantime, @jberdine has become active again and given my patch a code review.