LLVM C API string passing


I see there are many functions that take strings as pointer+size, which is great. Some functions also come in two variants, one taking a null-terminated string and one taking pointer+size (e.g. LLVMSetValueName and LLVMSetValueName2).

But many functions only take strings as null-terminated C strings. I don’t like this because in many cases I have to copy the string somewhere and append a null terminator just to make the call.
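To make the cost concrete, here is a minimal sketch of what a caller has to do when its name is a slice of a larger buffer (for example, a token in source text) and the API only accepts a C string. The functions `consume_cstr` and `consume_slice` are hypothetical stand-ins for the two API styles, not real LLVM functions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an API that only accepts a null-terminated
 * string. It reports how many characters it "sees" as the name. */
static size_t consume_cstr(const char *name) {
    return strlen(name);
}

/* Hypothetical stand-in for a pointer+size API: the name ends exactly at
 * name + len, regardless of what follows it in memory. */
static size_t consume_slice(const char *name, size_t len) {
    (void)name;
    return len;
}

/* To pass a non-terminated slice to the C-string API, the caller must
 * allocate a copy, append '\0', make the call, and free the copy. */
static size_t pass_slice_as_cstr(const char *data, size_t len) {
    char *tmp = malloc(len + 1);
    if (tmp == NULL)
        return (size_t)-1; /* allocation failure */
    memcpy(tmp, data, len);
    tmp[len] = '\0';
    size_t seen = consume_cstr(tmp);
    free(tmp);
    return seen;
}
```

With a buffer like `"foo bar"`, `consume_slice(buf, 3)` sees `foo` directly, while the C-string path needs the allocate/copy/terminate round trip in `pass_slice_as_cstr`; passing `buf` straight to `consume_cstr` would see all 7 characters.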

My question is: what is the general opinion on functions that take pointer+size, and would it be a good idea to add a “2” variant taking pointer+size for each function that currently takes only a C string? For example:

LLVMValueRef LLVMAddFunction(LLVMModuleRef M, const char *Name, LLVMTypeRef FunctionTy);

LLVMValueRef LLVMAddFunction2(LLVMModuleRef M, const char *Name, size_t NameLen, LLVMTypeRef FunctionTy);
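One reason such a “2” variant is cheap to add is that the existing C-string function can stay unchanged and simply forward to it after computing the length. A minimal sketch of that pattern, using a toy module type; `DemoModule`, `demo_add_function`, and `demo_add_function2` are hypothetical and not part of the LLVM C API.

```c
#include <stddef.h>
#include <string.h>

/* Toy "module" that records function names; a stand-in for LLVMModuleRef. */
typedef struct {
    char names[8][32];
    size_t count;
} DemoModule;

/* Pointer+size variant: the name does not need to be null-terminated.
 * Returns the index of the new function, or -1 if it does not fit. */
static int demo_add_function2(DemoModule *m, const char *name, size_t name_len) {
    if (m->count >= 8 || name_len >= sizeof m->names[0])
        return -1;
    memcpy(m->names[m->count], name, name_len);
    m->names[m->count][name_len] = '\0'; /* internal storage only */
    return (int)m->count++;
}

/* Legacy C-string variant, kept as-is for compatibility: it only needs
 * strlen and then forwards to the pointer+size version. */
static int demo_add_function(DemoModule *m, const char *name) {
    return demo_add_function2(m, name, strlen(name));
}
```

The forwarding only works in this direction: building the C-string variant on top of the pointer+size one costs a `strlen`, whereas building the pointer+size variant on top of the C-string one would need the copy-and-terminate step the original post wants to avoid.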

I would be happy to try and do that if people want it.

I wouldn’t want to have both variants of every function by design, but it makes sense to me to have a policy that “pointer + size” is preferred for the C API. That would mean adding a version 2 with “pointer + size” for old functions is okay, but new functions should come only in a “pointer + size” version.

So would it be OK for me to add a version 2 for the functions that do not take pointer+size?

How about introducing an LLVMStringRef opaque struct that can be “constructed” from pointer + size or from a C string given that the underlying C++ APIs want a StringRef anyway? We’ve taken that approach in MLIR to represent StringRefs and that worked reasonably well so far.
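A minimal sketch of what such a type could look like, assuming a plain pointer-plus-length struct passed by value, loosely modeled on MLIR’s `MlirStringRef` (which has `data` and `length` fields). No `LLVMStringRef` exists in the LLVM C API; `LLVMStringRefSketch`, `string_ref_create`, `string_ref_from_cstr`, and `is_main` are hypothetical names for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: not part of the LLVM C API. A pointer plus a length,
 * passed by value, in the spirit of MLIR's MlirStringRef. */
typedef struct {
    const char *data;
    size_t length;
} LLVMStringRefSketch;

/* "Construct" from pointer + size; no null terminator is required. */
static LLVMStringRefSketch string_ref_create(const char *data, size_t length) {
    LLVMStringRefSketch s = {data, length};
    return s;
}

/* "Construct" from a null-terminated C string. */
static LLVMStringRefSketch string_ref_from_cstr(const char *cstr) {
    return string_ref_create(cstr, strlen(cstr));
}

/* An API taking the struct needs a single entry point for both kinds of
 * caller; this toy example just checks whether the name is "main". */
static int is_main(LLVMStringRefSketch name) {
    return name.length == 4 && memcmp(name.data, "main", 4) == 0;
}
```

Usage: `is_main(string_ref_from_cstr("main"))` and `is_main(string_ref_create("mainly", 4))` both return 1, so C-string callers and slice callers share one function. On the C++ side such a struct maps directly onto `llvm::StringRef(data, length)`.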


I think the LLVMStringRef approach would’ve been a good idea had it been done from the start.
Adding a third way to pass strings seems confusing at best, and I don’t think going back and breaking the current functions by changing every string parameter to an LLVMStringRef is a good idea.

Stronger than that: No existing function’s API can be changed. The LLVM C API has a rigid backwards compatibility guarantee.