All,
I am building my own language with LLVM as the base.
I am working on string concatenation, where a string is just an array of characters cast to a pointer to a character (i8*). Given two strings, the length of the new string can be determined by counting the characters in each one up to its null terminator, summing the two counts, and adding one for the new terminator.
Unfortunately, I have no idea how to use the API to store this. As the length of the new string is not a compile-time constant (it is only available as a Value*), I cannot determine at compile time what length the LLVM array type should have. Therefore, I cannot create the GlobalVariable, since I do not know its type.
One possible solution I thought of was linking to the malloc function and calling that, but I’m sure there’s a better way. If any of you have implemented a similar sort of string concatenation, I would much appreciate any advice that you could give.
Thanks,
Billy
The toy language I’ve been playing around with represents all strings as a struct in LLVM:
    struct string {
        char *ptr;       // character data
        int str_len;     // number of characters currently in use
        int buffer_len;  // allocated size of the buffer
    };
And my AST has an interface like:
    String_AST {
        int measure();
        void copy(char *dest);
        struct string get_value();
    }
A constant string can be measured at compile time; for a string variable, measure() just extracts str_len. Strings passed in from external sources are measured immediately, but LLVM optimisations will eliminate the call if the return value isn’t used.
The implementation of get_value() for a concatenation AST node can generate code to evaluate each substring, measure each one, allocate a buffer of the final length, and only then copy each substring directly into the final buffer.
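As a rough illustration of the measure / allocate / copy sequence just described, here is what the emitted code would amount to if written by hand as a runtime helper. This is only a sketch: `String` mirrors the `struct string` above (renamed here to avoid clashing with std::string), and `string_concat`, the use of malloc, and the one-byte minimum allocation are my assumptions, not something from the original design.

```cpp
#include <cstdlib>
#include <cstring>

// Mirrors the `struct string` described above (name changed for this sketch).
struct String {
    char *ptr;       // character data, not null-terminated
    int str_len;     // number of characters in use
    int buffer_len;  // allocated size of the buffer
};

// Hypothetical helper: measure both operands, allocate the final buffer once,
// then copy each operand directly into it. The caller owns result.ptr.
String string_concat(const String &a, const String &b) {
    String result;
    result.str_len = a.str_len + b.str_len;
    // Assumption: always allocate at least one byte so ptr is never null on success.
    result.buffer_len = result.str_len > 0 ? result.str_len : 1;
    result.ptr = static_cast<char *>(std::malloc(result.buffer_len));
    if (result.ptr == nullptr) {
        // Allocation failed: report an empty string with no buffer.
        result.str_len = 0;
        result.buffer_len = 0;
        return result;
    }
    if (a.str_len > 0) std::memcpy(result.ptr, a.ptr, a.str_len);
    if (b.str_len > 0) std::memcpy(result.ptr + a.str_len, b.ptr, b.str_len);
    return result;
}
```

The point is that the buffer size is an ordinary runtime integer handed to an allocator, so no LLVM array type of that length ever has to exist.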
I also support a string append operation that will reallocate the buffer only if the existing one is too small.
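A sketch of what such an append could look like as a runtime helper follows. The names (`String`, `string_append`), the initial capacity of 16, and the doubling growth policy are assumptions made for illustration; the only behaviour taken from the description above is that the buffer is reallocated only when it is too small. It also assumes the buffer came from malloc/realloc (or is null) and ignores integer overflow.

```cpp
#include <cstdlib>
#include <cstring>

// Mirrors the `struct string` described above (name changed for this sketch).
struct String {
    char *ptr;       // character data, not null-terminated
    int str_len;     // number of characters in use
    int buffer_len;  // allocated size of the buffer
};

// Hypothetical helper: append src_len bytes from src to dst.
// Reallocates only when the existing buffer is too small.
// Returns false (leaving dst unchanged) if reallocation fails.
bool string_append(String &dst, const char *src, int src_len) {
    int needed = dst.str_len + src_len;
    if (needed > dst.buffer_len) {
        // Assumed growth policy: start at 16 bytes, then double until it fits.
        int new_cap = dst.buffer_len > 0 ? dst.buffer_len : 16;
        while (new_cap < needed) new_cap *= 2;
        char *grown = static_cast<char *>(std::realloc(dst.ptr, new_cap));
        if (grown == nullptr) return false;
        dst.ptr = grown;
        dst.buffer_len = new_cap;
    }
    if (src_len > 0) std::memcpy(dst.ptr + dst.str_len, src, src_len);
    dst.str_len = needed;
    return true;
}
```

Keeping str_len and buffer_len separate is what makes the "too small?" check a single comparison, with no need to rescan the string.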
Ultimately you will need to work out whether you want Pascal / Java style strings like mine or C style null-terminated strings, and how the memory for these strings will be managed.
However, how would one allocate the buffer for a string if you did not know the length of the string at compile time?
For instance, using the API, how would one reproduce the code for the following C++ function?
    std::string add(std::string a, std::string b) {
        return a + b;
    }
When allocating the buffer required for the new string, one can determine the length at runtime. However, I do not know how to allocate a global array whose size is determined by a Value*.
> However, how would one allocate the buffer for a string if you did not
> know the length of the string at compile time?
FYI, LLVM doesn't provide a "platform" or "VM" or "runtime". You will need
to be familiar with your target platform's APIs; on most platforms you can
probably get away with just calling malloc for the case at hand though.
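
To make the malloc suggestion concrete for C style null-terminated strings, here is a minimal sketch of a helper that the generated code could call. The name `concat_cstr` is hypothetical, and putting the logic in a separately compiled helper (rather than emitting the strlen / malloc / memcpy calls inline) is just one possible arrangement. Either way, the generated IR only needs an external function declaration and a call whose size argument is a runtime value, and the result is a plain i8*.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical runtime helper, declared extern "C" so the symbol name is not
// mangled and can be referenced by name from generated code.
// Returns a malloc'd, null-terminated copy of a followed by b, or nullptr if
// allocation fails. The caller is responsible for calling free() on the result.
extern "C" char *concat_cstr(const char *a, const char *b) {
    std::size_t len_a = std::strlen(a);
    std::size_t len_b = std::strlen(b);
    // Both lengths plus one byte for the terminator, all known only at runtime.
    char *out = static_cast<char *>(std::malloc(len_a + len_b + 1));
    if (out == nullptr) return nullptr;
    std::memcpy(out, a, len_a);
    std::memcpy(out + len_a, b, len_b + 1);  // the +1 copies b's terminator
    return out;
}
```

Something still has to free the result, so the memory-management question does not go away; it just moves to whoever owns the returned pointer.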
-- Sean Silva