Need help with code generation

I wrote my compiler and it now generates LLVM IR modules. Next I’d like to produce an object file and then an executable, just like clang does.

What should I use to create the object files? And then how do I call ld? (Not llvm-ld: I want my compiler to work like Clang, and I read that Clang doesn’t use llvm-ld.)

If you’ve created a .bc or a .ll file then the simplest thing is to just give it to clang exactly the same as you would for a .c file. Clang will just Do The Right Thing with it.

If you don’t want to link, then pass flags such as -c to clang as usual.

e.g.

---- hello.ll ----

declare i32 @puts(i8*)
@str = constant [12 x i8] c"Hello World\00"

define i32 @main() {
%1 = call i32 @puts(i8* getelementptr inbounds ([12 x i8]* @str, i64 0, i64 0))
ret i32 0
}

I’d like to make my compiler independent, just like Clang. Doesn’t Clang call llc and then the system’s ld by itself? I don’t want my compiler to depend on any other program.
I guess there is a class in the LLVM library that generates the object files based on the system’s triple and data layout, and then calls the system’s ld?

Hi Lorenzo,

Clang doesn’t call llc; LLVM is compiled into Clang. Clang does call the system linker though.

Making your compiler generate object code is very simple. Making it fix up that object code and execute it in memory (JIT style) is also simple. Linking it properly and creating a fixed-up ELF file is less simple. For that, you need to compile to an object file (using addPassesToEmitFile() - see llc.cpp) and then invoke a linker. Getting that command line right can be quite difficult.

Rafael, this would be a good use case for LLD as a library. I heard that this is an explicit non-goal, which really surprised me. Is that indeed the case?

Cheers,

James

If you plan on calling C runtime library functions, you probably want to do what I did:

Cheat, and make a libruntime.a (with C functions to do stuff your compiler can’t do natively) and then link that using clang or gcc.

https://github.com/Leporacanthicus/lacsap/blob/master/binary.cpp#L124

At some point, I plan to replace my runtime library with native Pascal code, at which point I will be able to generate the ELF binary straight from my compiler without the runtime library linking in the C runtime library, but that’s not happening anytime real soon. Getting the compiler to compile v5 of Wirth’s original Pascal compiler is higher on the list… :)
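For anyone who wants to see what "link that using clang or gcc" can look like from inside a compiler, here is a minimal POSIX sketch in C. It is not taken from lacsap or from Clang: `run_command` and `link_executable` are hypothetical helper names, and the driver name and file names passed in are assumptions. The idea is simply to let the compiler driver work out the linker command line (startup files, libc, and so on) instead of calling ld directly.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the given NULL-terminated argument vector as a child
 * process. Returns the child's exit status, 127 if the program could not
 * be executed, or -1 if fork/waitpid failed or the child died on a signal. */
int run_command(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);   /* only returns on failure */
        _exit(127);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: link one object file against a runtime archive by
 * invoking a compiler driver such as "clang" or "gcc" (caller's choice). */
int link_executable(const char *driver, const char *object,
                    const char *runtime_lib, const char *output) {
    char *argv[] = {
        (char *)driver, (char *)object, (char *)runtime_lib,
        "-o", (char *)output, NULL
    };
    return run_command(argv);
}
```

A call such as `link_executable("clang", "hello.o", "libruntime.a", "hello")` would then behave like running `clang hello.o libruntime.a -o hello` by hand, assuming those files exist and the driver is on the PATH.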

@james
Yeah for code generation I figured out that clang doesn’t actually use llc, and I already started reading its code to see how it works.

Yes, you shouldn’t have any trouble just declaring and using C functions such as fopen, fclose, puts, fputs, fputc.

You’re likely to find that putc is a macro not a function, in which case you won’t be able to use that.

Depending on whether it’s important, if you’re running on a Unix-like system then you could save quite a bit of size in your binary by using open(2), close(2), read(2), write(2) directly, as they’re not any harder to use. But the C standard library is available in more places.
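As a rough illustration of the write(2) route, here is a small C sketch. `write_all` is a hypothetical helper name, not part of any library; the only real calls used are POSIX `write` and `errno`. The one thing that differs from stdio is that a single write may transfer fewer bytes than requested, so the loop retries until everything is out.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying after short writes and after
 * interruption by a signal (EINTR).
 * Returns 0 on success, -1 on error (errno is left as set by write). */
int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted before writing: try again */
            return -1;
        }
        buf += n;                /* skip the bytes already written */
        len -= (size_t)n;
    }
    return 0;
}
```

From generated code this would be used as `write_all(1, "Hello World\n", 12)`, where file descriptor 1 is standard output.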

You’re right. Well, it’s just like fputc(x, stdout).
Last thing: are 4 calls to fputc as fast as one call to fputs with a 4-char string? Or might fputs be faster?

> Rafael, this would be a good use case for LLD as a library. I heard that this is an explicit non-goal, which really surprised me. Is that indeed the case?

You can use LLD as a library.

A corrupted file could cause a fatal error or SEGV.

Uhhh, that’s not particularly useful.

"Corrupted" means really corrupted, like ELF header is broken. Is this
really the case?

Well sure, it’s unlikely, but how many consumers can make that sort of guarantee? And if a consumer can’t guarantee the integrity of the ELF file they have no choice but not to use LLD, or to fork before using it.

Correct.

Cheers,
Rafael

We had a long discussion recently and the decision was made so that we can go ahead. It is not a good idea to discuss that again. At least it is too soon.

I’d recommend using lld’s link() function if the input is guaranteed to be consistent (such as the output of clang). Otherwise, please use fork.

Correct

Out of interest, how does LLD itself handle error reporting when invoked from the command line, and how does it avoid segfaulting in that case?

Cheers,

James

It generally reports an error and exits, or in rare circumstances it just segfaults.

If it can exit, why can’t it longjmp back to a library consumer at least?

We do not enable exceptions, and longjmp is not safe. Also, if it can segfault for some pathetic input, “it longjmps in most cases” doesn’t help people who want a 100% guarantee, like you.

> Also, if it can segfault for some pathetic input

Surely that’s a bug though, not seriously designed behaviour?

No. That is a design choice.