n00b question: From module/bitcode to Mach-O dylib file directly?

I'm a total LLVM n00b, and have just started to work through some of the tutorials with the intention of gathering a clear picture of what LLVM does and doesn't do for a possible project.

I'm on the Mac, and would like to have my code dynamically create and load new functions into a process. I believe I can do this, but I'm not sure yet how 'direct' things will be in terms of both the tool chain, and how much I need to emit intermediate files.

I think I will ultimately have to dynamically load from a Mach-O dylib file (dlopen), but in any case I'm interested in whether I can go straight from my in-memory linked module(s) to a Mach-O file that I can dynamically load, using the LLVM API as it exists at V2.5.

Before I even worry about coding this up directly, I've been playing with the available tools to see what the various steps are likely to be. From comments on this list, it looks like going from a bitcode file to a native Mach-O file ought to be possible, but I've tried "llc -filetype=dynlib" (and indeed -filetype=obj) and get the message "target does not support generation of this file type!". BTW, I compiled LLVM myself, but assume the build does the right thing w.r.t. building in appropriate knowledge of the native platform.

I can see how to generate an assembly file, then use the platform tools to compile and link this, but I get the impression this is unnecessary with the current state-of-play.

Would anyone be kind enough to provide a clue? I couldn't find any samples or docs that directly speak to this - probably because it's a minor part of the overall LLVM system.

Cheers

Luke

Are you just loading functions into the current process? There's no need
to create temporary object files and dynamically link them in; LLVM is
perfectly capable of assembling a function in memory. Look up the
docs for llvm::ExecutionEngine::create and go from there.

John.

Thanks John.

I had passed over ExecutionEngine as it looked like it offered a JIT. Maybe there’s more to it than meets the (hasty) eye though.

I’m interested in getting a native image. Ultimately, I’d like to do things like emitting Objective-C IMPs and building Objective-C classes around them.
However, I’m going one step at a time (there’s probably much to learn and maybe gotchas to discover before I get to this).
To begin with, I figured I’d try to get a Mach-O file written out, dlopen’ed and then make a regular C call to a loaded function.

So, I’ll take a look at ExecutionEngine, but I’m still curious if I’m currently able to emit Mach-O dylibs directly from the LLVM tool chain, or if I have to go a little more round-the-houses (perhaps emitting .s and then using platform tools to get to the dylib).

– lwe

I had passed over ExecutionEngine as it looked like it offered a JIT. Maybe there's more to it than meets the (hasty) eye though.

It does...

Aaah. Wasn't quite sure what you were doing here. I'm not positive what llvm can emit via the writers (there's support for all parts of the file format), but it could be extended to write one out. I'm just not quite sure why :)

-eric

Well, ultimately I’m curious about what it would take to port a JVM based language (http://openquark.org) to LLVM.
A main motivator though is language/library/platform integration on the Mac (Cocoa, Objective-C, autozone). I’m not too interested in a JIT at this point, but rather native code generation.

So far I can see a MachOWriter (with an “AddMachOWriter” in FileWriters.h) and it looks like I get to pass an appropriate TargetMachine to this.
I’m wondering if this is expected to be sufficient (added to a pass manager) to be able to output a library on the Mac - whether or not it actually works.

I only picked LLVM up yesterday, so I’m still trying to understand how the parts work together. With my current lack of orientation I have very little intuition as to whether I’m on the right track, and if it fails, how close (or otherwise) I might have been to getting something going. So I suppose I’ve been looking for hints at the vague ‘shape’ of the code I would expect to connect together to get the output I want, and while there are some nice samples for some parts of LLVM, I haven’t found much to help learn how to emit native .o or .dylib (assuming this is possible).

In the meantime I’ll spend some time trying to bang some of the aforementioned pieces together and see if I can get it to do anything interesting with a minimal module.

Thanks.

– lwe

You might consider building on HLVM instead:

  http://hlvm.forge.ocamlcore.org/

HLVM provides everything from tuples and boxed types to a garbage collector.
You just supply it with an AST, either incrementally for JIT compilation
(e.g. from a REPL) or it can spit out bitcode to build an executable.

Hey Luke,

Unfortunately, the 'state of the art' is that llc only really supports
emission of native assembly files (-filetype=asm) which can then be
assembled and linked with your native gas/ld.

There is some source support for 'object file generation', exposed via
the -filetype=obj flag to llc, but it is not complete, and totally
broken in some cases. This is something I am trying to work on with
Aaron. You can follow our discussion on the list here, and feel free
to pitch in.

That said, the MachO generation _should_ work (in 2.5) for outputting
.o files, which would still need to be linked using your native ld
into a dynlib. I don't see that llc will ever generate a dynlib, as I
think that that is not its function. The most you can expect is a
valid target object file.

If you encounter a particular issue using the -filetype=obj flag,
please let us know so we can fix it...

I for one would really like to see object generation become a fully
working feature of the llvm toolchain.

IIUC, MachOWriter is not yet 100% complete. It is a work in progress, which is not getting significant attention and love recently. It'd be very useful to have this completed.

It is simpler. There is not any Mach-O envelope. The platform linker
can directly read Mach-O files as well as llvm bit-code files (using
the llvm bit-code file reader).

Not sure if/where the exact form of the Mach-O file that carries bit
code is documented.

Thanks for that. Yes, I see what you mean (no Mach-O envelope), yet there seems to be something fundamentally different between a bit-code file I create from Apple’s llvm-gcc-4.2, and one created from the LLVM APIs.
Particularly:

  • ld won’t consume the/a bit-code file that is directly generated by the LLVM APIs: “ld warning: in foo.bc, file is not of required architecture”
  • The bit-code file I have from LLVM differs from ones generated by Apple’s llvm-gcc-4.2. The latter has a “DE C0 17 0B” magic introducer, and there seems to be another 16 bytes before the “BC Code” (42 43 C0 DE) magic appears.
    Whereas, the file generated from the LLVM APIs starts immediately with the “BC Code” magic number.

There’s a really good chance that this is just a case of ‘user error’. At the very least it seems to me that I’m not doing something that’s required to emit bit-code files in a format that ld can consume. Alternatively, there is some kind of wrapper that the llvm-gcc-4.2 tool produces around the basic bit-code, though not, as you’ve pointed out, a Mach-O format.

FWIW, here are the first 24 bytes, up to and including what I recognise as the LLVM bit-code magic number:
DE C0 17 0B 00 00 00 00 14 00 00 00 AC 01 00 00 07 00 00 00 42 43 C0 DE…

The ‘file’ util identifies this as “Compiled PSI (v1) data”, though that’s probably not relevant/useful, as it just comes from the DE C0, which could be any number of things.
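The two layouts described above can be told apart from the first four bytes alone. The following C++ is a minimal sketch of that check, not code from LLVM or ld: `BitcodeKind` and `classifyBitcode` are hypothetical names, and the only facts it relies on are the two byte sequences quoted in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical classification of a buffer that may hold LLVM bit-code.
enum class BitcodeKind { RawBitcode, DarwinWrapper, Unknown };

// Classify a buffer by its first four bytes only.
BitcodeKind classifyBitcode(const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    if (size < 4)
        return BitcodeKind::Unknown;

    // Plain bit-code starts directly with 'B' 'C' 0xC0 0xDE.
    if (data[0] == 0x42 && data[1] == 0x43 && data[2] == 0xC0 && data[3] == 0xDE)
        return BitcodeKind::RawBitcode;

    // The llvm-gcc-4.2 output starts with DE C0 17 0B instead.
    if (data[0] == 0xDE && data[1] == 0xC0 && data[2] == 0x17 && data[3] == 0x0B)
        return BitcodeKind::DarwinWrapper;

    return BitcodeKind::Unknown;
}
```

Reading the first four bytes of each .bc file into a buffer and passing them to `classifyBitcode` is enough to tell which of the two forms a given tool produced.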

Anyway, I guess my simple question is:
What do I need to do to get home-brew bit-code output from the LLVM bit-code writer to conform to whatever requirements that ld has for input?

– lwe

Hi Luke,

Yes. If you set the target as darwin for the module while generating bit-code file from the LLVM APIs then llvm bit-code writer will add this.

See llvm/lib/Bitcode/Writer/BitcodeWriter.cpp for more info.

/// EmitDarwinBCHeader - If generating a bc file on darwin, we have to emit a
/// header and trailer to make it compatible with the system archiver. To do
/// this we emit the following header, and then emit a trailer that pads the
/// file out to be a multiple of 16 bytes.
///
/// struct bc_header {
///   uint32_t Magic;         // 0x0B17C0DE
///   uint32_t Version;       // Version, currently always 0.
///   uint32_t BitcodeOffset; // Offset to traditional bitcode file.
///   uint32_t BitcodeSize;   // Size of traditional bitcode file.
///   uint32_t CPUType;       // CPU specifier.
///   ... potentially more later ...
/// };
enum {
  DarwinBCSizeFieldOffset = 3*4, // Offset to bitcode_size.
  DarwinBCHeaderSize = 5*4
};

and

/// WriteBitcodeToStream - Write the specified module to the specified output
/// stream.
void llvm::WriteBitcodeToStream(const Module *M, BitstreamWriter &Stream) {
  // If this is darwin, emit a file header and trailer if needed.
  bool isDarwin = M->getTargetTriple().find("-darwin") != std::string::npos;
  if (isDarwin)
    EmitDarwinBCHeader(Stream, M->getTargetTriple());
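To make the header layout concrete, here is a minimal sketch that decodes the five 32-bit fields from a byte buffer. It is not LLVM code: `DarwinBCHeader`, `readLE32` and `parseDarwinBCHeader` are hypothetical names, and the sketch assumes the fields are stored little-endian, which is what the bytes Luke posted (DE C0 17 0B for 0x0B17C0DE) suggest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the bc_header comment: five consecutive 32-bit fields.
struct DarwinBCHeader {
    std::uint32_t magic;
    std::uint32_t version;
    std::uint32_t bitcodeOffset;
    std::uint32_t bitcodeSize;
    std::uint32_t cpuType;
};

// Read one 32-bit little-endian value (assumed byte order, see lead-in).
static std::uint32_t readLE32(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Fill `out` from the first 20 bytes of `data`.
// Returns false if the buffer is too short or the magic does not match.
bool parseDarwinBCHeader(const std::uint8_t* data, std::size_t size,
                         DarwinBCHeader& out) {
    assert(data != nullptr || size == 0);
    const std::size_t kHeaderSize = 5 * 4;  // same value as DarwinBCHeaderSize
    if (size < kHeaderSize)
        return false;

    out.magic         = readLE32(data + 0);
    out.version       = readLE32(data + 4);
    out.bitcodeOffset = readLE32(data + 8);
    out.bitcodeSize   = readLE32(data + 12); // offset 3*4, as in the enum
    out.cpuType       = readLE32(data + 16);
    return out.magic == 0x0B17C0DEu;
}
```

Applied to the 24 bytes posted earlier in the thread, the fields decode to version 0, a bit-code offset of 20 (0x14), a bit-code size of 428 (0x1AC) and a CPU type of 7, with the "BC" magic starting exactly at the reported offset.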

Anyway, I should be able to figure it out from here - thanks again.

...which I did - though I had to ensure I had the target triple set in the module just right: "x86_64-apple-darwin"

Originally I tried "i686-apple-darwin9" as this is what gcc reports, and what gets burnt into config.h as:
#define LLVM_HOSTTRIPLE "i686-apple-darwin9.6.0"

However, it dawned on me that I should really be asserting x86_64, and indeed this works nicely if it is also asserted on the ld command line.
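The reason the triple matters can be sketched as follows. The "-darwin" test is taken from the WriteBitcodeToStream excerpt quoted earlier; the mapping from the triple's architecture prefix to the header's CPUType field is an assumption about what the writer does, so check BitcodeWriter.cpp before relying on it. The function names are hypothetical, and the constants are the Mach-O values for 32-bit x86 (7) and the 64-bit ABI flag (0x01000000).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

const std::uint32_t kCpuArchAbi64 = 0x01000000u; // Mach-O 64-bit ABI flag
const std::uint32_t kCpuTypeX86   = 7u;          // Mach-O x86 CPU type

// Same substring test as the quoted WriteBitcodeToStream excerpt.
bool triggersDarwinHeader(const std::string& triple) {
    return triple.find("-darwin") != std::string::npos;
}

// ASSUMED mapping from the triple's architecture prefix to a CPU type.
// Returns 0 for anything this sketch does not recognise.
std::uint32_t cpuTypeForTriple(const std::string& triple) {
    if (triple.compare(0, 7, "x86_64-") == 0)
        return kCpuTypeX86 | kCpuArchAbi64;
    // i386-, i486-, i586-, i686-
    if (triple.size() >= 5 && triple[0] == 'i' &&
        triple[1] >= '3' && triple[1] <= '6' &&
        triple.compare(2, 3, "86-") == 0)
        return kCpuTypeX86;
    return 0;
}

// Expectations for the two triples mentioned in this thread.
void checkThreadTriples() {
    assert(triggersDarwinHeader("x86_64-apple-darwin"));
    assert(triggersDarwinHeader("i686-apple-darwin9"));
    assert(cpuTypeForTriple("x86_64-apple-darwin") != cpuTypeForTriple("i686-apple-darwin9"));
}
```

Under these assumptions both triples produce the header, but they would stamp different CPU types into it, which is the value the linker's -arch setting has to agree with.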

-- lwe

D'oh. Spoke too soon (sort of).

I have apparently managed to create a bit-code file that will happily be ingested by ld. The dylib that comes out has my symbol in it (according to nm).
However, my call to dlsym, which used to work happily with my earlier pipeline (using default arch and llc-as-ld to get the dylib) now returns NULL.

Is there anything special about symbol names generated by ld with x86_64 arch?

-- lwe

...got it. Architecture mismatch with the dylib - nothing to do with LLVM.

i386 works in the module target triple and the -arch arg of ld.
Don't know why (yet) x86_64 doesn't work. I think perhaps I end up only running the 32 bit binary even though I'm building 32/64 Universal, and the single-architecture x86_64 dylib isn't loading.
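A failure like this is easier to diagnose when the loader's own error text is captured. The following is a minimal sketch assuming the POSIX <dlfcn.h> interface; `LookupResult` and `lookupSymbol` are hypothetical names, and the exact wording of the error string depends on the platform.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>

// Result of a lookup: either a non-null address or an error description.
struct LookupResult {
    void* address;
    std::string error;
};

// Open `libraryPath` and resolve `symbolName`, keeping any dlerror() text.
LookupResult lookupSymbol(const char* libraryPath, const char* symbolName) {
    assert(libraryPath != nullptr && symbolName != nullptr);
    LookupResult result{nullptr, ""};

    void* handle = dlopen(libraryPath, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        // The library itself failed to load (missing file, wrong format, ...).
        const char* msg = dlerror();
        result.error = msg ? msg : "dlopen failed";
        return result;
    }

    dlerror();  // clear any stale error before calling dlsym
    result.address = dlsym(handle, symbolName);
    if (const char* msg = dlerror()) {
        result.error = msg;
        result.address = nullptr;
    }
    // The handle is deliberately left open so the address stays valid.
    return result;
}
```

Checking `error` after each call separates "the dylib did not load at all" from "the dylib loaded but the symbol is missing", two cases that look the same if only the final pointer is inspected.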

Anyway, I've got enough working to have the confidence to press on with my LLVM project, and I think it's gonna be fun.
It's early days indeed, but I'm already impressed with my LLVM experience. That includes the general quality of the coding (and commenting) style in the LLVM sources I've looked at. I'm not a big fan of C++, but I find _this_ C++ to be quite 'tasteful' (I've seen an awful lot of the other). Keep up the great work guys.

-- lwe