[fwd] Re: [LLVMdev] Hash Bang

Karl, I think you meant to cc the llvmdev list on this.

Thank you for a more detailed explanation, it's much clearer to me now.

I agree that making the execution of .bc files more transparent would
make them more usable as a stand-alone binary format on Unix-like systems,
and that adding programmatic support for changing the #! line would
prevent much of the user error involved in modifying the run line.

One issue is that the limit of 256 chars for the run line might not be
enough due to libraries. For example, let's pretend we have a program
foo.c which uses several libraries and we compile it as follows:

% llvm-gcc foo.c -o foo

This will produce the bytecode file 'foo.bc' and a shell script 'foo'
which you can use to run the program via the JIT (lli). The way LLI
loads external libraries is via the -load=[full path to library] flags,
and theoretically, there isn't a limit as to how many libraries a
program might use, so a hard-coded limit in the run line would certainly
be problematic.
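
To make the length concern concrete, here is a minimal sketch that builds a run line from a list of -load flags and checks it against the 256-character limit. The lli path and the library paths are made-up examples, and the helper names are hypothetical; only the -load= flag and the 256 limit come from the discussion above.

```python
# Sketch only: the paths below are invented for illustration.
RUN_LINE_LIMIT = 256  # the fixed limit under discussion


def build_run_line(lli_path, libraries):
    """Return a '#!' run line that loads each library via a -load= flag."""
    load_flags = [f"-load={lib}" for lib in libraries]
    return " ".join([f"#!{lli_path}", *load_flags])


def fits_in_limit(run_line, limit=RUN_LINE_LIMIT):
    """True if the run line plus its trailing newline fits in the limit."""
    return len(run_line) + 1 <= limit


# One library fits easily; a dozen full paths do not.
few = build_run_line("/usr/local/bin/lli", ["/usr/lib/libm.so"])
many = build_run_line(
    "/usr/local/bin/lli",
    [f"/usr/local/lib/libexample{i}.so" for i in range(12)],
)
print(len(few), fits_in_limit(few))
print(len(many), fits_in_limit(many))
```

Since each flag carries a full path, the line grows by a few dozen characters per library, so any fixed limit is reached after a modest number of dependencies.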

The other issue is that it seems such libraries aren't specified in the
"deplibs" part of the Module, which perhaps they should be -- I just tried
with a simple example which uses sin() and the bytecode file did not
have a dependence on the math library, but the shell script had a
-load=/usr/lib/libm.so correctly added. If the deplibs field were
updated correctly, perhaps LLI could automatically search the standard
system paths for such libraries.
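
As a rough illustration of that idea, the sketch below turns a list of dependent library names into -load= flags by searching a list of directories. It is not how lli works today; the directory list, the lib<name>.so naming, and the function names are all assumptions made for the example.

```python
import os

# Assumed search order; a real system would consult its own linker setup.
STANDARD_LIB_DIRS = ["/lib", "/usr/lib", "/usr/local/lib"]


def find_library(name, search_dirs=STANDARD_LIB_DIRS):
    """Return the full path of lib<name>.so from the first directory
    that contains it, or None if no directory does."""
    filename = f"lib{name}.so"
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


def load_flags_for(deplibs, search_dirs=STANDARD_LIB_DIRS):
    """Map dependent library names to -load= flags.

    Returns (flags, missing), where missing lists the names not found.
    """
    flags, missing = [], []
    for name in deplibs:
        path = find_library(name, search_dirs)
        if path is None:
            missing.append(name)
        else:
            flags.append(f"-load={path}")
    return flags, missing
```

With something like this, the math library from the sin() example would be recorded as just "m" in the bytecode and resolved to a full path at run time.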

Anyone else have any thoughts on this?

----- Forwarded message from Karl Magdsick <kmagnum@gmail.com> -----

> Karl, I think you meant to cc the llvmdev list on this.
>
> Thank you for a more detailed explanation, it's much clearer to me now.

That does make more sense to me too.

> I agree that making the execution of .bc files more transparent would
> make them more usable as a stand-alone binary format on Unix-like systems,
> and that adding programmatic support for changing the #! line would
> prevent much of the user error involved in modifying the run line.

Sure, I can see this being useful on Unix systems.

> One issue is that the limit of 256 chars for the run line might not be
> enough due to libraries. For example, let's pretend we have a program
> foo.c which uses several libraries and we compile it as follows:
>
> % llvm-gcc foo.c -o foo
>
> This will produce the bytecode file 'foo.bc' and a shell script 'foo'
> which you can use to run the program via the JIT (lli). The way LLI
> loads external libraries is via the -load=[full path to library] flags,
> and theoretically, there isn't a limit as to how many libraries a
> program might use, so a hard-coded limit in the run line would certainly
> be problematic.

I agree with Misha, this is an issue.

Another issue that I have is that this is a very Unix-centric solution. I
guess that there isn't any good solution to this though.

In principle, making the bc reader read your specially annotated .bc files
shouldn't be an issue: it currently looks for a magic number for llvm/llvc
files, and could check for a #! line as well. Instead of hard coding a
fixed 256 byte offset, I don't see any reason the .bc reader couldn't skip
ahead until it passes the first newline in the file. After the
newline(s), it would start checking for llvm bc form.
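
A minimal sketch of that skip-ahead logic, written in Python rather than as real reader code. It assumes the bytecode starts with a 4-byte magic of "llvm" or "llvc"; the function name is hypothetical.

```python
import io

# Assumption: the bytecode proper begins with one of these 4-byte magics.
MAGIC_NUMBERS = (b"llvm", b"llvc")


def find_bytecode_start(stream):
    """Return the offset where the bytecode begins, or None if no magic
    number is found at the expected position.

    If the file starts with '#!', skip past the first newline (and any
    newlines immediately after it) before checking for the magic number.
    """
    start = 0
    if stream.read(2) == b"#!":
        stream.readline()  # consume the rest of the run line, any length
        start = stream.tell()
        # Skip any extra newline bytes that follow the run line.
        while stream.read(1) == b"\n":
            start += 1
    stream.seek(start)
    return start if stream.read(4) in MAGIC_NUMBERS else None


# A file with a run line, and one without.
with_run_line = io.BytesIO(b"#!/usr/local/bin/lli -load=/usr/lib/libm.so\nllvm...")
without_run_line = io.BytesIO(b"llvm...")
print(find_bytecode_start(with_run_line), find_bytecode_start(without_run_line))
```

Because the run line is consumed up to its newline, its length no longer matters, which removes the need for a fixed 256 byte offset.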

Does this make sense? Given this, you could modify gccld (only) to emit
.bc files with the #! lines (it has the path to lli to use).

-Chris

[snip]

> This will produce the bytecode file 'foo.bc' and a shell script 'foo'
> which you can use to run the program via the JIT (lli). The way LLI
> loads external libraries is via the -load=[full path to library] flags,
> and theoretically, there isn't a limit as to how many libraries a
> program might use, so a hard-coded limit in the run line would certainly
> be problematic.

> I agree with Misha, this is an issue.

Well, it sounds like maybe I should first look into listing the required
libraries in the bytecode, as this would also allow misc binaries to work
properly, and would make for simpler and more elegant shell scripts
emitted by llvm-gcc.

Does this make sense?

> Another issue that I have is that this is a very unix-centric solution. I
> guess that there isn't any good solution to this though.

Yeah, I wish I knew of a cleaner solution.

> In principle, making the bc reader read your specially annotated .bc files
> shouldn't be an issue: it currently looks for a magic number for llvm/llvc
> files, and could check for a #! line as well. Instead of hard coding a
> fixed 256 byte offset, I don't see any reason the .bc reader couldn't skip
> ahead until it passes the first newline in the file. After the
> newline(s), it would start checking for llvm bc form.

IMHO, misc binaries are the most elegant solution I've seen.
I'm not aware of a way to specify misc binary magic number rules
without constant offsets, which would lead to an infinite number of
rules if there are an infinite number of legal offsets for the magic
number. I'd rather not break the more elegant solution if at all
possible.
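
To illustrate the constant-offset point, here is a small sketch that builds magic-number rules in the ':name:type:offset:magic:mask:interpreter:flags' form used by Linux binfmt_misc, assuming that is what "misc binaries" refers to here. The rule names, the lli path, the "llvm" magic, and the offsets are illustrative assumptions.

```python
def binfmt_misc_rule(name, offset, magic, interpreter):
    """Build one magic-number rule string.

    Type 'M' matches on magic bytes at a fixed offset; the mask and
    flags fields are left empty in this sketch.
    """
    escaped = "".join(f"\\x{byte:02x}" for byte in magic)
    return f":{name}:M:{offset}:{escaped}::{interpreter}:"


# Each rule names exactly one offset, so every legal position of the
# magic number needs a rule of its own.
rules = [
    binfmt_misc_rule(f"llvm-bc-{offset}", offset, b"llvm", "/usr/local/bin/lli")
    for offset in (0, 32, 64)
]
for rule in rules:
    print(rule)
```

A run line of arbitrary length would put the magic number at an arbitrary offset, which is what leads to the unbounded number of rules.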

So, does it make sense for me to first look into getting all of the
dependencies into the bytecode file?

-Karl