llvmc - Compiler Driver - Status Update & Issues

Folks,

As of the writing of this note, the llvmc tool is enabled for build on
the CVS head. I'm encouraging you to try it out, provide some feedback,
and help with the issues below.

llvmc is now able to correctly link a pure bytecode version of any
Stacker program. This includes translation with stkrc, optimization with
opt and linking with llvm-link. It is also able to find Stacker's
runtime library automatically using the "dependent libraries" feature of
the VMCore IR. For example:

bash-2.05$ llvmc fibonacci.st -o fibo -f -v
stkrc -s 2048 fibonacci.st -o /tmp/llvm_1zyB4x/fibonacci.st.trans -f
opt /tmp/llvm_1zyB4x/fibonacci.st.trans -o /tmp/llvm_1zyB4x/fibonacci.st.opt -simplifycfg -instcombine -mem2reg -f
llvm-link /tmp/llvm_1zyB4x/fibonacci.st.opt /proj/work/llvm/cfrontend/install/bytecode-libs/stkr_runtime.bc -v -f -o fibo
Loading '/tmp/llvm_1zyB4x/fibonacci.st.opt'
Loading '/proj/work/llvm/cfrontend/install/bytecode-libs/stkr_runtime.bc'
Linking in '/proj/work/llvm/cfrontend/install/bytecode-libs/stkr_runtime.bc'
Writing bytecode...

Note that without any -L option to llvmc or any mention of
"stkr_runtime.bc", llvmc was able to find and link the Stacker runtime
library.
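
As a rough illustration of that kind of lookup, here is a minimal C++ sketch. It is not llvmc's actual code: the function name findDependentLib, the ".bc" suffix rule, and the injected exists predicate are all assumptions made for the example, on the guess that the bytecode records only the bare library name and the driver walks a list of search directories.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: return the first "<dir>/<name>.bc" that exists,
// or an empty string if no search directory contains the library.
// The existence check is passed in so the sketch does not touch the disk.
std::string findDependentLib(
    const std::string& name,
    const std::vector<std::string>& searchDirs,
    const std::function<bool(const std::string&)>& exists) {
  for (const auto& dir : searchDirs) {
    const std::string candidate = dir + "/" + name + ".bc";
    if (exists(candidate))
      return candidate;  // first match wins, in search-path order
  }
  return "";
}
```

In a real driver the predicate would be a file-system check and the search list would include the installed bytecode-libs directory shown in the output above.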

llvmc is now also *mostly* compatible with GCC's compiler driver. For
example, here's an attempt to use llvmc as the CXX variable when
building the CompilerDriver.cpp file (part of llvmc):

bash-2.05b$ pwd
/proj/work/llvm/build/tools/llvmc
bash-2.05b$ gmake VERB= CXX=llvmc
Compiling CompilerDriver.cpp
/proj/work/llvm/build/mklib --tag=disable-shared --silent --tag=CXX \
  --mode=compile llvmc -c -I/proj/work/llvm/build/tools/llvmc \
  -I/proj/work/llvm/build/../llvm/tools/llvmc -I/proj/work/llvm/build/include \
  -I/proj/work/llvm/build/../llvm/include -I../../include \
  -I/proj/work/llvm/build/../llvm/include -D_GNU_SOURCE -D__STDC_LIMIT_MACROS \
  -DATTR_DEPRECATED='__attribute__ ((deprecated))' -Wall -W -Wwrite-strings \
  -Wno-unused -g -D_DEBUG \
  /proj/work/llvm/build/../llvm/tools/llvmc/CompilerDriver.cpp -o \
  /proj/work/llvm/build/tools/llvmc/Debug/CompilerDriver.lo
Unknown command line argument '-DATTR_DEPRECATED=__attribute__ ((deprecated))'. Try: 'llvmc --help'
Unknown command line argument '-fPIC'. Try: 'llvmc --help'
gmake: *** [/proj/work/llvm/build/tools/llvmc/Debug/CompilerDriver.lo] Error 1

Unfortunately, -fPIC is out of scope for llvmc and the CommandLine.h
parser for strings doesn't like the = or space in the -D option.

Some unresolved issues that I would appreciate feedback on:

1. How important is 100% compatibility with GCC? (my take: "not very",
    but we should be "close" where it makes sense).
2. Can the -D problem shown above be solved without modification to
    the CommandLine library? Note that other -D options were accepted by
    llvmc.
3. Should -fXXX options just be passed through to compiler tools? Or,
    should they be accepted and ignored, or should they be reported as
    errors as shown above?
4. What exactly should happen for native code linking? Should gccld be
    used in conjunction with llc? Should I use a native linker? Should
    linking be specifiable in the configuration? If so, how? It's not
    language specific.
5. What do we do if a "dependent library" specifies a name and the first
    thing found is a native library but the llvmc command line isn't
    building a native executable? Should the dependent library just be
    passed through to llvm-link so that the interpreter/jit can
    dynamically load it at run time? How can we ensure that there are
    no unresolved symbols in this case? Should a native link also
    produce a bytecode "stub" that contains declarations of the things
    in the native library?

All thoughts appreciated.

Thanks,

Reid.

> llvmc is now able to correctly link a pure bytecode version of any
> Stacker program. This includes translation with stkrc, optimization with
> opt and linking with llvm-link. It is also able to find Stacker's
> runtime library automatically using the "dependent libraries" feature of
> the VMCore IR. For example:

Cool! Note that we are still missing an important feature: the linker
does not correctly merge the target triple or shared lib list. If you
wanted to implement it, it would be straightforward (see
lib/VMCore/Linker.cpp). Basically if you link a.bc with b.bc, the
resulting bytecode file should depend on the union of shared libs that
a.bc and b.bc require.
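
To make the union concrete, here is a minimal C++ sketch of the shared-lib half of that merge. The name mergeDependentLibs and the use of plain string vectors are assumptions for illustration; this is not the code in lib/VMCore/Linker.cpp, and it does not address the target triple.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: union of two dependent-library lists.
// Duplicates are dropped and first-seen order is preserved, so the
// libraries required by a.bc come before any new ones from b.bc.
std::vector<std::string> mergeDependentLibs(
    const std::vector<std::string>& a,
    const std::vector<std::string>& b) {
  std::vector<std::string> result;
  auto addUnique = [&result](const std::string& lib) {
    if (std::find(result.begin(), result.end(), lib) == result.end())
      result.push_back(lib);
  };
  for (const auto& lib : a) addUnique(lib);
  for (const auto& lib : b) addUnique(lib);
  return result;
}
```

Keeping the order stable is a design choice of the sketch, on the assumption that library order might matter to a later native link.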

> Unfortunately, -fPIC is out of scope for llvmc and the CommandLine.h
> parser for strings doesn't like the = or space in the -D option.
>
> Some unresolved issues that I would appreciate feedback on:
>
> 1. How important is 100% compatibility with GCC? (my take: "not very",
>     but we should be "close" where it makes sense).

I think that it's fairly important to be compatible for the widely used
options. In particular, the -D* -W* and -f* options should be passed
through.

> 2. Can the -D problem shown above be solved without modification to
>     the CommandLine library? Note that other -D options were accepted by
>     llvmc.

Nope, I think CommandLine should be fixed to understand quotes. Doing
this globally for ' and " should be ok.
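
One reading of "understand quotes" is a splitter that treats text inside ' or " as part of a single argument. A minimal C++ sketch of that idea follows; splitQuoted is a hypothetical name, not a function in the CommandLine library, and backslash escapes are deliberately left out.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split a command string on spaces and tabs,
// keeping anything inside single or double quotes (including spaces
// and '=') in one argument. The quote characters are removed.
// An unterminated quote simply runs to the end of the input.
std::vector<std::string> splitQuoted(const std::string& line) {
  std::vector<std::string> args;
  std::string cur;
  char quote = 0;      // active quote character, or 0 when outside quotes
  bool inArg = false;  // true once the current argument has started
  for (char c : line) {
    if (quote) {
      if (c == quote) quote = 0;  // closing quote
      else cur += c;              // quoted text is copied verbatim
    } else if (c == '\'' || c == '"') {
      quote = c;                  // opening quote starts or continues an argument
      inArg = true;
    } else if (c == ' ' || c == '\t') {
      if (inArg) { args.push_back(cur); cur.clear(); inArg = false; }
    } else {
      cur += c;
      inArg = true;
    }
  }
  if (inArg) args.push_back(cur);
  return args;
}
```

With this, the string -DATTR_DEPRECATED='__attribute__ ((deprecated))' -Wall splits into two arguments, the first of which keeps its = and its embedded space.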

> 3. Should -fXXX options just be passed through to compiler tools? Or,
>     should they be accepted and ignored, or should they be reported as
>     errors as shown above?

Ideally this should be configurable by the language specific config file,
but for now, just passing -f* through like -W* and friends should be fine.
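
A minimal C++ sketch of what "just passing through" could look like in the driver. The name isPassThrough is hypothetical, and treating a bare -f as the driver's own flag is an assumption based on the fibonacci invocation earlier in the thread, where llvmc itself is given -f.

```cpp
#include <string>

// Hypothetical helper: decide whether an argument should be forwarded
// untouched to the underlying compiler tool instead of being parsed
// (and possibly rejected) by the driver.
bool isPassThrough(const std::string& arg) {
  if (arg.size() < 2 || arg[0] != '-')
    return false;  // not an option at all
  if (arg == "-f")
    return false;  // assumption: bare -f belongs to the driver itself
  // Forward the -D*, -W* and -f* families.
  return arg[1] == 'D' || arg[1] == 'W' || arg[1] == 'f';
}
```

Under this rule -fPIC, -Wall, -W and -D_DEBUG are forwarded, while -o, a bare -f and file names stay with the driver.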

> 4. What exactly should happen for native code linking? Should gccld be
>     used in conjunction with llc? Should I use a native linker? Should
>     linking be specifiable in the configuration? If so, how? It's not
>     language specific.

All of the above. :) We need to use gccld to link in LLVM libraries and
do post-link opt. If going to native code, we need to use LLC, then the
system assembler, then the system linker. Something to think about is
that some targets will eventually support direct .o file emission. :)
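
The ordering described here can be sketched in C++ as a simple stage list. The Output enum and linkStages are hypothetical names, and the entries are stage labels only, with no real command-line flags implied.

```cpp
#include <string>
#include <vector>

// Hypothetical description of what the driver is asked to produce.
enum class Output { Bytecode, Native };

// Hypothetical helper: ordered list of link-time stages.
// gccld always runs (LLVM library linking and post-link optimization);
// the native stages are appended only when native code is requested.
std::vector<std::string> linkStages(Output out) {
  std::vector<std::string> stages = {"gccld"};
  if (out == Output::Native) {
    stages.push_back("llc");
    stages.push_back("system assembler");
    stages.push_back("system linker");
  }
  return stages;
}
```

A target with direct .o emission would presumably drop the assembler stage, which is one argument for keeping the list data-driven rather than hard-coded.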

> 5. What do we do if a "dependent library" specifies a name and the first
>     thing found is a native library but the llvmc command line isn't
>     building a native executable? Should the dependent library just be
>     passed through to llvm-link so that the interpreter/jit can
>     dynamically load it at run time?

Yes.

>     How can we ensure that there are
>     no unresolved symbols in this case? Should a native link also
>     produce a bytecode "stub" that contains declarations of the things
>     in the native library?

I don't think there is a good way to do this. Until we generate native
code (either JIT or LLC), we won't know about unresolved symbols.
Eventually we can come up with really complex schemes at LLVM link time
that could 'nm' native libraries that will be linked in, but I don't think
it's worth it yet.

-Chris