Is LLVM appropriate for implementing a shell interpreter?

Hi devs,

We are implementing a library that interprets shell scripts so that
other programs can talk to bash efficiently. We'd like your advice on
whether LLVM is appropriate for us. Here are our considerations:

In most cases our library will interpret each script just once. Our
current approach is a hand-written implementation based on ANTLR and
C++, so we execute each script directly as we interpret it. If we
switch to LLVM, we first need to compile the script into LLVM IR and
then into native code. Our guess is that this may be slower than our
current approach. Is that true?

However, we do have several scripts that are sourced and reused while
interpreting others. We guess this is where LLVM could help:
LLVM-optimized code for those scripts should run faster than our
hand-written implementation, so overall performance could improve.
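
The trade-off in the two paragraphs above can be put as a break-even
calculation. The following is only a minimal sketch under assumed costs:
the struct, its field names, and any numbers fed into it are hypothetical
and would have to be replaced by real measurements.

```cpp
#include <cmath>

// Hypothetical cost model; all times are in the same unit (e.g. microseconds).
struct CostModel {
    double compile_cost;  // one-time cost: script -> LLVM IR -> native code
    double interp_run;    // cost of one run in the existing interpreter
    double native_run;    // cost of one run of the compiled code
};

// Smallest number of runs at which compiling is strictly cheaper than
// interpreting every run. Returns -1 if compiling never pays off.
long break_even_runs(const CostModel& m) {
    const double saving = m.interp_run - m.native_run;
    if (saving <= 0.0) {
        return -1;  // compiled code is not faster per run
    }
    // compile_cost + n * native_run < n * interp_run
    //   <=>  n > compile_cost / saving
    return static_cast<long>(std::floor(m.compile_cost / saving)) + 1;
}
```

With this model, a script that runs once only benefits from compilation if
`break_even_runs` returns 1, while a sourced script that is reused many
times can amortize a much larger compile cost.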

Could you please point out if we are wrong? Thanks.

In reply to Mu Qiao:

I'm currently implementing such a thing: an interactive shell plus
compiled scripts (only the former is being implemented so far).

LLVM apparently has one problem in this regard: its context caches all
constants ever created and doesn't free them until the LLVMContext object
itself is destroyed.

So if your shell is, for example, connected to a pipe and accepts
generated scripts in quick succession, you will have problems with
inputs such as:

    print 0 # 0 is cached and never freed
    print 1 # 1 is cached and never freed
    ...

We haven't tried to tackle this problem yet. But we probably need to.
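
One direction that might help is to give each script its own context and
destroy it once the script has run. The sketch below illustrates the idea
with a toy model only: `ToyContext` and the helper functions are
hypothetical stand-ins and are not LLVM's API. In LLVM terms this would
presumably correspond to one `LLVMContext` per script, which is an
assumption we have not tested.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Toy stand-in for a context that uniques constants and keeps them alive
// until the context itself is destroyed. This is NOT LLVM's API.
class ToyContext {
public:
    // Returns the unique cached object for `value`, creating it on first use.
    const long* get_constant(long value) {
        auto it = constants_.find(value);
        if (it == constants_.end()) {
            it = constants_.emplace(value, std::unique_ptr<long>(new long(value))).first;
        }
        return it->second.get();
    }
    std::size_t cached() const { return constants_.size(); }

private:
    std::unordered_map<long, std::unique_ptr<long>> constants_;
};

// "Compiles" one script: every integer literal becomes a cached constant.
void compile_script(ToyContext& ctx, const std::vector<long>& literals) {
    for (long v : literals) {
        ctx.get_constant(v);
    }
}

// Strategy A: one long-lived context shared by all scripts.
// The cache only grows; returns its final size.
std::size_t shared_context_growth(const std::vector<std::vector<long>>& scripts) {
    ToyContext ctx;
    for (const auto& s : scripts) {
        compile_script(ctx, s);
    }
    return ctx.cached();
}

// Strategy B: a fresh context per script, destroyed after the script runs.
// Returns the largest cache size seen in any single context.
std::size_t per_script_context_peak(const std::vector<std::vector<long>>& scripts) {
    std::size_t peak = 0;
    for (const auto& s : scripts) {
        ToyContext ctx;  // freed at the end of each loop iteration
        compile_script(ctx, s);
        if (ctx.cached() > peak) {
            peak = ctx.cached();
        }
    }
    return peak;
}
```

The cost of this design is that anything compiled in a context lives only
as long as that context, so scripts that are sourced and reused would need
a longer-lived context of their own.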

Thank you for answering.
In our case we do not focus on interactive use, so I think the problem
you mentioned is acceptable for us. But we are still not sure whether
LLVM could improve overall performance compared to a hand-written
implementation, as I asked in my last mail.