LLVM capability question.

I’m losing my sanity, so I thought I’d try and generate an LLVM target for the Glasgow Haskell Compiler (GHC). In talking to some of the people in the GHC mailing list some issues have come up that I can’t find a ready answer to. (Others came up that I could, so I don’t feel quite as stupid or helpless as I could.)

  1. Is there any way to hint that a global pointer variable should be put in a register when further translated to some native form of code? This is specifically necessary, apparently, for the current stack and heap pointers in GHC for speed reasons.
  2. This I’m just going to quote: “With (almost) every chunk of code, we want to associate a smal chunk of data (the “info table”) which contains information used by the garbage collector and other parts of the run time system. We want to use only one pointer to point to both of those things, and we don’t want to waste time with any additional indirections, so we make the pointer point between the data chunk and the code chunk.” I’ve asked for clarification on this from the original author, but I suspect it’s something like having a pointer to code base that uses negative offsets to associated data (like some C++ implementations use for the vtable).
  3. What is the actual support for tail calls? The docs are pretty vague on this point – the “tail” keyword seems to “enable” tail call optimisation. I’m not sure what that means.
  4. GHC relies an awful lot on having many, many, many “little stacks”. Is it possible to have lots of tiny stacks in LLVM’s architecture, to switch between them easily, to check when they overflow and to automatically grow them when they exceed their bounds?

Hi,

Michael T. Richter wrote:

4. GHC relies an awful lot on having many, many, many "little stacks".
Is it possible to have lots of tiny stacks in LLVM's architecture, to
switch between them easily, to check when they overflow and to
automatically grow them when they exceed their bounds?

I was also wondering about this from the point of view of using channels
to communicate between tasks, where only one task runs at a time and it
runs until it blocks. (Yes, I'm being influenced by CSP and Bell Labs'
Squeak, Newsqueak, Alef, Limbo, etc.)

Cheers,

Ralph.

Hello Michael,

I'm losing my sanity, so I thought I'd try and generate an LLVM target
for the Glasgow Haskell Compiler (GHC). In talking to some of the
people in the GHC mailing list some issues have come up that I can't
find a ready answer to. (Others came up that I could, so I don't feel
quite as stupid or helpless as I could.)

     1. Is there any way to hint that a global pointer variable should
        be put in a register when further translated to some native
        form of code? This is specifically necessary, apparently, for
        the current stack and heap pointers in GHC for speed reasons.

I don't think so.

     1. This I'm just going to quote: "With (almost) every chunk of
        code, we want to associate a smal chunk of data (the "info
        table") which contains information used by the garbage
        collector and other parts of the run time system. We want to
        use only one pointer to point to both of those things, and we
        don't want to waste time with any additional indirections, so
        we make the pointer point between the data chunk and the code
        chunk." I've asked for clarification on this from the
        original author, but I suspect it's something like having a
        pointer to code base that uses negative offsets to associated
        data (like some C++ implementations use for the vtable).

Okay. I'm not sure what the question is. Certainly LLVM can handle
vtable type things. If this is only for garbage collection, you might
want to consider implementing the garbage collection interface in LLVM.
Please see: http://llvm.org/docs/GarbageCollection.html

     1. What is the actual support for tail calls? The docs are
        pretty vague on this point -- the "tail" keyword seems to
        "enable" tail call optimisation. I'm not sure what that
        means.

Function call instructions can have a "tail" option which enables tail
call optimization. If you write:

    tail call void %foo ()

then you will get a tail call. Without the "tail" keyword, you won't.

     1. GHC relies an awful lot on having many, many, many "little
        stacks". Is it possible to have lots of tiny stacks in LLVM's
        architecture, to switch between them easily, to check when
        they overflow and to automatically grow them when they exceed
        their bounds?

I don't think so, but things could be done in the code generator to make
that happen. I believe the code generators assume there's one execution
stack. However, I'm not the best person to answer this question so
perhaps someone else will chime in.

Reid.

I'm losing my sanity, so I thought I'd try and generate an LLVM target
for the Glasgow Haskell Compiler (GHC). In talking to some of the

Ah, so you had to lose your sanity before considering llvm... I see. :slight_smile:
:slight_smile:

people in the GHC mailing list some issues have come up that I can't
find a ready answer to. (Others came up that I could, so I don't feel
quite as stupid or helpless as I could.)

Ok,

    1. Is there any way to hint that a global pointer variable should
       be put in a register when further translated to some native form
       of code? This is specifically necessary, apparently, for the
       current stack and heap pointers in GHC for speed reasons.

llvm-gcc supports the GCC 'asm("%ebx")' syntax for global variables. Note that, like GCC, LLVM only makes guarantees about the contents of the register during inline asm calls, so this may not be sufficient for you.

If this isn't sufficient, it may make sense to do a simple extension to LLVM to provide a capability to do this sort of thing. However, you should do some performance analysis to determine if this is needed, LLVM produces very different code than GCC, so you may not need to do this.

    2. This I'm just going to quote: "With (almost) every chunk of
       code, we want to associate a smal chunk of data (the "info
       table") which contains information used by the garbage collector
       and other parts of the run time system. We want to use only one
       pointer to point to both of those things, and we don't want to
       waste time with any additional indirections, so we make the
       pointer point between the data chunk and the code chunk." I've
       asked for clarification on this from the original author, but I
       suspect it's something like having a pointer to code base that
       uses negative offsets to associated data (like some C++
       implementations use for the vtable).

C++ v-tables aren't typically implemented this way, AFAIK. However, it should be easy to add an intrinsic to capture this and do the right thing.

    3. What is the actual support for tail calls? The docs are pretty
       vague on this point -- the "tail" keyword seems to "enable" tail
       call optimisation. I'm not sure what that means.

When a call is marked 'tail', the compiler is allowed to assume that the callee does not access the caller's stack. Thus, if marked 'tail', in the tail position (right before a return), and marked 'fastcc' (so arguments can be passed normal order instead of reverse order), the code generator can deallocate the stack before calling the callee.

AFAIK, the X86 backend is currently the only one that supports this, and you have to pass a special option to get it to do so. It should be straight-forward to extend other targets to support this, but there has been little demand so far.

    4. GHC relies an awful lot on having many, many, many "little
       stacks". Is it possible to have lots of tiny stacks in LLVM's
       architecture, to switch between them easily, to check when they
       overflow and to automatically grow them when they exceed their
       bounds?

It depends on what you mean. There are multiple ways to achieve this. The simplest way is to use CPS:
http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt

-Chris