[IR question] Switching on pointers

Hi.

First of all, some context. I'm trying to implement a new functionality in an LLVM-based compiler and I need to take various actions based on the value of a given pointer, the possible values being the addresses of various global constants. I tried to use a `switch` instruction but I encountered several problems.

The "ideal switch" I'd like to have would look something like this (with way more cases):

     ; ModuleID = 'module'
     source_filename = "module"

     @var0 = global i32 0
     @var1 = global i32 0

     define i32 @func(i32*) {
     entry:
       switch i32* %0, label %unreachable [
         i32* @var0, label %var0
         i32* @var1, label %var1
       ]

     var0: ; preds = %entry
       br label %post

     var1: ; preds = %entry
       br label %post

     unreachable: ; preds = %entry
       unreachable

     post: ; preds = %var1, %var0
       %1 = phi i32 [ 0, %var0 ], [ 1, %var1 ]
       ret i32 %1
     }

This example is impossible because a `switch` cannot have pointer operands. So I tried with `ptrtoint`, turning the `switch` into this:

       %1 = ptrtoint i32* %0 to i64
       switch i64 %1, label %unreachable [
         i64 ptrtoint (i32* @var0 to i64), label %var0
         i64 ptrtoint (i32* @var1 to i64), label %var1
       ]

I'm a bit baffled by that one. According to the documentation the address of a global value is a constant, yet this `switch` is impossible because the result of the `ptrtoint` isn't a constant integer.

At that point, I don't really know how to proceed. My questions are:

- Why is `switch` restricted to integers?
- What is the result of a constant `ptrtoint` and why isn't it a constant integer?
- Is there a way to switch on pointers despite these restrictions?

Thanks.

Just a guess. Maybe you can write C code mimic what you want to achieve, then compile it to LLVM IR.
See what LLVM IR generated by FE.

Regards,
chenwj

Just a guess. Maybe you can write C code mimic what you want to achieve,
then compile it to LLVM IR.
See what LLVM IR generated by FE.

Probably won't work, since both the type of x in switch(x) and case-labels
must be integral types (not pointers).

Regards,
chenwj

2017-05-16 0:29 GMT+08:00 Benoit Vey via llvm-dev <llvm-dev@lists.llvm.org
>:

Hi.

First of all, some context. I'm trying to implement a new functionality
in an LLVM-based compiler and I need to take various actions based on the
value of a given pointer, the possible values being the addresses of
various global constants. I tried to use a `switch` instruction but I
encountered several problems.

The "ideal switch" I'd like to have would look something like this (with
way more cases):

    ; ModuleID = 'module'
    source_filename = "module"

    @var0 = global i32 0
    @var1 = global i32 0

    define i32 @func(i32*) {
    entry:
      switch i32* %0, label %unreachable [
        i32* @var0, label %var0
        i32* @var1, label %var1
      ]

    var0: ; preds = %entry
      br label %post

    var1: ; preds = %entry
      br label %post

    unreachable: ; preds = %entry
      unreachable

    post: ; preds = %var1,
%var0
      %1 = phi i32 [ 0, %var0 ], [ 1, %var1 ]
      ret i32 %1
    }

This example is impossible because a `switch` cannot have pointer
operands. So I tried with `ptrtoint`, turning the `switch` into this:

      %1 = ptrtoint i32* %0 to i64
      switch i64 %1, label %unreachable [
        i64 ptrtoint (i32* @var0 to i64), label %var0
        i64 ptrtoint (i32* @var1 to i64), label %var1
      ]

I'm a bit baffled by that one. According to the documentation the address
of a global value is a constant, yet this `switch` is impossible because
the result of the `ptrtoint` isn't a constant integer.

At that point, I don't really know how to proceed. My questions are:

- Why is `switch` restricted to integers?

Same reason C only supports it?

- What is the result of a constant `ptrtoint` and why isn't it a constant

integer?

Well, a POINTER isn't a true constant either, so I'm not entirely sure your
original value is indeed, truly, a constant. But perhaps splitting hairs.

- Is there a way to switch on pointers despite these restrictions?

I'm not sure what you are ACTUALLY trying to solve. The subject seems a bit
XY-question - you want to achieve X, you think Y is the way to do that, so
you ask how to do Y - if you explain "I want to do X, how should I do
that?", maybe someone can come up with a good answer?

Hi.

First of all, some context. I'm trying to implement a new functionality in an LLVM-based compiler and I need to take various actions based on the value of a given pointer, the possible values being the addresses of various global constants. I tried to use a `switch` instruction but I encountered several problems.

The "ideal switch" I'd like to have would look something like this (with way more cases):

   ; ModuleID = 'module'
   source_filename = "module"

   @var0 = global i32 0
   @var1 = global i32 0

   define i32 @func(i32*) {
   entry:
     switch i32* %0, label %unreachable [
       i32* @var0, label %var0
       i32* @var1, label %var1
     ]

   var0: ; preds = %entry
     br label %post

   var1: ; preds = %entry
     br label %post

   unreachable: ; preds = %entry
     unreachable

   post: ; preds = %var1, %var0
     %1 = phi i32 [ 0, %var0 ], [ 1, %var1 ]
     ret i32 %1
   }

This example is impossible because a `switch` cannot have pointer operands. So I tried with `ptrtoint`, turning the `switch` into this:

     %1 = ptrtoint i32* %0 to i64
     switch i64 %1, label %unreachable [
       i64 ptrtoint (i32* @var0 to i64), label %var0
       i64 ptrtoint (i32* @var1 to i64), label %var1
     ]

I'm a bit baffled by that one. According to the documentation the address of a global value is a constant, yet this `switch` is impossible because the result of the `ptrtoint` isn't a constant integer.

I think the confusion stems from there being multiple kinds of constants. The switch instruction wants to know the value of the constant at compile-time but while the address of a global is constant, it isn't known until link-time.

Yes, like Daniel says, and I implied by saying that "address of a variable
isn't a true constant" (or words to that effect). I suspect you wanted to
post this to the list, so I copied the list in again, rather than keep this
as a private conversation between you and me.

Aside from the obvious answer of "the switch needs compile time constants"
(what I call "true constants" - they are constants that the LLVM compiler
knows the actual value of, not constants as in "won't change during the
execution of the program", such as variable and function addresses), it
should be noted that unless the compiler & linker can be certain that for
example the var0 and var1 address are "next to each other", this sort of
construct will not benefit from any "switch" optimisation, it will be
exactly like a switch with large, non-close values: an if/else if/else ...
chain - probably not even able to do any cleverness of building a tree of
which comes first [in a generic case of this, at the very least]. So if you
have a case where you need to resolve a language construct where you can
switch on a reference to a variable referring to a particular variable,
then you probably have to make it into a compare + branch - or perhaps have
a table and iterate over the choices. Even if LLVM knew how to do this, it
wouldn't be possible to produce anything noticeably better [at least I
can't think of anything - if anyone has some clever ways to solve this kind
of problem, I would appreciate a comment...]

I still would like to hear from Benoit as to what the goal is here. I'm
only guessing that it's a language where you can do something like "ref =
var0; ... switch(ref) case var0: ... ; case var1: ... " or some such. Not
that I know for sure what language that is.

Thanks everyone for your answers.

Mats, my goal is do an alternate implementation of the subtyping test in the Pony compiler. I'll try to keep the explanation short to avoid cluttering the mail with irrelevant information.

Every object in Pony has a type descriptor, which is used in the subtyping test to determine the real type of an object. The current algorithm is suboptimal and we're trying out various alternatives. The full details are on this github pull request: Use a constant-time algorithm for trait subtyping tests by Praetonus · Pull Request #1752 · ponylang/ponyc · GitHub

One of the alternatives, described by Sylvan Clebsch in the 6th message on the PR, would involve looking at whether the type descriptor of an object with an abstract type is the same as one of the subtypes of the abstract type being tested. Here's an equivalent C code from the github PR:

     bool SomeTraitTypeName__istype(pony_type_t* desc)
     {
       switch((uintptr_t)desc)
       {
         case 0x878AB00:
         case 0x878AB08:
         case 0x878AB16:
         case 0x878AC00:
         case 0x878AC08:
         case 0x878AD00:
           return true;

         default: {}
       }

       return false;
     }

Since this can be implemented with either a switch or an if/else sequence, I think I'll simply do the if/else instead.

mats petersson <mats@planetcatfish.com> a écrit :

Hi Benoit,

This is slightly off topic for the list, but have you checked to see
if "Fast Subtype Checking in the HotSpot JVM" by Cliff Click and John
Rose applies to your use case?

-- Sanjoy

I would consider putting all the data tables in a new section in the compiled binary. After runtime linking, you can then enumerate through all the tables in this section and sort them. You can then write a single istype function that performs a binary search at runtime.