Addressing TableGen's error "Ran out of lanemask bits" in order to use more than 32 subregisters per register

     In my TableGen back-end description I need more than 32 subregisters per register (e.g., 128 or 1024) for my research SIMD processor. So far I have used 32 subregisters successfully.

     However, with 128 subregisters, when I run the command:
       llvm-tblgen -gen-register-info
     I get the error message "error:Ran out of lanemask bits to represent subregister sub_16_033".

     To handle this limitation, I started editing the files where this error comes from.
     More precisely, the error arises because the member LaneMask of the classes CodeGenSubRegIndex and CodeGenRegister is of type unsigned (i.e., 32 bits), and every lane/subregister requires one bit of LaneMask.
     I plan to use type long (or even type int1024_t from the Boost library, header cpp_int.hpp) for LaneMask and to change the methods handling the type accordingly.
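
     To make the one-bit-per-lane constraint concrete, here is a minimal sketch. The names LaneMask32, laneBit32 and laneBit64 are hypothetical and are not taken from the TableGen sources; the sketch only shows why a 32-bit mask has no bit left for lane 32, and how a 64-bit mask moves the limit to 64 lanes.

```cpp
#include <cstdint>

// Hypothetical stand-in for a per-subregister lane mask; the real TableGen
// classes and member names are not reproduced here.
using LaneMask32 = std::uint32_t;

// Returns the mask with only the given lane set, or 0 when the 32-bit type
// has no bit left for that lane (the "ran out of lanemask bits" situation).
constexpr LaneMask32 laneBit32(unsigned lane) {
  return lane < 32 ? (LaneMask32{1} << lane) : LaneMask32{0};
}

// Same idea with a 64-bit mask: twice as many lanes fit before running out.
constexpr std::uint64_t laneBit64(unsigned lane) {
  return lane < 64 ? (std::uint64_t{1} << lane) : std::uint64_t{0};
}
```

     With 128 subregisters, even the 64-bit variant runs out, which is why a wider representation comes up in this thread.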

     Is there any limitation I am not aware of (maybe in LLVM's register allocator) that would prevent me from using more than 32 lanes/subregisters?

   Thank you very much,

There is no known limitation. I chose uint32_t out of concern for compile time. Going up to uint64_t should be no problem; I'd be more concerned about bigger types. Hopefully all code properly uses the LaneBitmask type instead of plain unsigned, but you may need a few fixes in that area.
(For history: We had a scheme in the past where the liveness tracking mapped all lanes after lane 31 to the bit 32. However, that needed special code in some places, which proved to be a constant source of bugs that typically only showed up in big and hard-to-debug inputs, so we moved away from this scheme.)
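
As an illustration of why such a shared overflow bit is awkward, here is a rough sketch. It is a reconstruction of the idea only, not the historical LLVM code, and it assumes the top bit of a 32-bit mask is the shared one; saturatingLaneBit and looksOverlapping are made-up names.

```cpp
#include <cstdint>

// Assumption: lanes 0..30 get their own bit, every later lane shares bit 31.
constexpr std::uint32_t saturatingLaneBit(unsigned lane) {
  return std::uint32_t{1} << (lane < 31 ? lane : 31u);
}

// Two lanes "look" overlapping when their masks intersect. For lanes that
// share the overflow bit this is true even if the lanes are distinct, so
// callers need special handling for that bit.
constexpr bool looksOverlapping(unsigned a, unsigned b) {
  return (saturatingLaneBit(a) & saturatingLaneBit(b)) != 0;
}
```

In this sketch lanes 40 and 100 are reported as overlapping although they are different lanes, which is the kind of imprecision that needs special-case code.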

- Matthias


I’ve managed to patch the various back-end files related to the lanemask, and I now have a 1024-bit lanemask.
But now I get the following error when running make llc:
<<error:unhandled vector type width in intrinsic!>>
This error comes from this file. It arises because there is no IIT_V128 (nor IIT_V256), and there is a switch statement over these values in the method static void EncodeFixedType(Record *R, std::vector &ArgCodes, std::vector &Sig).

Is there any reason these enum IIT_Info values (IIT_V128, IIT_V256) have not been added in file /IntrinsicEmitter.cpp?

Thank you,


     I managed to use SIMD units with more than 32 lanes (i.e., more than 32 subregisters per vector register) in TableGen, llc and opt. For example, I use SIMD instructions with types v128i16 and v512i16.

     An important question I have is whether it is OK to add the values IIT_V128 = 37 and IIT_V256 = 38, as I did below:
         enum IIT_Info {
           IIT_V2 = 9,
           IIT_V4 = 10,
           IIT_V8 = 11,
           IIT_V16 = 12,
           IIT_V32 = 13,
           IIT_V64 = 16,
           IIT_V1 = 28,
           IIT_VEC_OF_PTRS_TO_ELT = 33,
           IIT_V512 = 35,
           IIT_V1024 = 36,

           /* Alex: added these new values. Note that these IIT_* that I add below must be defined in also */
           IIT_V128 = 37,
           IIT_V256 = 38
         };

     I ask because enum IIT_Info has some values that are not consecutive for the vector types for intrinsics (used e.g. in include/llvm/IR/Intrinsics*.td).
     Although it is not important, I wonder why I still need to define them again (since these values are basically already defined elsewhere).

     So, I managed to get the code to compile. I had issues because I had not kept the following code in sync:
       - enum IIT_Info, defined in files llvm/utils/TableGen/IntrinsicEmitter.cpp and llvm/lib/IR/Function.cpp;
       - enum SimpleValueType, defined in files llvm/include/llvm/CodeGen/ and llvm/include/llvm/CodeGen/MachineValueType.h.
     Because these were out of sync, I was getting errors like:
       - "error:unhandled vector type width in intrinsic!", "error:unhandled MVT in intrinsic!"
       - "Not a vector MVT!", "getSizeInBits called on extended MVT."
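
     To illustrate the kind of mismatch described above, here is a small sketch of an encoder and a decoder that both have to know every vector-width code. The enum values are copied from the listing earlier in this post (IIT_V128 and IIT_V256 are my additions and may not match upstream LLVM); encodeWidth and decodeWidth are hypothetical helpers, not functions from IntrinsicEmitter.cpp or Function.cpp.

```cpp
// Values as listed earlier in this post; IIT_V128/IIT_V256 are the additions.
enum IIT_Info : unsigned char {
  IIT_V2 = 9, IIT_V4 = 10, IIT_V8 = 11, IIT_V16 = 12, IIT_V32 = 13,
  IIT_V64 = 16, IIT_V1 = 28, IIT_V512 = 35, IIT_V1024 = 36,
  IIT_V128 = 37, IIT_V256 = 38
};

// Hypothetical emitter-side helper: vector width -> code, -1 if unhandled
// (the case that would produce "unhandled vector type width").
constexpr int encodeWidth(unsigned width) {
  switch (width) {
  case 1: return IIT_V1;
  case 2: return IIT_V2;
  case 4: return IIT_V4;
  case 8: return IIT_V8;
  case 16: return IIT_V16;
  case 32: return IIT_V32;
  case 64: return IIT_V64;
  case 128: return IIT_V128;
  case 256: return IIT_V256;
  case 512: return IIT_V512;
  case 1024: return IIT_V1024;
  default: return -1;
  }
}

// Hypothetical decoder-side helper: code -> vector width, 0 if unknown.
constexpr unsigned decodeWidth(int code) {
  switch (code) {
  case IIT_V1: return 1;
  case IIT_V2: return 2;
  case IIT_V4: return 4;
  case IIT_V8: return 8;
  case IIT_V16: return 16;
  case IIT_V32: return 32;
  case IIT_V64: return 64;
  case IIT_V128: return 128;
  case IIT_V256: return 256;
  case IIT_V512: return 512;
  case IIT_V1024: return 1024;
  default: return 0;
  }
}
```

     If one side gains a new code and the other does not, the round trip decodeWidth(encodeWidth(w)) stops returning w, which is the sort of inconsistency behind the errors listed above.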

   Best regards,

Hello Alex,

I am very interested in your change to support a lanemask wider than 32 bits. I am working on a new LLVM back-end target which may also need this kind of support.

Would it be convenient for you to share the change with me, so that I could try it out?



Would uint64_t be sufficient for you?


Hi Krzysztof,

uint64_t is not enough for me. It seems that 128 bits would be enough.

What I am doing is defining the register set for the 16 working threads of a warp, in NVIDIA terminology. I often refer to the ‘warp size’ as the SIMD width.

Like AMDGPU, the architecture supports scalar and vector registers. The vector width is 16 if the SIMD width (warp size) is 16.

But the scalar and vector registers reside in a single register file, so they need to alias each other. That is, a vector register can also be used as several scalar registers.

What I chose to do is define a scalar register of type ‘short’; a vector register for QWord is then composed of 16 (SIMD width) * 4 (size in units of short) = 64 uniform short registers.

So, for normal usage with a SIMD width of 16, a 64-bit lane mask is enough.

The problem is that a ‘store’ or ‘load’ operation can support up to SIMD16 reads/writes of 4 DWords each, and the architecture requires the four elements to be in consecutive registers.
So I have to define a register tuple composed of 16 (SIMD width) * 4 (elements) * 2 (size in units of short) = 128 uniform short registers. That means a 128-bit lane mask.
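
The lane counts above can be checked with a few constants. This is only the arithmetic from this message written out; all names are made up for the illustration, and maskWordsNeeded assumes the mask would be stored in 64-bit words.

```cpp
// Numbers taken from the description above; names are hypothetical.
constexpr unsigned kSimdWidth = 16;      // threads per warp
constexpr unsigned kShortsPerQWord = 4;  // 64-bit element = 4 x 16-bit short
constexpr unsigned kShortsPerDWord = 2;  // 32-bit element = 2 x 16-bit short
constexpr unsigned kTupleElements = 4;   // load/store of 4 DWords

// One lane (one mask bit) per short-sized subregister.
constexpr unsigned kQWordVectorLanes = kSimdWidth * kShortsPerQWord;
constexpr unsigned kTupleLanes = kSimdWidth * kTupleElements * kShortsPerDWord;

// Number of 64-bit words needed to hold one bit per lane.
constexpr unsigned maskWordsNeeded(unsigned lanes) { return (lanes + 63) / 64; }
```

So the QWord vector register fits a single 64-bit word, while the register tuple needs two.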

Some previous discussion threads if you are interested:



     I come back to this older thread.

     As I've said before, I managed to patch the various back-end files related to the lanemask in order to support up to 1024 vector lanes. To do so, I use a 1024-bit lanemask of type uint1024_t from boost::multiprecision instead of uint32_t. I changed the following LLVM source files:
     I plan to contribute patches for these changes to the llvm-commits mailing list.
     These changes were tested by me for more than 6 months with llc on various benchmarks - things seem to work well.

     Besides these changes I added new vector types (basically all vector types that were not already present in LLVM, from 32 lanes to 1024, for types i8, i16, i32, i64 and f16/32/64, etc - examples of types that I needed are v128i1, v128i16, also v1024f64). The files I changed are:
     Please let me know if you want these changes contributed as well. They are rather complex, in the sense that there are a lot of small dependencies for these types.

   Best regards,

You seem to be using old LLVM sources---changing this many files for supporting a different width LaneBitmask is no longer necessary.

Also, boost is not a current requirement for building LLVM and it's unlikely that requiring it for that purpose alone is justified.


     Is it part of the LLVM developer policy NOT to use the Boost library (for the TableGen and llc tools, etc.)?
     I see that people sometimes use Boost with LLVM: Compiling LLVM and Clang with Boost (enabled RTTI) - CAESR, also .

   Thank you,

Hi Alex,
No, there is no such policy. What I meant is that if you wanted to commit your solution that uses Boost, it would make the upstream LLVM sources depend on Boost, which would likely be met with objections.

Interestingly, someone was asking about a related issue in LaneBitmask, and wrote a patch that implements arbitrary bit width (beyond the current 32/64 bits). It hasn't been tested other than doing "make check-llvm", and it's not committed due to the potential compilation overhead.
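
For illustration only, a wider lane mask can be built from plain 64-bit words without Boost. The sketch below is not the patch mentioned above and not LLVM's LaneBitmask class; WideLaneMask and its members are hypothetical names, and it assumes the number of bits is fixed at compile time.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-width lane mask stored as an array of 64-bit words.
template <std::size_t Bits>
struct WideLaneMask {
  static constexpr std::size_t NumWords = (Bits + 63) / 64;
  std::uint64_t words[NumWords] = {};

  // Mask with only lane i set. The caller must ensure i < Bits.
  static constexpr WideLaneMask lane(std::size_t i) {
    WideLaneMask m{};
    m.words[i / 64] = std::uint64_t{1} << (i % 64);
    return m;
  }

  // Union of two masks, word by word.
  constexpr WideLaneMask operator|(const WideLaneMask &o) const {
    WideLaneMask r{};
    for (std::size_t w = 0; w < NumWords; ++w)
      r.words[w] = words[w] | o.words[w];
    return r;
  }

  // True if the two masks share at least one lane.
  constexpr bool overlaps(const WideLaneMask &o) const {
    for (std::size_t w = 0; w < NumWords; ++w)
      if (words[w] & o.words[w])
        return true;
    return false;
  }
};
```

Every mask operation now loops over NumWords words instead of being a single machine instruction, which gives a rough idea of where the compile-time overhead mentioned above could come from.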