[RFC] bitinsert and bitextract instructions for LLVM IR #2

Authors: Pedro Vicente (@pedroMVicente), Guilherme Lopes (@guilhermeglopess)
This post is continuation of the discussion for the bitextract and bitinsert instructions mentioned in [RFC] Add a New Byte Type to LLVM IR.

Summary

LLVM IR currently has no representation for extracting or inserting contiguous ranges of bits (efficiently) within the new Byte Type. For integer types, these operations are compiled to multiple instructions (shifts, masks, etc). However, these are not allowed for Byte Types. We propose two new instructions, bitextract and bitinsert, to address this gap in Byte types.

Motivation

  • Bitfield access: C and C++ bitfields require extracting and inserting contiguous ranges of bits from memory. Right now, bitwise integer operations are used to access bitfields, but it is desired to change these to bytes so the value of uninitialized memory can be switched to poison instead of undef. bitextract allows to extract a bitfield range directly from a byte value, and bitinsert allows modifying part of a byte. Without these instructions, bitfield access on Byte Type values is not possible to represent (technically, these could be represented by bitcasting to a vector and using shufflevector, but this is highly convoluted and inappropriate for a canonical representation).

  • Load widening: GVN and other optimizers merge together loads from multiple nearby locations, which can spread poison incorrectly. The original RFC fixed this by issuing a byte load and then trunc+lshr+cast to extract the individual values. However, these operations expect integer types, not byte.

Proposal

Bitextract

Syntax:

<result> = bitextract <ty>, <tx> <source>, i32 <offset>

Overview:

This instruction extracts a value from a byte.

Arguments:

The ‘bitextract’ instruction takes any first-class type ty (the return type), a Byte value source, and an offset.

Semantics:

The ‘bitextract’ instruction returns a value of the specified type. The returned value is first extracted from the source, starting at the bit specified by the offset, and then bitcasted to the return type. If the range offset+ty.bitwidth exceeds the source width, it returns poison.

Example:

%result = bitextract i8, b32 %src, i32 24 ; Extract the last 8 bits from %src and return an 8-bit integer

Bitinsert

Syntax:

<result> = bitinsert <tx> <base>, <ty> <val>, i32 <offset>

Overview:

This instruction inserts a value into a specific position within a byte.

Arguments:

The first argument (base) of bitinsert must be of a byte type. The second operand (val) must be an arbitrary type.

Semantics:

The returned value is of the same byte type as the first argument. It returns the first argument where the bits in the range [offset, offset + ty.bitlength - 1] have been replaced with val. If the range (offset + ty.bitwidth) is greater than the bitwidth of the first argument, it returns poison.

Example:

%result = bitinsert b32 %x, i8 %y, i32 3 ; Inserts the %y bits into %x with an offset of 3

Why not extend these for integers?

The possibility of extending these instructions to integer types has been considered, but it would represent a significant change to LLVM’s canonical form. For instance, the purpose of the trunc instruction would become ambiguous if bitextract was valid for all integers.

Why Instructions over Intrinsics?

Byte type is a first-class citizen, therefore basic manipulation operations should be in the IR. All types (ints, floats, arrays, vectors, etc) have IR instructions to manipulate them.

Open questions

  • Should the offset be allowed to be a runtime register value, or would limiting it to an immediate be enough to cover all practical use cases?
2 Likes

Just for comparison of these to gcc’s gimple and rtl.

Gimple:

BIT_FIELD_REF : basically the same as bitextract.

Takes only constant size and offset.

For non vectors it allows for non power of 2 sizes, the resulting type is of that size.

For vectors it requires power of 2 sizes and offset is always a multiple of that.

Can be used with MEM but is top-level only. With ssa names it is treated like any other expression, except it is GIMPLE_RHS_SINGLE(but this does not matter here).

BIT_INSERT_EXPR: like BIT insert.

Same restrictions as BFR.

It only 3 fields though, 2 inputs and an offset. The second input specifies the size implicitly.

RTL:

This is more complex.

For vector modes these are a few and I am not going to list them.

Non-vectors:

Zero_extract, has three fields, input(output too for on lhs), offset and size (i can’t remember the order).

ZERO_EXTRACT on the lhs of a set: like BIT insert.

SIGN_extract and zero_extract on rhs: like bit extraction except sign or zero extend to the mode specified as the output mode.

Subreg can be used also for bit extraction and insertion too. But those are for mode sizes only.

1 Like

Regarding the offset, should bit 0 be the MSB or the LSB?
For example, should bitextract i8, b32 0x12345678, i32 0 return 0x12 or 0x78?

IMO for consistency with other parts of LLVM such as bitcasting <32 x i1> to i32, the bit offset should match the order of bytes in memory so on little endian the bit offset starts at the LSB (since the LSB is in the first byte in memory), and on big endian the bit offset starts at the MSB (since the MSB is in the first byte in memory).

If we don’t like that and want the same order everywhere, I think counting up from 0 at the LSB is the best option.

I forgot to mention the bit offsets are dependent on if BITS_BIG_ENDIAN is set of not.

bitcast is dependent on endianness because it’s defined in terms of storing the operand in memory and then reloading it with a new type. The internal load and store are affected by endianness.

I would not expect bitinsert/bitextract to have endianness-dependent semantics – endianness should be handled when the byte type is loaded or stored, but operations on the byte type itself should be endianness-independent (i.e. 0 should always be the LSB).

I can see a few cases where endianness dependent semantics would, in theory, make some transforms simpler (basically: load/store widening from parts), but in practice it would just cause more problems even then, because it would make it hard to share the implementation with the equivalent integer handling (where the shift offsets are endianness-independent).

2 Likes

The implementation is available at [IR] Implementation and lowering of bitinsert and bitextract by pedroMVicente · Pull Request #200605 · llvm/llvm-project · GitHub

There were some divergent opinions on the PR so we are starting a discussion here.
Should these instructions support vectors and aggregates?

1 Like

I don’t think there’s really a disagreement here: vectors are single-value types (cf. Type::isSingleValueType()) and I think supporting them (when extracting from a scalar bN) is good. Aggregates (structs/arrays) are not single-value types and I don’t think they should be supported. (I personally don’t think aggregates should be first-class values at all, but that idea didn’t gain consensus.. but at least, IMO their use shouldn’t spread.) Not sure how to define e.g. bitextract ???, <4 x b32>

TBH I’d expect the instructions to only accept byte values. If we want to insert an integer/pointer into the byte value, use bitcast first.

BTW, these instructions should be trivially vectorizable. That is, we need something like %result = bitextract <2 x b8>, <2 x b32> %src, i32 24. If we decide to allow a runtime offset, the third operand is a vector.

Hm, why? I think insert/extract of iN or ptr is going to be the most common case, so avoiding a bitcast for that would be nice.

I agree that these instructions should be vectorizable. And as a result of that, I don’t think it should be possible to directly extract/insert a vector from a scalar byte type, this would cause unnecessary ambiguity with the element-wise extract/insert case.

Oh, you are right. I was thinking we can forbid directly extracting a vector from a scalar byte value in this way…