Guidance around bitcast on vectors, sub-byte types and data layout


I am wondering if it is reasonable to view the bitcast operation as a repacking of bits in a vector.

I am confused by the fact that memory storage and data-layout endianness considerations seem to leak into `bitcast vector<x> to vector<y>`, as per the LangRef:

> The ‘bitcast’ instruction converts value to type ty2. It is always a no-op cast
> because no bits change with this conversion. The conversion is done as if
> the value had been stored to memory and read back as type ty2.
>
> There is a caveat for bitcasts involving vector types in relation to endianness.
> For example, `bitcast <2 x i8> <value> to i16` puts element zero of the vector
> in the least significant bits of the i16 for little-endian, while element zero ends
> up in the most significant bits for big-endian.
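To make sure I am reading the caveat right, here is a small Python sketch (not LLVM itself) that models the "store then reload" semantics for `bitcast <2 x i8> to i16` under both endiannesses; the function name is mine, purely for illustration:

```python
import struct

def bitcast_2xi8_to_i16(v, endian):
    # Elements are stored to consecutive addresses, element 0 first,
    # then the i16 is read back with the target's byte order.
    mem = bytes(v)
    fmt = "<H" if endian == "little" else ">H"
    return struct.unpack(fmt, mem)[0]

v = [0x01, 0x02]                          # element 0 = 0x01
bitcast_2xi8_to_i16(v, "little")          # element 0 in the LSB: 0x0201
bitcast_2xi8_to_i16(v, "big")             # element 0 in the MSB: 0x0102
```

So the two targets agree on the bytes in memory but disagree on the resulting integer value, which is exactly the leak I am asking about.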

In particular, I am interested in understanding what could go wrong with a notional:

```llvm
%Z = bitcast <40 x i15> %V to <100 x i6>
```

(Both sides are 600 bits wide, so the cast is size-preserving and well-formed by the bit-width rule.)

I understand that target-specific considerations apply and that storing to memory will involve alignment and the insertion of padding. But I am unclear on whether there are implications for SSA use-def chains in the absence of memory operations.
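For reference, the purely bit-level reading of the notional cast that I have in mind is sketched below in Python: concatenate the 40 i15 lanes into one 600-bit value (element 0 in the least-significant bits, i.e. the little-endian convention), then slice it back out as 100 i6 lanes. No memory, padding, or alignment is modelled here, which is exactly the part the LangRef caveat is about; `repack` is a hypothetical helper, not an existing API:

```python
def repack(values, src_bits, dst_bits):
    total = len(values) * src_bits
    assert total % dst_bits == 0, "bit widths must tile exactly"
    # Concatenate all lanes, lane 0 in the low bits.
    big = 0
    for i, v in enumerate(values):
        big |= (v & ((1 << src_bits) - 1)) << (i * src_bits)
    # Re-slice the same 'total' bits at the new lane width.
    mask = (1 << dst_bits) - 1
    return [(big >> (i * dst_bits)) & mask for i in range(total // dst_bits)]

vec = [i % (1 << 15) for i in range(40)]   # a <40 x i15> stand-in
out = repack(vec, 15, 6)                   # the <100 x i6> result
assert repack(out, 6, 15) == vec           # round-trips losslessly
```

Under this reading the operation is a pure bit permutation and endianness never enters the picture; my question is whether LLVM's `bitcast` is allowed to mean this when no memory operation is present.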

In my particular use case I am looking for an abstraction to represent bit-level repacking in MLIR. The abstraction we have at the moment is `arith.bitcast`, which lowers to LLVM's `bitcast`.
I am adding rewrites using shuffle / and / or / shift in MLIR to get significantly better ISel behavior on such repackings, but these rewrites do not currently take data layout into account.
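To give a flavour of those rewrites, here is an illustrative Python sketch (not the actual MLIR patterns) of the per-lane decomposition: with i15 sources and i6 destinations, each output lane covers bits that straddle at most two source lanes, so every output lane reduces to at most two shifts, a mask, and one or:

```python
def lane(values, j, src_bits=15, dst_bits=6):
    # Output lane j covers bits [dst_bits*j, dst_bits*(j+1)) of the
    # concatenated input (lane 0 in the low bits).
    lo_bit = j * dst_bits
    i, off = divmod(lo_bit, src_bits)
    word = values[i] >> off                     # low part from source lane i
    avail = src_bits - off
    if avail < dst_bits and i + 1 < len(values):
        word |= values[i + 1] << avail          # high part from the next lane
    return word & ((1 << dst_bits) - 1)
```

Each lane is a small fixed expression, which is what gives ISel the better patterns compared to going through a generic bitcast.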

I am wondering whether we should just have a first-class `vector.repack` operation and avoid `bitcast` altogether, if my intended use case is too much of a potential footgun?