What's the most efficient way to load a vector of pointers?

Hello everyone. I’ll tell you my goal, then my question. In pseudocode, I’d like to do the following:

values = lookup_table[offsets]

offsets is <8 x i8>, which I get from: and <8 x i8> %a, %b
lookup_table is simply a 16-byte constant array of i8, aligned to 16.
values would be <8 x i8> which I’ll use with another vector operation

I was thinking I might need an array of pointers, so I looked at getelementptr for a solution. It appears it can produce a vector of pointers: https://llvm.org/docs/LangRef.html#vector-of-pointers
Specifically the following line:

%A = getelementptr i8, i8* %ptr, <4 x i64> %offsets

However, it doesn’t show how I should load %A; it only mentions gather (I don’t need a mask or pass-through values). I looked at ‘load’ and it doesn’t appear to support a vector of pointers. Is there anything else I can use that may be more efficient? If llvm.masked.gather is the way to go, do I want the mask to be all 1’s or all 0’s? I’ve heard that a masked-off bit means different things to different people, but my first impression is that I want all 0’s.

For masked.gather, the mask should be all 1s to do a load for each element; a 0 bit masks the lane off, so that element is not loaded and takes the passthru value instead. Since every lane is loaded, you should set the passthru to undef. On targets that don’t have native hardware support for gather, the gather is scalarized: each element is loaded separately and inserted into the result vector. This expansion is done by the ScalarizeMaskedMemIntrin pass, which runs a little before SelectionDAG.
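
To make the mask semantics concrete, here is a rough scalar model in C of what a gather over 8 byte lanes computes. It is only a sketch of the behaviour, not the code LLVM actually emits. The names masked_gather_u8 and lookup8 are made up for illustration, the passthru of 0 stands in for undef, and every offset is assumed to already be in the range 0..15.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8

/* Scalar model of a masked gather over 8 byte lanes (hypothetical helper).
 * mask[i] != 0: lane i is loaded from ptrs[i].
 * mask[i] == 0: lane i is not loaded and takes passthru[i] instead. */
void masked_gather_u8(const uint8_t *const *ptrs,
                      const uint8_t *mask,
                      const uint8_t *passthru,
                      uint8_t *out)
{
    for (size_t i = 0; i < LANES; ++i)
        out[i] = mask[i] ? *ptrs[i] : passthru[i];
}

/* values = lookup_table[offsets], expressed as a gather with an all-ones
 * mask. Assumes every offsets[i] is in 0..15; this is not checked. */
void lookup8(const uint8_t table[16],
             const uint8_t offsets[LANES],
             uint8_t values[LANES])
{
    const uint8_t *ptrs[LANES];
    uint8_t mask[LANES];
    uint8_t passthru[LANES];

    for (size_t i = 0; i < LANES; ++i) {
        ptrs[i] = table + offsets[i]; /* models the vector getelementptr */
        mask[i] = 1;                  /* all ones: load every lane */
        passthru[i] = 0;              /* never used here; undef in the IR */
    }
    masked_gather_u8(ptrs, mask, passthru, values);
}
```

With an all-zeros mask the same model would return the passthru values untouched and read nothing from the table, which is why all 0’s is not what you want here.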