[PATCH RFC 0/4] Initial 32-bit eBPF encoding support

Hi,

   Currently, the LLVM eBPF backend always generates code in 64-bit mode, which
may cause trouble when JITing to 32-bit targets.

   For example, it is quite common for XDP eBPF programs to access packet
fields through base + offset, and the default eBPF backend generates BPF_ALU64
for that address formation. Later, when JITing to 32-bit hardware, each
BPF_ALU64 instruction needs to be expanded into a sequence of 32-bit ALU
instructions, even though the address space is 32-bit and the high bits are
not significant.

   While a complete 32-bit mode implementation may need a new ABI (something
like -target-abi=ilp32), this patch set first adds some initial code so we can
construct 32-bit eBPF tests through hand-written assembly.

   A new 32-bit register set is introduced, with register names carrying a
"w" prefix, and the LLVM assembler will encode statements like "w1 += w2" into
the following 8-bit code field:

     BPF_ADD | BPF_X | BPF_ALU

BPF_ALU will be used instead of BPF_ALU64.

   NOTE: currently you can only use "w" registers with ALU statements, not
with others like branches, as those don't have a different encoding for 32-bit
targets.

Great to see work in this direction! Can we also enable using/emitting all
the 32-bit BPF_ALU instructions whenever possible for the currently available
bpf targets while at it (they only use BPF_ALU64 right now)?

Thanks,
Daniel

I think we need to gate 32-bit alu generation with a flag.
Though the interpreter and JITs have supported 32-bit since day one, the
verifier has never seen such programs before, so some valid programs may get
rejected. After some time passes and we're sure that all progs
still work fine when they're optimized with 32-bit alu, we can flip
the switch in llvm and make it the default.

>>> Hi Daniel,
>>>
>>> Thanks for the feedback.
>>>
>>> I think we could also enable the use of all the 32-bit BPF_ALU instructions
>>> for the currently available bpf targets. Now that we have 32-bit register
>>> set support, we could make i32 a legal type to prevent it from being
>>> promoted to i64, then hook it up with the i32 ALU patterns. Will look into
>>> this.
>>
> Thinking about next steps - do we expect the 32b operations to clear the
> upper halves of the registers? The interpreter does it, and so does
> x86. I don't think we can load 32bit-only programs on 64bit hosts, so
> we would need some form of data flow analysis in the kernel to prune
> the zeroing for 32bit offload targets. Is that correct?

Could you contrive an example to show the problem? If I understand
correctly, you are mostly worried that some natural sign extension is lost
by clearing the upper 32 bits of the register, and that such clearing may
make some operations, especially memory operations, incorrect on a 64-bit
machine?

Hm. Perhaps it's a blunder on my side, but let's take:

  r1 = ~0ULL
  w1 = 0
  # use r1

On x86 and in the interpreter, w1 = 0 clears the upper 32 bits, so r1
ends up as 0. 32-bit arches may translate this to something like:

  # r1 = ~0ULL
  r1.lo = ~0
  r1.hi = ~0
  # w1 = 0
  r1.lo = 0
  # r1.hi not touched

which will obviously result in r1 == 0xffffffff00000000. LLVM should
not assume r1.hi is cleared, but I'm not sure this is a strong enough
argument.

I'm not sure what LLVM will do for the later "r1" access without
going through the real implementation. My hunch is that LLVM should do a
32-bit to 64-bit conversion, "r1 <<= 32" and "r1 >>= 32", after "w1 = 0"
and before using r1. Let us wait and check once the implementation is in
place.

llvm will assume that r1.hi is cleared. 32-bit subregisters were
defined from day one. See Documentation/networking/filter.txt:
"All eBPF registers are 64-bit with 32-bit lower
subregisters that zero-extend into 64-bit if they are being written to."
If some JIT is not clearing the upper bits, it's a bug or it's being too smart :)
We can add an analysis pass to the verifier to help JITs in such cases.