[BPF] Change the JMP instruction format in LLVM?

Hi folks!

I need some help changing the BPF JMP instruction format in LLVM's
BPF target.

Currently, the BPF branch instructions use a signed 16-bit integer as
the jump offset, which is far from enough for large BPF programs (the
eBPF VM in the Linux kernel already supports up to 1M instructions).

We're trying to extend the JMP instruction to use the currently
unused imm32 operand for the jump offset. My first attempt was the
following patch, but it never worked as expected:

    a.patch · GitHub

I use the following minimal C program to test it:

    test.c · GitHub

And I used the following commands to compile and disassemble it:

    clang -g -fno-builtin -O0 -target bpf -o test.o -c test.c
    llvm-objdump -S --arch-name=bpf test.o > test.S

Before the patch, the disassembly of the two branch instructions in
the output file test.S looks like this:

       4: 7d 21 02 00 00 00 00 00 if r1 s>= r2 goto +2 <LBB0_2>

       6: 05 00 09 00 00 00 00 00 goto +9 <LBB0_3>

We can see that the jmp offset in instruction #4 is 02 00, a 16-bit
value in little endian, and the jmp offset in instruction #6 is 09 00.
The expected instruction bytes are instead:

       4: 7d 21 00 00 02 00 00 00

       6: 05 00 00 00 09 00 00 00

That is, we use the last 4 bytes, the 32-bit imm field, to store the
jmp offsets.
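
To make the two layouts concrete, here is a minimal C sketch of the
8-byte BPF instruction encoding (opcode, dst/src register nibbles,
16-bit off, 32-bit imm, all little endian). The function names
bpf_encode, encode_jmp_off16 and encode_jmp_imm32 are made up for
illustration and are not part of LLVM or the kernel; the sketch only
reproduces the bytes shown above and says nothing about how the
backend should be changed.

```c
#include <assert.h>
#include <stdint.h>

/* Pack one 64-bit BPF instruction into 8 little-endian bytes.
 * Layout: byte 0 = opcode, byte 1 = src (high nibble) | dst (low nibble),
 * bytes 2-3 = 16-bit off, bytes 4-7 = 32-bit imm. */
static void bpf_encode(uint8_t out[8], uint8_t opcode, uint8_t dst,
                       uint8_t src, int16_t off, int32_t imm)
{
    assert(dst < 16 && src < 16); /* register numbers are 4 bits each */
    uint16_t o = (uint16_t)off;
    uint32_t i = (uint32_t)imm;

    out[0] = opcode;
    out[1] = (uint8_t)((src << 4) | dst);
    out[2] = (uint8_t)(o & 0xff);
    out[3] = (uint8_t)(o >> 8);
    out[4] = (uint8_t)(i & 0xff);
    out[5] = (uint8_t)((i >> 8) & 0xff);
    out[6] = (uint8_t)((i >> 16) & 0xff);
    out[7] = (uint8_t)(i >> 24);
}

/* Current format: jump target in the 16-bit off field, imm left at zero. */
static void encode_jmp_off16(uint8_t out[8], uint8_t opcode, uint8_t dst,
                             uint8_t src, int16_t target)
{
    bpf_encode(out, opcode, dst, src, target, 0);
}

/* Format we are aiming for (hypothetical): jump target in the 32-bit imm
 * field, off left at zero. */
static void encode_jmp_imm32(uint8_t out[8], uint8_t opcode, uint8_t dst,
                             uint8_t src, int32_t target)
{
    bpf_encode(out, opcode, dst, src, 0, target);
}
```

For example, encode_jmp_off16(buf, 0x7d, 1, 2, 2) fills buf with the
bytes of instruction #4 as it is emitted today, while
encode_jmp_imm32(buf, 0x7d, 1, 2, 2) fills it with the bytes we
expect after the change.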

But after applying my patch above, it looks like this:

       4: 7d 21 02 00 00 00 00 00 if r1 s>= r2 goto +0 <foo+0x28>

       6: 05 00 09 00 00 00 00 00 goto +0 <LBB0_2>

Not only are the +0 offsets shown in the disassembly wrong (they
should be +2 and +9), but the 32-bit imm fields in both instructions
are also still zero. So I must be missing something in my patch.

If I use constant numbers in my patch, they do appear in the
disassembly, for example:

    let Inst{47-32} = 3;
    let Inst{31-0} = 7;

Why doesn't using a variable like BrDst in the patch work the same way?

Any hints or guidance will be greatly appreciated!

Also, once the JMP instruction works, I'd like to make LLVM avoid the
conditional branch instructions for large offsets. Any hints or
suggestions on how to do that would also be appreciated.
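
To illustrate the decision I have in mind, here is a small C sketch.
It assumes that conditional branches keep their signed 16-bit off
field and that only the unconditional JMP gets the 32-bit offset. It
uses the generic "invert the condition and jump over a long
unconditional jump" idea, which is not necessarily how LLVM would do
it. The names fits_in_s16, branch_form and choose_branch_form are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a jump offset (in instructions) fits the signed 16-bit off field. */
static bool fits_in_s16(int64_t offset)
{
    return offset >= INT16_MIN && offset <= INT16_MAX;
}

/* Hypothetical lowering choices for a conditional branch. */
enum branch_form {
    BRANCH_SHORT_COND,             /* if (cond) goto +off16            */
    BRANCH_INVERTED_COND_LONG_JMP  /* if (!cond) goto +1; goto +off32  */
};

/* Keep the plain conditional branch when the target is in range;
 * otherwise skip over an unconditional long jump with the inverted
 * condition. */
static enum branch_form choose_branch_form(int64_t offset)
{
    return fits_in_s16(offset) ? BRANCH_SHORT_COND
                               : BRANCH_INVERTED_COND_LONG_JMP;
}
```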

Thanks in advance!

Best,
Yichun