[iovisor-dev] [PATCH RFC 3/4] New 32-bit register set

Hi, Jiong,

Thanks for the patch! It is a great start to support 32bit register in BPF.
In the past, I have studied a little bit to see whether 32bit register
support may reduce
the number of unnecessary shifts on x86_64 and improve the
performance. Looking through
a few bpf programs and it looks like the opportunity is not great, but
still nice to have if we
have this capability. As you mentioned, this definitely helped 32bit
architecture!

I am not an expert in LLVM tablegen. I briefly looked through the code
change and it looks like
correct. Hopefully some llvm-dev tablegen experts can comment on the
implementation.

Below I only have a couple of minor comments.

This patch introduce the new "w*" 32-bit register set and alias them to the low
32-bit of the corresponding 64-bit register.

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
lib/Target/BPF/BPFRegisterInfo.td | 74 ++++++++++++++++++++++++++++-----------
1 file changed, 53 insertions(+), 21 deletions(-)

diff --git a/lib/Target/BPF/BPFRegisterInfo.td b/lib/Target/BPF/BPFRegisterInfo.td
index c8e24f8..a5722fe 100644
--- a/lib/Target/BPF/BPFRegisterInfo.td
+++ b/lib/Target/BPF/BPFRegisterInfo.td
@@ -11,31 +11,63 @@
// Declarations that describe the BPF register file
//===----------------------------------------------------------------------===//

+let Namespace = "BPF" in {
+ def sub_32 : SubRegIndex<32>;
+}
+
+class Wi<bits<16> Enc, string n> : Register<n> {
+ let HWEncoding = Enc;
+ let Namespace = "BPF";
+}
+
// Registers are identified with 4-bit ID numbers.
// Ri - 64-bit integer registers
-class Ri<bits<16> Enc, string n> : Register<n> {
- let Namespace = "BPF";
+class Ri<bits<16> Enc, string n, list<Register> subregs>
+ : RegisterWithSubRegs<n, subregs> {
   let HWEncoding = Enc;
+ let Namespace = "BPF";
+ let SubRegIndices = [sub_32];
}

-// Integer registers
-def R0 : Ri< 0, "r0">, DwarfRegNum<[0]>;
-def R1 : Ri< 1, "r1">, DwarfRegNum<[1]>;
-def R2 : Ri< 2, "r2">, DwarfRegNum<[2]>;
-def R3 : Ri< 3, "r3">, DwarfRegNum<[3]>;
-def R4 : Ri< 4, "r4">, DwarfRegNum<[4]>;
-def R5 : Ri< 5, "r5">, DwarfRegNum<[5]>;
-def R6 : Ri< 6, "r6">, DwarfRegNum<[6]>;
-def R7 : Ri< 7, "r7">, DwarfRegNum<[7]>;
-def R8 : Ri< 8, "r8">, DwarfRegNum<[8]>;
-def R9 : Ri< 9, "r9">, DwarfRegNum<[9]>;
-def R10 : Ri<10, "r10">, DwarfRegNum<[10]>;
-def R11 : Ri<11, "r11">, DwarfRegNum<[11]>;
+// 32-bit Integer registers which alias to low part of 64-bit one.
+def W0 : Wi<0, "w0">, DwarfRegNum<[0]>;
+def W1 : Wi<1, "w1">, DwarfRegNum<[1]>;
+def W2 : Wi<2, "w2">, DwarfRegNum<[2]>;
+def W3 : Wi<3, "w3">, DwarfRegNum<[3]>;
+def W4 : Wi<4, "w4">, DwarfRegNum<[4]>;
+def W5 : Wi<5, "w5">, DwarfRegNum<[5]>;
+def W6 : Wi<6, "w6">, DwarfRegNum<[6]>;
+def W7 : Wi<7, "w7">, DwarfRegNum<[7]>;
+def W8 : Wi<8, "w8">, DwarfRegNum<[8]>;
+def W9 : Wi<9, "w9">, DwarfRegNum<[9]>;
+def W10 : Wi<10, "w10">, DwarfRegNum<[10]>;
+def W11 : Wi<11, "w11">, DwarfRegNum<[11]>;
+
+// 64-bit Integer registers
+def R0 : Ri<0, "r0", [W0]>, DwarfRegNum<[0]>;
+def R1 : Ri<1, "r1", [W1]>, DwarfRegNum<[1]>;
+def R2 : Ri<2, "r2", [W2]>, DwarfRegNum<[2]>;
+def R3 : Ri<3, "r3", [W3]>, DwarfRegNum<[3]>;
+def R4 : Ri<4, "r4", [W4]>, DwarfRegNum<[4]>;
+def R5 : Ri<5, "r5", [W5]>, DwarfRegNum<[5]>;
+def R6 : Ri<6, "r6", [W6]>, DwarfRegNum<[6]>;
+def R7 : Ri<7, "r7", [W7]>, DwarfRegNum<[7]>;
+def R8 : Ri<8, "r8", [W8]>, DwarfRegNum<[8]>;
+def R9 : Ri<9, "r9", [W9]>, DwarfRegNum<[9]>;
+def R10 : Ri<10, "r10", [W10]>, DwarfRegNum<[10]>;
+def R11 : Ri<11, "r11", [W11]>, DwarfRegNum<[11]>;

Since you are touching this part, you could use "sequence" loop
to generate W* and R* definitions.

// Register classes.
-def GPR : RegisterClass<"BPF", [i64], 64, (add R1, R2, R3, R4, R5,
- R6, R7, R8, R9, // callee saved
- R0, // return value
- R11, // stack ptr
- R10 // frame ptr
- )>;
+def GPR32 : RegisterClass<"BPF", [i32], 32, (add
+ (sequence "W%u", 1, 9), // Callee saved

You can remove "Called saved" comments here.
The callee saved registers are actually R6-R9.

+ W0, // Return value
+ W11, // Stack Ptr
+ W10 // Frame Ptr
+)>;
+
+def GPR : RegisterClass<"BPF", [i64], 64, (add
+ (sequence "R%u", 1, 9), // Callee saved

ditto.

Hi, Jiong,

The new patch looks good. I did some basic testing on
net-next:samples/bpf and net-next:tools/testing/selftests/bpf and it
works fine. All existing llvm unit tests are not impacted as well as
expected.

I have applied the patch to the trunk. Besides your other work to
support 32bit abi, it would be
interesting to see how new 32bit register can be used in 64bit
architecture which may help improve performance and/or reduce
instruction count.

Thanks,
Yonghong