Hi there,
I am working on a RISC-V disassembly project. I need to recover each function's parameter and return types by decoding an ELF file. The idea is simple: I walk through the instructions in a function and find registers that are used before they are defined, then treat them as function parameters. Some edge cases are handled, of course.
I use llvm-mc to disassemble the machine code into instructions, but I have no idea how to determine the operand size once I have them. RISC-V seems to use the whole register to store data, whatever its width. Consider the following example in C:
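To make the idea concrete, here is a minimal Python sketch of the scan. It is not my actual tool: the `(mnemonic, defs, uses)` tuples and the function name `live_in_arg_regs` are hypothetical, and it only handles straight-line code (no branches or calls), so treat it as an illustration under those assumptions.

```python
# Hypothetical instruction model: (mnemonic, registers written, registers read).
# A real tool would derive these sets from the decoder's operand information.
ARG_REGS = [f"a{i}" for i in range(8)]  # integer argument registers a0..a7


def live_in_arg_regs(instructions):
    """Return argument registers that are read before being written."""
    defined = set()
    live_in = []
    for _mnemonic, defs, uses in instructions:
        # Check reads first: an instruction reads its sources before it
        # writes its destination.
        for reg in uses:
            if reg in ARG_REGS and reg not in defined and reg not in live_in:
                live_in.append(reg)
        defined.update(defs)
    return live_in


# First instructions of foo from the RISC-V listing in this post.
foo_prefix = [
    ("addi", ["sp"], ["sp"]),
    ("sd", [], ["ra", "sp"]),
    ("sd", [], ["s0", "sp"]),
    ("addi", ["s0"], ["sp"]),
    ("sw", [], ["a0", "s0"]),
    ("sd", [], ["a1", "s0"]),
    ("ld", ["a1"], ["s0"]),
    ("li", ["a0"], []),
    ("bge", [], ["a0", "a1"]),
]
print(live_in_arg_regs(foo_prefix))  # ['a0', 'a1']
```

This tells me which registers carry parameters, but nothing about how wide those parameters are.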
int foo(int a, long long b){
    if(b > 0L)
        return a;
    return 0;
}
The assembly code of foo on x86 would be:
0000000000001129 <foo>:
1129: f3 0f 1e fa endbr64
112d: 55 pushq %rbp
112e: 48 89 e5 movq %rsp, %rbp
1131: 89 7d fc movl %edi, -4(%rbp)
1134: 48 89 75 f0 movq %rsi, -16(%rbp)
1138: 48 83 7d f0 00 cmpq $0, -16(%rbp)
113d: 7e 05 jle 0x1144 <foo+0x1b>
113f: 8b 45 fc movl -4(%rbp), %eax
1142: eb 05 jmp 0x1149 <foo+0x20>
1144: b8 00 00 00 00 movl $0, %eax
1149: 5d popq %rbp
114a: c3 retq
The registers %edi and %rsi indicate that the parameters are 32-bit and 64-bit respectively.
However, the corresponding RISC-V assembly looks like this:
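For comparison, this is roughly how the x86 case reduces to a name lookup. The table is a partial, illustrative one that I wrote by hand, and `x86_operand_bits` is a hypothetical helper, not an LLVM API.

```python
# Partial, illustrative table: x86-64 register name -> width in bits.
X86_REG_BITS = {
    "rdi": 64, "edi": 32, "di": 16, "dil": 8,
    "rsi": 64, "esi": 32, "si": 16, "sil": 8,
    "rax": 64, "eax": 32, "ax": 16, "al": 8, "ah": 8,
}


def x86_operand_bits(operand):
    """Width implied by an AT&T-style register operand such as '%edi'.

    Returns None for registers missing from the partial table.
    """
    return X86_REG_BITS.get(operand.lstrip("%"))


print(x86_operand_bits("%edi"))  # 32
print(x86_operand_bits("%rsi"))  # 64
```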
0000000080002000 <foo>:
80002000: 13 01 01 fe addi sp, sp, -32
80002004: 23 3c 11 00 sd ra, 24(sp)
80002008: 23 38 81 00 sd s0, 16(sp)
8000200c: 13 04 01 02 addi s0, sp, 32
80002010: 23 24 a4 fe sw a0, -24(s0)
80002014: 23 30 b4 fe sd a1, -32(s0)
80002018: 83 35 04 fe ld a1, -32(s0)
8000201c: 13 05 00 00 li a0, 0
80002020: 63 5a b5 00 bge a0, a1, 0x80002034 <foo+0x34>
80002024: 6f 00 40 00 j 0x80002028 <foo+0x28>
80002028: 03 25 84 fe lw a0, -24(s0)
8000202c: 23 26 a4 fe sw a0, -20(s0)
80002030: 6f 00 00 01 j 0x80002040 <foo+0x40>
80002034: 13 05 00 00 li a0, 0
80002038: 23 26 a4 fe sw a0, -20(s0)
8000203c: 6f 00 40 00 j 0x80002040 <foo+0x40>
80002040: 03 25 c4 fe lw a0, -20(s0)
80002044: 03 34 01 01 ld s0, 16(sp)
80002048: 83 30 81 01 ld ra, 24(sp)
8000204c: 13 01 01 02 addi sp, sp, 32
80002050: 67 80 00 00 ret
I cannot tell the sizes of a0 and a1 just by looking at the registers. It seems the only way to get the operand size in RISC-V is to check the instruction itself (sw, sd, ld, and li), but this approach has to handle EVERY instruction separately.
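A minimal sketch of that per-instruction approach, to show what I mean. The tables are partial and hand-written, the `(mnemonic, defs, uses)` model and the helper names are hypothetical, and I assume that for stores the first entry in `uses` is the stored value and the rest form the address. It is only a heuristic: it works on unoptimized code like the listing above because each argument is spilled with a store of its declared width, but as far as I understand the RV64 calling convention passes an int sign-extended to 64 bits, so optimized code may only ever touch it with full-width instructions.

```python
XLEN = 64  # register width on RV64

# Width in bits of the value written to memory by each store mnemonic.
STORE_WIDTH = {"sb": 8, "sh": 16, "sw": 32, "sd": 64}
# RV64I word operations, which only look at the low 32 bits of their sources.
WORD_OPS = {"addw", "subw", "addiw", "sllw", "srlw", "sraw"}


def read_width(mnemonic, uses, reg):
    """Bits of `reg` that this instruction reads (an upper bound)."""
    if mnemonic in STORE_WIDTH:
        # Address operands are always full width; only the stored value
        # (uses[0]) is narrowed by the mnemonic.
        return XLEN if reg in uses[1:] else STORE_WIDTH[mnemonic]
    if mnemonic in WORD_OPS:
        return 32
    return XLEN


def first_read_width(instructions, reg):
    """Width of the first read of `reg`, or None if it is written first
    or never read."""
    for mnemonic, defs, uses in instructions:
        if reg in uses:
            return read_width(mnemonic, uses, reg)
        if reg in defs:
            return None
    return None


foo_prefix = [
    ("sw", [], ["a0", "s0"]),  # sw a0, -24(s0)
    ("sd", [], ["a1", "s0"]),  # sd a1, -32(s0)
    ("ld", ["a1"], ["s0"]),    # ld a1, -32(s0)
]
print(first_read_width(foo_prefix, "a0"))  # 32
print(first_read_width(foo_prefix, "a1"))  # 64
```

Even in this small form, every mnemonic family needs its own rule, which is exactly the part I would like to avoid.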
I know that register size information is stored in the TargetRegisterInfo class, but the point is that RISC-V does not divide a register into sub-registers (like RAX, EAX, AX, AH, AL), at least not for x0-x31.
So my question is: how can I get the operand size in RISC-V? (Remember, I am going from machine code to LLVM IR.)