Following our previous discussion, I ran further experiments with the `-fzero-call-used-regs=all` option in clang-17.0.0 and looked more closely at the compile-time ROP mitigations applied to these programs. My aim was to identify shortcomings in these mitigations and to try to improve them. Below, I would like to continue our discussion of these mechanisms.
As this paper describes, the `-fzero-call-used-regs=all` option clears register values before each function returns. When we inspected binaries compiled with this option, we noticed that the `pop ...; ret` sequence at the end of almost every function had been broken up by register-clearing instructions such as `pxor`. We also compared the Gadget sets that ropper extracted from each program before and after adding this option. The option sharply reduced `pop xxx; ret;`-style Gadgets and, on average, cut the total number of Gadgets in a program by 60%. Intuitively, clearing register values before a function returns is a simple and practical operation. It shrinks the Gadget set and prevents register values from leaking when a function returns, and both effects make it harder for an attacker to build a ROP chain.
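To make the `pop xxx; ret;` comparison concrete, here is a simplified sketch of how such Gadgets can be counted directly from raw machine code. It is not ropper's algorithm: ropper disassembles backwards from every `ret` and finds many more Gadget shapes. This sketch only matches a single `pop reg` opcode (`0x58` to `0x5f`, optionally preceded by the `0x41` REX.B prefix for `r8` to `r15`) immediately followed by `ret` (`0xc3`). It checks every byte offset, so unaligned matches are included. The epilogue bytes in the usage lines are illustrative and assume the clearing instruction sits between the last `pop` and the `ret`. They are not taken from the test set.

```python
# Simplified counter for "pop reg; ret" Gadgets in raw x86_64 machine code.
# Every byte offset is checked, so unaligned Gadgets are found too.
# Multi-pop Gadgets (e.g. "pop rbx; pop rbp; ret") are NOT detected.

POP_REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
RET = 0xC3      # encoding of a near ret
REX_B = 0x41    # prefix that turns pop rax..rdi into pop r8..r15


def find_pop_ret_gadgets(code: bytes, base: int = 0) -> list:
    """Return (address, text) pairs for every 'pop reg; ret' byte pattern."""
    gadgets = []
    for i in range(len(code) - 1):
        op = code[i]
        if 0x58 <= op <= 0x5F and code[i + 1] == RET:
            idx = op - 0x58
            if i > 0 and code[i - 1] == REX_B:
                # 41 5x c3 decodes as pop r8..r15; ret
                gadgets.append((base + i - 1, f"pop r{8 + idx}; ret"))
            # Starting at the pop opcode itself is also a valid Gadget.
            gadgets.append((base + i, f"pop {POP_REGS[idx]}; ret"))
    return gadgets


if __name__ == "__main__":
    # Illustrative epilogue: pop rbx; pop rbp; ret
    print(find_pop_ret_gadgets(bytes([0x5B, 0x5D, 0xC3])))
    # -> [(1, 'pop rbp; ret')]
    # Same epilogue with a register-clearing xor eax, eax (31 c0) before ret
    print(find_pop_ret_gadgets(bytes([0x5B, 0x5D, 0x31, 0xC0, 0xC3])))
    # -> []
    # mov eax, 0xc35f (b8 5f c3 00 00) hides "pop rdi; ret" at offset 1
    print(find_pop_ret_gadgets(bytes([0xB8, 0x5F, 0xC3, 0x00, 0x00])))
    # -> [(1, 'pop rdi; ret')]
```

The last usage line shows why clearing registers at function exit cannot remove every such Gadget: a `\xc3` byte inside an unrelated instruction can still form one.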
Compared with CFI, such a low-cost ROP mitigation is practical to deploy in programs running on IoT devices or network equipment. Many of these devices prioritize fast response times, and their security hardening is often weak. They therefore stand to benefit greatly from mitigations added at compile time.
To see how this mitigation performs on a wider range of programs, I expanded the scope beyond the isc-dhcp and proftpd open-source projects and collected dozens of mainstream open-source programs into a test set. Compiling them with `-fzero-call-used-regs=all` produced 50 binaries. These include service programs, applications, language interpreters, and critical libraries from the Linux operating system. We extracted each binary's Gadget set with ropper and ROPgadget and evaluated how well each set supports ROP construction. As the construction goal we chose executing `execve("/bin/sh", 0, 0)`. To reach it, we first assessed whether the Gadget set allows arbitrary memory writes (i.e., storing `"/bin/sh"` at a known address). We then assessed whether it can set the four key registers `rdi`, `rsi`, `rdx`, and `rax`. Here `rdi`, `rsi`, and `rdx` hold the arguments and `rax` holds the system call number. Of the 50 programs, we generated the target ROP payload for 45, a success rate of 90%. In addition, 48 programs could produce a payload for the arbitrary memory-write goal alone. We have published the test dataset we constructed on GitHub.
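For reference, the sketch below lays out a payload for this goal in the easiest case. It assumes the Gadget set still contains single-register `pop` Gadgets (exactly the kind the mitigation makes scarce) plus one memory-write Gadget. All Gadget addresses and the writable address are hypothetical placeholders, not values from any program in the test set. The only real-world constant is 59, the `execve` system call number on x86_64 Linux.

```python
import struct

# Hypothetical Gadget addresses: placeholders, not from any real binary.
POP_RDI = 0x401001           # pop rdi; ret
POP_RSI = 0x401003           # pop rsi; ret
POP_RDX = 0x401005           # pop rdx; ret
POP_RAX = 0x401007           # pop rax; ret
STORE_RAX_AT_RDI = 0x401009  # mov qword ptr [rdi], rax; ret
SYSCALL = 0x40100D           # syscall
WRITABLE = 0x404000          # hypothetical writable address, e.g. in .bss

SYS_EXECVE = 59              # execve system call number on x86_64 Linux


def q(value: int) -> bytes:
    """Pack one 64-bit little-endian stack slot."""
    return struct.pack("<Q", value)


def build_execve_chain() -> bytes:
    chain = b""
    # Step 1: arbitrary memory write of "/bin/sh\0" (exactly 8 bytes).
    chain += q(POP_RDI) + q(WRITABLE)
    chain += q(POP_RAX) + b"/bin/sh\x00"
    chain += q(STORE_RAX_AT_RDI)
    # Step 2: set the four key registers, then invoke the system call.
    chain += q(POP_RDI) + q(WRITABLE)    # rdi = pointer to "/bin/sh"
    chain += q(POP_RSI) + q(0)           # rsi = argv = NULL
    chain += q(POP_RDX) + q(0)           # rdx = envp = NULL
    chain += q(POP_RAX) + q(SYS_EXECVE)  # rax = 59
    chain += q(SYSCALL)
    return chain


if __name__ == "__main__":
    payload = build_execve_chain()
    print(len(payload), "bytes")  # 14 stack slots -> 112 bytes
```

Each `pop` Gadget consumes the stack slot that follows its address. Every step therefore depends on either a register-control or a memory-write capability, which are the two capabilities the evaluation measures for each Gadget set.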
We manually analyzed why each failing program could not produce a ROP payload. For most of them, the cause was an inability to control certain registers. Few Gadgets involved those registers, the available ones were relatively complex, and the range of values they could set was restricted. Naturally, if the attack requirements were relaxed (for example, by requiring fewer registers to be set), more programs might succeed. We also analyzed why payload generation succeeded in the other programs. The reasons can be summarized as follows:
- The mitigation sharply reduced the availability of Gadgets such as `pop xxx; ret;`. However, x86_64 uses variable-length instructions, so many unaligned Gadgets (decoded starting from the middle of an instruction) remain usable. The `\xc3` byte, which encodes `ret`, is the key here.
- Some Gadgets that move data with `mov` and arithmetic instructions need only simple calculations to set most target values. Moreover, register control propagates in a chain reaction: once we can set one register, we can often use that capability to set further registers, and even memory values.
- Gadgets that read and write memory are also valuable. They are harder to use because they require control of more registers. However, once arbitrary memory reads or writes are possible, they trigger the same chain reaction: a region of memory can serve as an intermediary for setting further register or memory values.
- Conditional branch Gadgets can also be utilized.
- In addition, some special instructions, such as `ret n` (which releases n additional bytes of stack on return), have dedicated exploitation techniques. Their presence can increase the effectiveness of Gadgets by 30%.
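The chain reaction described above can be modeled as a fixed-point computation over the Gadget set. A location (register or memory) becomes controllable once every location a Gadget depends on is controllable. The sketch below is a deliberately simplified model. It assumes each Gadget is reduced to a (destination, required locations) pair, treats all memory as a single pseudo-location `[mem]`, and ignores side effects entirely. The Gadget list in the usage lines is hypothetical.

```python
from typing import Iterable, Set, Tuple

# A Gadget reduced to: (location it sets, locations that must already be controlled).
# Side effects and value ranges are ignored in this simplified model.
Gadget = Tuple[str, Tuple[str, ...]]


def controllable(direct: Iterable[str], gadgets: Iterable[Gadget]) -> Set[str]:
    """Propagate control until nothing changes (a fixed point)."""
    controlled = set(direct)
    gadgets = list(gadgets)
    changed = True
    while changed:
        changed = False
        for dst, needs in gadgets:
            if dst not in controlled and all(n in controlled for n in needs):
                controlled.add(dst)
                changed = True
    return controlled


if __name__ == "__main__":
    # Hypothetical Gadget set: only pop rax; ret and pop rdi; ret survived.
    direct = {"rax", "rdi"}
    gadgets = [
        ("rdx", ("rax",)),          # mov rdx, rax; ret
        ("[mem]", ("rdi", "rdx")),  # mov qword ptr [rdi], rdx; ret
        ("rsi", ("rdi", "[mem]")),  # mov rsi, qword ptr [rdi]; ret
        ("rcx", ("rbx",)),          # mov rcx, rbx; ret (rbx never controlled)
    ]
    print(sorted(controllable(direct, gadgets)))
    # -> ['[mem]', 'rax', 'rdi', 'rdx', 'rsi']
```

In this toy set, two surviving `pop` Gadgets are enough to reach `rdx`, memory, and `rsi`. `rcx` stays out of reach because nothing controls `rbx`.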
Based on this analysis, I believe the compile-time mitigation could be strengthened in the following directions:
- Reducing the number of Gadgets ending in `ret`. I have studied the compile-time ROP mitigations proposed by OpenBSD. They found that the use of `rbx` is closely related to the `\xc3` byte (the encoding of `ret`). For example, a ModR/M byte with `eax`/`rax` in the reg field and `ebx`/`rbx` in the r/m field is exactly `0xc3`, so `mov ebx, eax` encodes as `89 c3`. They therefore lowered the allocation priority of the `rbx` register, which significantly reduced the number of Gadgets ending in `ret` in programs. More importantly, their approach also takes unaligned Gadgets into account.
- Increasing the number of data dependencies and side effects that must be handled for each Gadget. This can be done by examining the instructions that precede the encoding of each jump instruction and adding "redundant instructions" (or using other methods) to make Gadgets longer. Each individual Gadget then carries more data dependencies and side effects that must be satisfied. For example, consider the Gadget `mov rdx, rax; mov qword ptr [rcx], rdx; test rax, rax; jne 0xdeadbeef; call [rbx + 0x30];`. Suppose we use it to transfer data from `rax` to `rdx` (i.e., the `mov rdx, rax;` instruction). The other instructions in the Gadget (`mov qword ptr [rcx], rdx; test rax, rax; jne 0xdeadbeef;`) then become side effects. The final jump target is read from memory, which creates a data dependency (the `call [rbx + 0x30];` instruction). These side effects and data dependencies must be kept within specific ranges so that the Gadget executes without crashing.
- Increasing the proportion of conditional-branch Gadgets. Analyze whether the encodings of conditional branches are related to particular registers or instructions. Then adjust the compilation scheme, without hurting performance, to raise the proportion of conditional-branch Gadgets. Such Gadgets are relatively hard to use because the attacker must handle at least three things: setting the operand value, satisfying the branch condition, and controlling where execution goes in the taken branch. Increasing the nesting depth of conditional branches makes these Gadgets harder still to use. Similarly, increasing the proportion of arithmetic instructions in Gadgets also helps resist ROP attacks.
- Reducing the occurrence of critical bytecodes for certain Gadget exploitation techniques (