I’m working on an optimization involving the fdiv and fsqrt instructions on AArch64. The core of the problem is that a conditional statement at the end of the function splits these instructions across different basic blocks, so I would need to pattern match across basic blocks.
The example from this GitHub issue shows how the division (fdiv) ends up in a different block from the square root operation (fsqrt), which makes it hard to match and optimize them together.
Here’s the relevant code:
double res, res2, tmp;
void foo (double a, double b, int c, int d) {
  tmp = 1.0 / __builtin_sqrt (a); // fdiv & fsqrt
  res = tmp * tmp;
  if (d)
    res2 = a * tmp; // fdiv
}
Has anyone encountered a similar issue, or does anyone have suggestions on how to handle pattern matching across basic blocks? Any insights or guidance would be greatly appreciated!
I am an Outreachy intern, and my previous work has involved the SelectionDAG, where I wrote a custom lowering for another optimization. I am not tied to a specific area, as my internship description is “Improve AArch64 performance,” and this issue is on the list of potential tasks I could tackle.
From my research, it appears that the SelectionDAG may not be suitable for addressing the current optimization issue, so I am currently exploring GlobalISel.
At this point I am not yet familiar with the codebase, so I am navigating through the documentation to figure out what I am looking for.
SelectionDAG has limitations for cross-basic-block optimizations because it works on one basic block at a time. GlobalISel works at function scope. If you are not tied to AArch64 assembly, you could also try doing this in LLVM IR, which has more tools and analyses available.