One piece of code I'm writing has a lot of intermediates, and I'm
trying to reduce the number of memory accesses. Here's a
snippet from the start of the function, where I think there is some
low-hanging fruit:
You'll note that rbx, r12, r13, r14, r15 and rbp are all dead after the
pushes. But the spill code still insists on using rax to load the
spilled values, forcing them to be reloaded later. Is the register
allocator (PBQP, I think) capable of having values in registers and on
the stack at the same time?
I'm not the most familiar with this sort of thing, but a small
example (as LLVM bitcode), the optimization flags you used, etc.
might be helpful (and I might be able to have a go at explaining it, if
no one else does).
I can at least confirm that the default register allocator on x86
isn't PBQP, it's the greedy allocator (unless you're compiling with
-O0, in which case it's the fast allocator).
The actual C code is shorter. (You'll have to unwrap the lines.) Bonus
points if you can identify the algorithm. }:>
#include <stdint.h>
typedef unsigned int uint128_t __attribute__((mode(TI)));
There are a couple of problems here:
(1) PBQP doesn't split live intervals, so those array elements get
spilled everywhere, even in places where there are free registers.
(2) The PBQP allocator always introduces a stack slot for spilled
values, even when they're already on the stack.
I definitely want to tackle these issues, but they'll have to wait for
a month or so while I write up my thesis.
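To make point (1) a bit more concrete, here is a minimal sketch of the
kind of code where it bites. This is NOT the function from this thread;
the name dot4_wide and its whole body are made up for illustration. The
inputs are loaded up front, stay live across a run of 64x64 -> 128-bit
multiplies, and two of them are used again at the very end. If an
allocator that cannot split live intervals decides to spill a0, it is
spilled for its entire interval, including the tail where registers are
free again.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned int uint128_t __attribute__((mode(TI)));

/* Hypothetical illustration only; not the code discussed in this thread.
   Writes the low and high 64 bits of a0*b0 + a1*b1 + a2*b2 + a3*b3
   (taken modulo 2^128) to out[0] and out[1], and returns a0 ^ a3. */
uint64_t dot4_wide(const uint64_t a[4], const uint64_t b[4], uint64_t out[2])
{
    assert(a && b && out);

    /* All eight inputs are loaded here, so their live intervals start
       early and overlap with every product computed below. */
    uint64_t a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
    uint64_t b0 = b[0], b1 = b[1], b2 = b[2], b3 = b[3];

    /* High-pressure region: four 128-bit products. On x86-64 the
       one-operand MUL instruction ties up rax and rdx for each one. */
    uint128_t p0 = (uint128_t)a0 * b0;
    uint128_t p1 = (uint128_t)a1 * b1;
    uint128_t p2 = (uint128_t)a2 * b2;
    uint128_t p3 = (uint128_t)a3 * b3;

    uint128_t sum = p0 + p1 + p2 + p3;   /* wraps modulo 2^128 */

    out[0] = (uint64_t)sum;              /* low 64 bits  */
    out[1] = (uint64_t)(sum >> 64);      /* high 64 bits */

    /* Low-pressure tail: this late use keeps a0 and a3 live to the end. */
    return a0 ^ a3;
}
```

To compare allocators on something like this, one option (assuming a
clang/llc pair from the same LLVM build) is to emit IR with
"clang -O2 -S -emit-llvm" and then run llc on the result with
-regalloc=pbqp and with -regalloc=greedy, and diff the spill code. The
optimizer is free to reorder or rematerialize the loads, so the output
may not show the problem as cleanly as the source suggests.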