1. As I mentioned, it simply fixes a bug in the implementation of one of
the two PREs LLVM has. It does not change the PRE algorithm or add
anything to it. The code had a bug. I fixed the bug :P. PRE is
*not even adding more code in this case*. The code is already there.
All it is doing is inserting a phi node. If you transformed your
code to use memory, and reverted my patch, you'd get the same result,
because Load PRE will do the same thing. It's what PRE does.
2. GCC and other compilers have PREs that do literally the same thing my
patch does (you are welcome to verify, I wrote GCC's :P), and
apparently they are smart enough to handle this in RA. So I'm going to
suggest that it is, in fact, possible to do so, and I'm going to
further suggest that if we want to match their performance, we need to
be able to do the same. You can't simply "turn down" any optimization
that RA may have to deal with. It usually doesn't work in practice.
This is one of the reasons good RA is so hard.
3. As I also mentioned, register pressure heuristics in PRE simply do
not work. They've been tried. By many. With little to no success.
PRE is too high in the stack of optimizations to estimate register
pressure in any sane fashion. It's pretty much a fool's errand. You
can never tune it to do what you want. *Many* have tried.
Your base level problem here is that all modern PRE algorithms (except
for min-cut PRE, as I mentioned) are based on a notion of lifetime
optimality. That is, they extend lifetimes as little as possible while
still eliminating a given redundancy. Ours does the same.
However, this is not an entirely useful metric. Optimizing for some
other metric is what something like min-cut PRE lets you do.
But even then, register pressure heuristics are almost certainly not
the answer.
4. This was actually already discussed when the patch was submitted,
and the consensus was "we should just fix RA". Feel free to look at
the discussion from 5 months ago.
I would suggest, if you want to fix this, you take the approach that
was discussed then.
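
To make point 1 concrete, here is a rough C-level sketch of what "only
inserting a phi node" looks like. This is an assumed illustration, not
the test case from the patch: the function names and the variable t are
hypothetical, and t stands in for the phi that PRE would insert in the
IR.

```c
/* Hypothetical illustration; not the code from the patch or bug report. */

/* Before: p + 1 at the join is already computed on both incoming paths,
   as a + 1 on one side and b + 1 on the other. */
int before_pre(int a, int b, int cond) {
    int p, side;
    if (cond) {
        p = a;
        side = (a + 1) * 2;   /* a + 1 computed here */
    } else {
        p = b;
        side = (b + 1) * 3;   /* b + 1 computed here */
    }
    int y = p + 1;            /* recomputed, although it exists on every path */
    return side + y;
}

/* After: no new computation is added. The two existing values are merged
   into t, which plays the role of phi(a + 1, b + 1). */
int after_pre(int a, int b, int cond) {
    int side, t;
    if (cond) {
        t = a + 1;
        side = t * 2;
    } else {
        t = b + 1;
        side = t * 3;
    }
    int y = t;                /* the merged value replaces p + 1 */
    return side + y;
}
```

Note that t now stays live from each branch to the join. That is the
(minimal) lifetime extension from point 3, and it is the part RA is
expected to cope with.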