Poor register allocations vs gcc

Hello,

I have an issue with LLVM's optimizations. I need to generate object code.

The (deliberately poor and useless) code:

As far as I know, clang on OS X always sets up a frame pointer unless you explicitly use -fomit-frame-pointer. I think the reasoning is that dtrace and other tools rely on frame pointers being present.

I don't see why using %ecx would be a problem; no extra spills/reloads are produced because of it.

- Matthias

Hello,
%ecx is a problem because you have to xor it, which is avoided in the gcc compilation. -fomit-frame-pointer helps.

Now LLVM is one instruction away from gcc. If %ecx were not used, it would be as fast.

Hi Jog,

This looks like a scheduling problem to me.

The main difference here is that in GCC the final “a + b” is scheduled before the call, whereas in the LLVM case it is scheduled after the call.
Because of that, %rdi cannot be used in the final add and its value has to be saved somewhere else.

You can see the effect by replacing:

  puts("ok");
  return a + b;

with:

  b += a;
  puts("ok");
  return b;
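
For anyone who wants to try this, here is a minimal, self-contained sketch of the two variants. The original function is not shown in this thread, so the function names, the signatures, and the two-int parameter list are assumptions made for illustration only; just the statement order comes from the snippets above.

```c
#include <stdio.h>

/* Hypothetical stand-in for the original code (not shown in the thread):
   the sum is written after the call, so at the source level both a and b
   are still needed once puts() returns. */
int sum_after_call(int a, int b)
{
    puts("ok");
    return a + b;
}

/* Suggested variant: the sum is computed before the call, so only one
   value (b) is still needed once puts() returns. */
int sum_before_call(int a, int b)
{
    b += a;
    puts("ok");
    return b;
}
```

Both functions return the same result; only the position of the addition relative to the call differs. Compiling the file with something like "clang -O2 -S" and "gcc -O2 -S" and comparing the assembly should show whether your compiler versions schedule the add differently, though the exact output will depend on the version and target.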

That being said, you shouldn’t have to do that to get good code.

Could you file a PR for the scheduling problem?

Thanks,
-Quentin

Hi,
I certainly will, Quentin!

Thanks

Register allocation is not the problem here. If you look at the code gcc produces, you will see “movl $0, %eax” as well (no idea why it wouldn’t use xorl to zero the register).
I looked into it again: LLVM's version is one instruction longer because the addition of 71 is folded into the last leal. That means both the value before adding 71 and the value plus 71 are live before the puts call, so an additional mov instruction is needed to duplicate the value. You could file a PR if you really care about the issue.

- Matthias

I will, Matthias.

Thanks!

By the way Quentin,

Your modification makes llvm much faster than gcc (12 ops vs 15 ops): fewer pushq/popq, better use of the registers.
This code is silly at best, but thanks to you I learned something about LLVM.

Thanks a lot :)

--
Jog