Disabling combining of loads/stores in the optimizer

Consider writes to a struct {i32 a; i32 b};

The optimizer can currently combine stores (i32, i32) to a single i64 store operation. Is there a way to disable that?

I feel that such optimizations may not result in any gain for PIC16 as PIC16 does everything on i8.
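Concretely, the transformation in question looks something like this (a hand-written IR sketch in modern opaque-pointer syntax, not output from an actual run; the struct is assumed to be laid out as two consecutive i32 fields):

```llvm
; Before: two adjacent i32 stores into the struct.
define void @write_pair(ptr %p) {
  store i32 1, ptr %p
  %b = getelementptr i8, ptr %p, i32 4
  store i32 2, ptr %b
  ret void
}

; After the combine: a single i64 store of the packed value
; (little-endian byte order assumed).
define void @write_pair_merged(ptr %p) {
  store i64 8589934593, ptr %p    ; 0x0000000200000001
  ret void
}
```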

-Sanjiv

> Consider writes to a struct {i32 a; i32 b};
>
> The optimizer can currently combine stores (i32, i32) to a single
> i64 store operation. Is there a way to disable that?

Not currently. There are some ideas floating around about
including in TargetData a list of integer types that the
target natively supports, which would allow instcombine
and other passes to make more informed decisions, but
at this point it's just ideas.
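For reference, in later LLVM releases this idea took the form of the native-integer-widths ("n") field in the target's data layout string. The layout below is a purely hypothetical sketch for an i8-only target, not the real PIC16 string:

```llvm
; Hypothetical data layout: the trailing "n8" declares that i8 is the
; only integer width the target supports natively, which lets passes
; such as instcombine avoid creating wider integer operations.
target datalayout = "e-p:16:8:8-i8:8:8-n8"
```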

> I feel that such optimizations may not result in any gain for PIC16
> as PIC16 does everything on i8.

The legalize pass should turn an i64 store into 8 i8 stores
then, which is essentially the same as what an {i32,i32} store
would turn into. Is there a problem with this?
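The splitting Dan describes can be sketched roughly as follows (hand-written IR standing in for what are really SelectionDAG nodes; only the first two of the eight byte stores are written out):

```llvm
; Roughly what legalization does with `store i64 %v, ptr %p` on an
; i8-only, little-endian target: extract each byte with lshr + trunc
; and store it at the matching offset.
define void @store_i64_bytewise(i64 %v, ptr %p) {
  %b0 = trunc i64 %v to i8
  store i8 %b0, ptr %p
  %hi = lshr i64 %v, 8
  %b1 = trunc i64 %hi to i8
  %p1 = getelementptr i8, ptr %p, i32 1
  store i8 %b1, ptr %p1
  ; ...bytes 2 through 7 follow the same shift/trunc/store pattern...
  ret void
}
```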

Dan

> The optimizer can currently combine stores (i32, i32) to a single
> i64 store operation. Is there a way to disable that?

> Not currently. There are some ideas floating around about
> including in TargetData a list of integer types that the
> target natively supports, which would allow instcombine
> and other passes to make more informed decisions, but
> at this point it's just ideas.

There are other cases where we could benefit from such ideas. Could you
please give a pointer to these discussions?

> I feel that such optimizations may not result in any gain for PIC16
> as PIC16 does everything on i8.

> The legalize pass should turn an i64 store into 8 i8 stores
> then, which is essentially the same as what an {i32,i32} store
> would turn into. Is there a problem with this?

We are currently doing this; however, I think disabling such
optimizations is a much better solution.

See "Adding legal integer sizes to TargetData" on Feb 1, 2009 on llvmdev.

-Chris

An LLVM design goal is that backends should be able to outsmart
instcombine when necessary, rather than having instcombine be able
to disable parts of itself in order to avoid foiling the backends.
Practicality sometimes steers elsewhere of course. Please explain
why you think suppressing this particular optimization is better;
it isn't obvious how it would look different in the end.

Dan

Perhaps the transformation in question is actually memcpy -> scalar
load+store? For a target where the scalar takes more than a couple of
registers, if the backend can't disambiguate the pointers, it's
essentially forced to copy src -> stack temporary -> dest. For a 64-bit
memcpy on a target with 8-bit registers, I imagine the result is quite
ugly.
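The rewrite Eli describes can be sketched by hand like this (illustrative IR, not compiler output):

```llvm
declare void @llvm.memcpy.p0.p0.i32(ptr, ptr, i32, i1)

; Before: a small fixed-size memcpy.
define void @copy8(ptr %dst, ptr %src) {
  call void @llvm.memcpy.p0.p0.i32(ptr %dst, ptr %src, i32 8, i1 false)
  ret void
}

; After: a scalar i64 load + store. On an 8-bit target the scalar must
; be split back into byte operations, and if the backend cannot
; disambiguate %dst and %src it may be forced to stage the whole value
; through a stack temporary.
define void @copy8_scalar(ptr %dst, ptr %src) {
  %v = load i64, ptr %src
  store i64 %v, ptr %dst
  ret void
}
```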

-Eli

Yeah, I agree. LegalizeTypes should be able to trivially lower this.

-Chris

"Should be" is the key clause here. I agree with you and Dan in general.
However, we have at least one testcase where the overhead of legalize is
so huge it takes *hours* to finish. This is with LLVM 2.4 and I'm waiting
on an LLVM 2.5 upgrade to re-test so I can file a bug if necessary.

The point is that while legalize should be able to handle stuff, sometimes
it can't for other reasons.

                                  -Dave

That doesn't seem relevant; we shouldn't mess with transformation
passes to hide performance issues in CodeGen.

-Eli

What Eli said. :)

If so, it sounds like a bug in legalize, and one we should fix! That said, if there are well-understood reasons why the code generator is not doing something, and if that isn't likely to change in the near future, we can consider making the optimizer aware of legal integer types.

-Chris

> We are currently doing this, however I think disabling such
> optimizations is a much better solution.

> An LLVM design goal is that backends should be able to outsmart
> instcombine when necessary, rather than having instcombine be able
> to disable parts of itself in order to avoid foiling the backends.
> Practicality sometimes steers elsewhere of course. Please explain
> why you think suppressing this particular optimization is better;
> it isn't obvious how it would look different in the end.

Well, for one thing, our port has no native operations wider than 8 bits,
so it does not make sense to promote operations to higher precisions.
Eventually all those operations will be lowered, and the resulting code
is most likely worse than if they had not been promoted in the first
place. So I think it should be at the discretion of the port to enable
or disable such optimizations as needed.

A

While that is a valid approach in general, it is completely at odds with the approach that the LLVM codebase has taken. The general LLVM philosophy is that all optimizations should be as aggressive as possible at whatever they do, and it is then the responsibility of the target to lower what the optimizers produce into something legal for that machine.

Contravention of that design philosophy in a major portion of the optimization suite is unlikely at best, to be honest.

--Owen

> So I think it should be at the discretion of the port to enable or
> disable such optimizations as needed.

> While that is a valid approach in general, it is completely at odds
> with the approach that the LLVM codebase has taken. The general LLVM
> philosophy is that all optimizations should be as aggressive as
> possible at whatever they do, and it is then the responsibility of the
> target to lower what the optimizers produce into something legal for
> that machine.
>
> Contravention of that design philosophy in a major portion of the
> optimization suite is unlikely at best, to be honest.

I can see the benefits of this approach at the macro level, but at the
micro level the impact is visible, especially on embedded targets with
limited memory and register resources (such as PIC16).
We see the impact not only with respect to code quality but also compile
time (spent recovering from the damage that the optimizer has done). The
other problem is that our port never seems to be stable: as such generic
optimizations get added, something new breaks in our port, and as more
and more higher-end targets are ported to LLVM I expect more and more
optimizations to be added that are at odds with our port.
That is why we are asking for a way to have at least some control over
such optimizations.

A.

Is this actually causing a performance problem in practice? If so, please show the generated and desired code for a tiny testcase. Instead of talking theory, please give an example.

-Chris

Obviously I don't have a PIC16 example, but we've seen many cases in our
own optimizer where we've had to throttle things, especially for register
pressure. Sometimes the backend really can't recover because there isn't
enough information to do so.

Generally I abhor throttles. But sometimes they are necessary because the
earlier passes have the right information to make the decisions.

                             -Dave

Well, this part is on you. AFAIK, the only PIC16 tests in the LLVM testsuite
were generated by others. It's up to you to put tests in the testbase that
people can run. We can't anticipate the effects of choices on your port if
your port isn't part of the regular testing regime.

                               -Dave

> Is this actually causing a performance problem in practice? If so,
> please show the generated and desired code for a tiny testcase.

Not anymore; the performance problem has been addressed. There are,
however, other examples of this sort that do not really relate to the
subject of this thread, so I would like to discuss them at another
time.

Thanks for the link to the other thread:
"Adding legal integer sizes to TargetData"

Cheers,
Ali.