Driver: Default CPUs

Hi,

I'm kicking off this discussion because it came out of my previous thread on
ARM driver cleanup, where people had differing opinions on the use of
default CPUs.

The driver currently *always* passes the "-target-cpu" option to cc1. The
way it finds a default CPU for an architecture is hardcoded and nasty, and
I'd love to get rid of it.
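
To make "hardcoded and nasty" concrete, the current behaviour amounts to a
hand-written march-string to CPU-name table in the driver. A condensed,
illustrative sketch (not the exact code, and nowhere near the full table):

    // Sketch of the driver's current default-CPU selection: a hardcoded
    // table that has to be kept in sync with the backend by hand.
    #include "llvm/ADT/StringRef.h"
    #include "llvm/ADT/StringSwitch.h"

    static const char *getARMTargetCPU(llvm::StringRef MArch) {
      return llvm::StringSwitch<const char *>(MArch)
          .Case("armv6", "arm1136jf-s")
          .Case("armv7", "cortex-a8")  // an arbitrary pick among v7 cores
          .Case("armv7m", "cortex-m3")
          .Default("arm7tdmi");        // an equally arbitrary fallback
    }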

My personal opinion is that there should be no need to set a default CPU all
the time. The target triple should suffice, and if the user wants
performance tuned to a specific CPU they should say so on the command line
with -mcpu= or -mtune=. If no CPU is specified, I feel that "blended"
scheduling should be done such that the resultant code runs well on all
chips with the given architecture/triple.
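
To make the proposal concrete, the intent is that these two invocations
differ only in tuning, with no hidden default CPU in the first (the triple
and flags here are just illustrative):

    clang -target armv7a-none-linux-gnueabi -c foo.c
    clang -target armv7a-none-linux-gnueabi -mcpu=cortex-a9 -c foo.c

The first would get blended v7-A scheduling with no implicit -target-cpu;
the second is explicitly tuned for Cortex-A9.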

The fact that LLVM doesn't have many processor itineraries (especially for
ARM) and can't currently do this shouldn't really factor into this
discussion - I'd like this discussion to be about what we *should* do, not
what we *can* do under the current constraints. We can then take the
outcomes and see concretely what we may need to change.

Also, I'm not considering backwards compatibility for Darwin here - any
solution will take "Darwin as a special case" into account.

The arguments I've heard so far are:
  * Mine, that there should be no default CPU selected if the user doesn't
select one themselves. I feel it adds a hidden option that the user is
unaware of, and depending on what that default is, it may cause inferior
performance on the CPU the user actually runs the code on.
  * Jim's (? I forget who responded ?), that there *should* be a default CPU
all the time, but perhaps some pseudo-CPUs could be added that exhibit
blended scheduling. For example, for ARM a "v7" pseudo-CPU could be added
that has an itinerary that performs well on all v7 cores (a rough sketch of
the driver side of this follows the list). By the way, I think ARM is the
only high-visibility target where the difference between cores matters this
much. I assume MIPS and PowerPC are similar, though?
  * Keep the current behaviour and select a default CPU for the given
architecture.
  * ???
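
For the pseudo-CPU option, the driver-side change could stay small - the
real work would be growing blended itineraries in the backend. A rough
sketch, where "generic-v7" and "generic-v7m" are hypothetical CPU names
that don't exist today:

    // Hypothetical: fall back to a blended pseudo-CPU per subarchitecture
    // instead of an arbitrary real core. The pseudo-CPUs would need
    // blended itineraries added to the ARM backend.
    #include "llvm/ADT/StringRef.h"
    #include "llvm/ADT/StringSwitch.h"

    static const char *getDefaultCPU(llvm::StringRef MArch) {
      return llvm::StringSwitch<const char *>(MArch)
          .Case("armv7", "generic-v7")    // hypothetical blended v7-A
          .Case("armv7m", "generic-v7m")  // hypothetical blended v7-M
          .Default("generic");
    }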

Out of all of these, the only one I really dislike is keeping the current
behaviour.

What do people think? Do people care?

Cheers,

James

> The driver currently *always* passes the "-target-cpu" option to cc1. The
> way it finds a default CPU for an architecture is hardcoded and nasty, and
> I'd love to get rid of it.

Part of the problem here is that some of the RPM-based Linux porting folks
are 'coining' their own tuple definitions to fit 'political' branding
decisions as to hardware FP and such. I don't see HOW one avoids 'nasty'
when one has to interface to that layer of the stack.

> My personal opinion is that there should be no need to set a default CPU
> all the time. The target triple should suffice,

But for the reasons above, this shifts the load of chasing political
decisions to the wrong place ... it is all well and good for people to make
what are essentially 'branding' decisions in their tuple, but it should not
transfer the load of supporting such (sorry, but my feeling:) insanity into
code.

> and if the user wants performance tuned to a specific CPU they should say
> so on the command line with -mcpu= or -mtune=. If no CPU is specified, I
> feel that "blended" scheduling should be done such that the resultant code
> runs well on all chips with the given architecture/triple.

But 'blended' just disguises the issue of whose ox is being gored by
abstracting it behind another level of hiding ... The argument could be made
that 'blended' should be that which is capable of running on the largest
number of platforms (and thus at the expense of code bloat, carrying around
run-time maths code that is not needed when HW FP _is_ present).

> What do people think? Do people care?

It is a discussion worth having, I think.

- Russ herrold

Hi Russ,

> Part of the problem here is that some of the RPM-based Linux porting
> folks are 'coining' their own tuple definitions to fit 'political'
> branding decisions as to hardware FP and such. I don't see HOW one
> avoids 'nasty' when one has to interface to that layer of the stack.

I'm not sure I fully understand the problem here. Could you provide a concrete set of examples?

> But for the reasons above, this shifts the load of chasing political
> decisions to the wrong place ... it is all well and good for people to
> make what are essentially 'branding' decisions in their tuple, but it
> should not transfer the load of supporting such (sorry, but my feeling:)
> insanity into code.

Parsing toolchain and OS specifics out of the host environment and triple should be the job of the driver, I agree (although remember that LLVM itself has a Triple type that deals with some of this stuff).

But again, I'm not certain where this relates to the default CPUs. Perhaps an example will make it clearer?

> But 'blended' just disguises the issue of whose ox is being gored by
> abstracting it behind another level of hiding ... The argument could be
> made that 'blended' should be that which is capable of running on the
> largest number of platforms (and thus at the expense of code bloat,
> carrying around run-time maths code that is not needed when HW FP _is_
> present).

I'm not aware of an architecture where the default hard-float vs. soft-float choice differs between CPUs of the same subarchitecture. Therefore, specifying the subarchitecture (in ARM's case, 'v7' or 'v7m') should carry enough information to set up a suitable default calling convention.

Also, the calling convention is often determined by the target triple. It's not intrinsically related to the target CPU, IMHO.
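
To illustrate, on ARM/Linux the float ABI can already be derived from the
triple's environment rather than from any CPU (a sketch assuming the GNU
EABI naming conventions):

    // The calling convention falls out of the triple, not the CPU:
    // *-gnueabihf implies hard-float argument passing, *-gnueabi soft.
    #include "llvm/ADT/Triple.h"

    static bool useHardFloatABI(const llvm::Triple &T) {
      return T.getEnvironment() == llvm::Triple::GNUEABIHF;
    }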

Cheers,

James

Bump.