Target-specific defaults for LLVM tools?

Hello, LLVM.

We’d like to start a discussion about what would be the best way of
supporting target-specific defaults for options in the LLVM tools.

Problem description

Devil’s advocate: opt, llc, lli, etc. are development/debugging tools for LLVM developers, not for end users, and the project optimizes their functionality for that use case.

—Owen

What does the community think?
Discuss. :slight_smile:

Devil’s advocate: opt, llc, lli, etc. are development/debugging tools for LLVM developers, not for end users, and the project optimizes their functionality for that use case.

+1 :slight_smile:

-eric

However, issues start to pop up when we consider all the other tools (opt,
llc, and so on) and the complications of having, for example, a "cross-llc".
Usage of such tools is becoming increasingly likely thanks to the existence
of other tools and frontends which can generate LLVM IR. Some crazy people
might even try to write IR instead of C++! :slight_smile:

llc / opt and similar tools are intended for LLVM developer use and
should never be exposed to end users. Therefore, things like
"cross-llc" and similar do not make any sense.

If, for some particular weird reason, there is a need for e.g. 'opt'
functionality to be provided to users, then a new tool with a clear
command line should be developed implementing the necessary
functionality.

Currently, there is a non-trivial number of LLVM tests that do not set the full
triple, but only parts of it, and then rely on specific default code generation
options.

In most cases this should be considered a bug - the test should be
precise and therefore ought to be fixed.

To elaborate more here:

a) It’s probably a bug in the test, however,
b) It may be that the test is considered generic enough to be for all x86 and there’s something else going a little weird.

There’s enough b that it’s still probably a case by case basis.

-eric

Hello, LLVM.

We'd like to start a discussion about what would be the best way of
supporting target-specific defaults for options in the LLVM tools.

Problem description
-------------------

LLVM strives to be a generic compiler and it has a lot of code generation
options which are (rightly) not linked to the target triple.
However, some target triples do impose hard requirements for such options.
For example, PS4 requires the relocation model to be PIC. Cross-compilers
for specific target systems are likely to have similar restrictions.

One very effective way to ensure that users of Clang/LLVM don't misuse the
tools by passing the wrong options is to set the default values for those
options in the cross-compiler once the target triple is known. This has the
following rationale:

- Correctness: the user must be able to generate correct code for the target
  in the default workflow (with or without optimizations).
- Performance: the user should be able to generate good code for the target
  in the default workflow with optimizations enabled.
- Usability: although the default workflow could potentially be defined as
  "sorry, but you must pass options X, Y and Z to the tools", obviously it
  would be easier if it did not require such esoteric options.

And of course, from a company's point of view:

- Commercial reasons: if a tool is released commercially, the user expects
  such a commercial product to work "out of the box" for the target system.

This is very easy to do in Clang, as the driver does the job of
"dispatching" the correct command line options to subsequent invocations of
tools. Because the driver is the "middle man", it can parse the options,
understand what the target triple is, and invoke other tools accordingly
(i.e. actively adding an option, or erroring out upon encountering an
invalid set of options).

A vendor can set the default target triple for Clang even at build time,
and this approach seems not to cause any trouble in the Clang tests (which
don't get as far as checking the output assembly). So for Clang the problem
is solved to a certain degree.

However, issues start to pop up when we consider all the other tools (opt,
llc, and so on) and the complications of having, for example, a "cross-llc".
Usage of such tools is becoming increasingly likely thanks to the existence
of other tools and frontends which can generate LLVM IR. Some crazy people
might even try to write IR instead of C++! :slight_smile:

First of all, overriding the defaults based on the triple is harder in LLVM
tools, because of how the cl::opt options work. If you consider llc, it
determines the triple in this sequence:

- Parse the IR and extract the triple if present.
- If there's a triple on the command line, override the IR triple.
- If no triple was found by either method, use the default triple.

This process is performed in the compileModule() function, which happens
way after the cl::init initializers for the cl::opt option defaults, and
shortly after the cl::ParseCommandLineOptions() call. If the value of some
option needs to be changed _after_ the triple has been determined,
additional work has to be performed to tweak the cl::opt options on the
fly, but only if they had not already been set on the command line (which
would mean the user explicitly wanted a non-default value). It would be
quite ugly, although doable.

The real problem, however, is that tools are used not only by the end user,
but also extensively by the LLVM tests that check the output assembly.

You have this completely backwards, and then some. The use case for these
tools for testing is not "also". That is the only current use case that
they are developed for. I'm not sure where you got the impression that they
are meant to be used by end users, but that is just completely, flat-out
incorrect.

-- Sean Silva

Not to pile on the “LLVM tools are for debugging” bandwagon too hard here, but I’m pretty sure the clang driver can be used on source, IR, and disassembly. People shouldn’t be using llc and opt; instead they should just pass the IR files to clang.

Then clang can set the right target-specific defaults based on the clang flags.

That seems like a much more reasonable approach to the problem to me.

-Chris

What is the preferred method for compiler (frontend) developers to optimize and generate target machine code from IR?

At one point I found a tutorial that recommended simply dumping the IR to a file and spawning llc to do the job.

Up until now I have “manually” created a TargetMachine, PassManager, etc. to generate my object code. The initial version of my code was cribbed from llc for LLVM 3.2, and has since been updated for 3.5.1.

However, with every new release of LLVM, the API to the backend optimization and code generation passes changes. The changes from 3.5.1 to 3.6 are quite significant. If I’m lucky, the impact of a change is that my C++ code refuses to compile, and I have to fix it. If I’m unlucky, the impact may be that my code compiles and runs, but LLVM works suboptimally - perhaps some optimizations don’t happen. (This can happen if a newer API expects me to do some step which was not required in earlier releases, and there is no assert to catch it.)

As an alternative, I am seriously considering “simply dumping the IR to a file and spawning llc” to perform my backend work. The API to create IR is much more stable than the API to do useful things with it. Furthermore, it’s a lot easier to manually debug IR that has been dumped to a file. Finally, I can spawn multiple, independent, concurrent invocations of llc on a multi-core machine. I needn’t worry about concurrency, as the standard Linux fork/waitpid type calls will suffice. Given that 90% of my runtime is spent inside LLVM, I get 90% of the benefit of a fully concurrent design with almost zero work.

But now it seems that this usage model is frowned upon.

What is the recommended usage model?

Doesn't count for much, but I'm sympathetic to Dario's concern with
cl::opt and clang defaults. In one large out-of-tree project,
colleagues decided to create their own driver to handle the required
customization of the build process and options.

If clang is the user's one stop shop, here's one such cross-compiler
example I'd like to understand better: GNU binutils has no clue of my
target. Instead of trying to use gas, how should GNU-less targets
enable clang to invoke llvm-mc to assemble?

-fintegrated-as / -fno-integrated-as

I tried, but integrated-as was impractical due to too many possible
machine code matches for a given assembly statement. So, the compiler
deals in pseudos and emits .s files. If the integrated assembler
could accept an assembly string instead of an MCInst, I'd be in
business.

Regardless, users could pass .s files on the command line and Clang
would still need to call llvm-mc with the right options.

Which arch is this?

That's sort of what `clang -cc1as` does already when you do: `clang foo.s -integrated-as`.

Clang already does this without invoking llvm-mc. It invokes 'clang -cc1as'
which performs the same function. Try it. :slight_smile:

You don't make sense to me. IAS uses the same parser as llvm-mc.

Joerg

Clang correctly assembled a .s file when using the -integrated-as
option. Thanks, that is news to me! However, without -integrated-as,
clang invoked 'as', which won't work for a GNU-less target. Taking
assembler invocation as a good case study for target-specific clang
defaults, what is the proper solution?

Look for hasIntegratedAssembler and IsIntegratedAssemblerDefault.

Joerg

Again, what is the target architecture? (I understand if it's an out-of-tree backend that you can't talk about. If so, I'll stop bugging you about this point.)

For some targets, -integrated-as defaults to on; for others it defaults to off. It generally depends on how good the integrated assembler is for that target. For example, the default on ARM used to be -no-integrated-as because IAS just wasn't mature enough at the time.

The "proper solution" here w.r.t. assemblers is to implement full support for the integrated assembler for the target arch in llvm, and then flip the default for that target.

Thanks for all the replies!

Whoa, it looks like there’s pretty much a massive consensus on “use clang and never use opt/llc in that kind of scenario”.

I appreciate that opt and llc are mainly debugging/testing tools.
The problem is mainly that these programs “are there” in the open source build of LLVM. Users know that they’re getting a product based on open source LLVM, so:

  • If opt and llc were not shipped, then it would be reasonable for users to ask: “why can’t we have them?”

  • If they were shipped, then it would be reasonable for users to ask: “can we have them just work?”

Also, intuitively it feels that llc could be more lightweight / performant compared to clang (which has to spawn a second process), although I admit that I don’t have performance numbers for this comparison (yet).

Anyway, now we definitely have an answer for the two questions above. :slight_smile:

Cheers,

Dario Domizioli

SN Systems - Sony Computer Entertainment Group

Right, sorry to skip over your question. It is an out-of-tree target
I can't be open about. I will start a new thread on the integrated-as
problems in my target.

Sorry to be a bit controversial, but as a follow-up from this thread, should we then explicitly document that opt, llc and lli are debugging tools in the LLVM documentation?

The LLVM User Guide main page says that “The documentation here is intended for users who have a need to work with the intermediate LLVM representation.”
So yes, it is for advanced users, but this wording appears to include users having their own IR generators (they are technically “working with the intermediate representation”).

The official man pages for llc, opt and lli certainly make them look like standalone and usable tools:
http://llvm.org/docs/CommandGuide/llc.html

http://llvm.org/docs/CommandGuide/opt.html

http://llvm.org/docs/CommandGuide/lli.html

I can’t find anything explicit about them only being debugging/testing tools, nor anything explicitly discouraging their use in production.

Even the “Getting Started with the LLVM System” documentation uses llc in the example!
http://llvm.org/docs/GettingStarted.html#an-example-using-the-llvm-tool-chain

If llc is just a debugging/testing tool, why does the documentation tell beginners (the obvious target of a “getting started” document) about its existence?!

It could be argued that advanced users “should know better”, but the documentation could be clearer too.

Cheers,
Dario Domizioli
SN Systems - Sony Computer Entertainment Group