Target Specific Parsing API

Folks,

Following the discussion with Nico and others, I've created PR20683 to
discuss the implementation of a generic, externalised target-specific
parsing API for LLVM, Clang and others.

I have a vague plan involving a generic class (say TargetParser) in
lib/Target that is accessible as an API to any tool that needs target
specific parsing. The idea is then to let targets implement their own
versions of it on a generic part of the code (still lib/Target) so
that we don't break tools if we don't build every back-end (or any
back-end) with LLVM. Maybe this could be fixed in CMake, maybe not.

This class should also allow for customization by the tools, either
to add functionality to existing functions (like pre-parsing,
post-parsing, de-mangling etc) or to add completely new functions.

Also, since a common side effect of parsing architectural parameters
is to set flags in specific classes, we should expect that every tool
will *have* to override the "updateFlags" method inside the tool
(including LLVM's integrated assembler) to use their own
target-specific structures.
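
A minimal sketch of that split, assuming a shared class that owns the string parsing and leaves "updateFlags" to each tool. Only the names TargetParser and updateFlags come from the proposal; FPUKind, handleFPU and ToyAsmParser are hypothetical, and the list of FPU names is illustrative rather than complete.

```cpp
#include <string>

// Hypothetical result type shared by all tools.
enum class FPUKind { Invalid, VFPv3, Neon };

class TargetParser {
public:
  virtual ~TargetParser() = default;

  // Generic part: parse the string, then hand the result to the tool.
  bool handleFPU(const std::string &Name) {
    FPUKind Kind = parseFPU(Name);
    if (Kind == FPUKind::Invalid)
      return false;
    updateFlags(Kind);
    return true;
  }

protected:
  // Shared parsing, identical for every tool (illustrative table only).
  static FPUKind parseFPU(const std::string &Name) {
    if (Name == "neon")
      return FPUKind::Neon;
    if (Name == "vfpv3")
      return FPUKind::VFPv3;
    return FPUKind::Invalid;
  }

  // Every tool has to supply this to update its own structures.
  virtual void updateFlags(FPUKind Kind) = 0;
};

// A made-up tool that only records the last FPU it was given.
class ToyAsmParser : public TargetParser {
public:
  FPUKind Current = FPUKind::Invalid;

protected:
  void updateFlags(FPUKind Kind) override { Current = Kind; }
};
```

Because updateFlags is pure virtual here, a tool that forgets to implement it fails to compile, which matches the "will *have* to override" wording.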

I'm hoping that this makes sense. Please let me know if there's any
major flaw, or existing infrastructure that we could be re-using. But
from the looks of target parsing in both LLVM and Clang, there
isn't... :confused:

cheers,
--renato

Renato,

Could you give a couple of examples where this would be useful? I find myself without the context to really understand your proposal. Are we discussing target specific language extensions? Assembly parsing? Something else entirely?

Philip

Hi Philip,

Sorry, the bug reports (two, for now) should have more context, but
here's a summary:

Nico started trying to solve a problem where ".fpu neon" wouldn't
change the instruction set in an assembly file, and he found that he
needed a parser identical to the one in Clang to parse the exact same
semantics ("neon", "vfpv3", etc). The ARM assembly also has a .cpu
which is pretty much the same story, so he thought that we could share
the parser on both sides, maybe exposing some functionality from LLVM
to Clang.

The problem, as Reid mentioned, is that LLVM doesn't need to compile
with all back-ends, and an implementation in a back-end that is not
compiled will generate link time errors on a statically compiled Clang
or run time errors on a dynamically compiled Clang. But this also
opens a can of worms, where we start to leak target specific knowledge
through the back door, without a properly defined API that has to be
respected over the years, once everyone else has forgotten that we've
done that.

I blocked such a patch from going in, because we'd be replacing code
duplication with implicit coupling of far away pieces of code, but now
that the original problem is solved (by duplicating code), we have to
fix the duplication problem. Since duplication was already there,
especially in the sub-arch parsing, we should be able to sweep a few
other similar bugs away with a single fix.

Since parsing of strings is generic, and should be used by all tools
(that support -mfpu), that piece of code can live in lib/Target and
Clang can rest assured that it'll be there. But the second part, the
Clang-specific one, will only live in Clang, and use Clang's own
structures to hold and change the sub-architecture feature flags
wherever it's needed. Same for all other tools, and same for MC
assembler.

So, as an example in the assembler case, parseDirective sees ".fpu",
calls parseDirectiveFPU which uses ArchParser->parseFPU() returning an
enum/bitfield owned by ArchParser, which is then relayed to the
assembler so it can call setAvailableFeatures(FPU);

In Clang, the driver will observe -mfpu and call
ArchParser->parseFPU() which, again, will return the same
enum/bitfield to the driver, which will then update its own flags to
-cc1, etc.
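
To make the two flows concrete, here is a small sketch in which one shared parseFPU() returns a bitfield and two consumers interpret it in their own way. Everything apart from the names parseFPU, parseDirectiveFPU and setAvailableFeatures is an assumption: the FPUFeature bits, the ToyAssembler type, the "+vfp3"/"+neon" strings and the name-to-bits table are placeholders, not the real ARM tables.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical bitfield owned by the shared parser.
enum FPUFeature : uint32_t {
  FPU_None = 0,
  FPU_VFPv3 = 1u << 0,
  FPU_NEON = 1u << 1,
};

// Shared parsing: the same strings for .fpu and -mfpu.
// The mapping below is assumed for illustration only.
uint32_t parseFPU(const std::string &Name) {
  if (Name == "vfpv3")
    return FPU_VFPv3;
  if (Name == "neon")
    return FPU_VFPv3 | FPU_NEON;
  return FPU_None;
}

// Assembler side: a stand-in that keeps its own feature mask.
struct ToyAssembler {
  uint32_t Available = 0;

  void setAvailableFeatures(uint32_t Bits) { Available = Bits; }

  // Called when the directive parser sees ".fpu <Arg>".
  bool parseDirectiveFPU(const std::string &Arg) {
    uint32_t Bits = parseFPU(Arg);
    if (Bits == FPU_None)
      return false;
    setAvailableFeatures(Bits);
    return true;
  }
};

// Driver side: turns the same bits into its own flag strings.
std::vector<std::string> driverFlagsForFPU(const std::string &Arg) {
  std::vector<std::string> Flags;
  uint32_t Bits = parseFPU(Arg);
  if (Bits & FPU_VFPv3)
    Flags.push_back("+vfp3");
  if (Bits & FPU_NEON)
    Flags.push_back("+neon");
  return Flags;
}
```

The point of the sketch is that the string table exists once, while each consumer keeps full control over what the bits mean for its own state.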

We can use enum/bitfields as a communication method, re-using most of
what's in use right now to identify those things, but moved into one
single place. That would be the quick and simple solution. If it turns
out we need some more complex fiddling, we might have to create some
call-backs (virtual preParse(), virtual postParse() that do nothing on
the base class) etc, so that Clang can override them and do what's
needed, but that's only if we can't do it straight with enums.
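
If the call-backs do turn out to be needed, they could look roughly like the sketch below. Only preParse() and postParse() are named in the text; their signatures, the FPUKind enum and the LenientParser tool are assumptions made for the example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

enum class FPUKind { Invalid, VFPv3, Neon };

class ArchParser {
public:
  virtual ~ArchParser() = default;

  FPUKind parseFPU(std::string Name) {
    preParse(Name); // tool may rewrite the input
    FPUKind Kind = FPUKind::Invalid;
    if (Name == "neon")
      Kind = FPUKind::Neon;
    else if (Name == "vfpv3")
      Kind = FPUKind::VFPv3;
    postParse(Kind); // tool may react to the result
    return Kind;
  }

protected:
  // Both hooks do nothing in the base class.
  virtual void preParse(std::string &) {}
  virtual void postParse(FPUKind) {}
};

// Hypothetical tool: accepts upper-case spellings and counts failures.
class LenientParser : public ArchParser {
public:
  int Errors = 0;

protected:
  void preParse(std::string &Name) override {
    std::transform(Name.begin(), Name.end(), Name.begin(),
                   [](unsigned char C) {
                     return static_cast<char>(std::tolower(C));
                   });
  }
  void postParse(FPUKind Kind) override {
    if (Kind == FPUKind::Invalid)
      ++Errors;
  }
};
```

A tool that is happy with the plain enum result simply uses ArchParser directly and pays nothing for the hooks beyond two empty virtual calls.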

cheers,
--renato

Note that it is not just option parsing. It is about the bits of
information about a target that we expect to be available even when
that target is not compiled.

It includes target options like -mfpu, but should also include things
like creating the DataLayout string which is currently duplicated in
clang.

A slightly different option would be to just require the llvm targets
to implement a virtual interface, but that would mean that something
as basic as

clang -target armv7-pc-linux -### -c test.c

would fail if the llvm arm backend was not compiled.

> Note that it is not just option parsing. It is about the bits of
> information about a target that we expect to be available even when
> that target is not compiled.

Exactly, this is why the sub-arch information needs to be
tool-specific, but the parsing (even of the DataLayout) doesn't, since
this is identical for all tools.

Either returning agreed enum values, or letting the implementation
override some methods or even using template policies would do the
trick. We just need to find the simplest implementation for this case.
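
For comparison, the template-policy variant might look like the following sketch. It assumes the tool supplies a policy type with a setFPU(FPUKind) member; FPUKind, DriverFlags and the "+neon"/"+vfp3" strings are hypothetical names chosen for the example.

```cpp
#include <string>

enum class FPUKind { Invalid, VFPv3, Neon };

// Shared parser, parameterised on a tool-supplied policy that must
// provide: void setFPU(FPUKind).
template <class FlagPolicy>
class ArchParser : public FlagPolicy {
public:
  bool parseFPU(const std::string &Name) {
    FPUKind Kind = FPUKind::Invalid;
    if (Name == "neon")
      Kind = FPUKind::Neon;
    else if (Name == "vfpv3")
      Kind = FPUKind::VFPv3;
    if (Kind == FPUKind::Invalid)
      return false;
    this->setFPU(Kind); // resolved at compile time, no virtual call
    return true;
  }
};

// Hypothetical driver-side policy holding the tool's own state.
struct DriverFlags {
  std::string Feature;
  void setFPU(FPUKind Kind) {
    Feature = (Kind == FPUKind::Neon) ? "+neon" : "+vfp3";
  }
};
```

Compared with virtual methods, this avoids a vtable but requires the parsing code to live in a header, which may or may not be acceptable for lib/Target.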

> A slightly different option would be to just require the llvm targets
> to implement a virtual interface, but that would mean that something
> as basic as
>
> clang -target armv7-pc-linux -### -c test.c
>
> would fail if the llvm arm backend was not compiled.

Oh, so that's why I think the bit setting needs to be tool-specific.
Clang has its own way of setting the arch bits, and this ArchParser
doesn't know anything about it, nor should it.

Let me give it a try, completely untested...

In LLVM: lib/Target/ArchParser.cpp:
  class ArchParser {
    virtual parseFPU();
    virtual parseCPU();
    ...
    virtual setFPUBits();
    virtual setCPUBits();
  }
  class ARMArchParser : public ArchParser {
    parseFPU() override { ... }
    parseCPU() override { ... }
    ...
  }
  class X86ArchParser : ...

In Clang: lib/Driver/ArchParser.cpp:
  template <class BaseArchParser>
  class ClangArchParser : public BaseArchParser {
    setFPUBits() override { ... }
    setCPUBits() override { ... }
    ...
  }
  // Return a pointer: returning ArchParser by value would slice.
  ArchParser *GetArchParser(StringRef TargetName) {
    if (TargetName == "ARM")
      return new ClangArchParser<ARMArchParser>();
    ...
  }

in llc, lli, lld, integrated assemblers, do like Clang.

ClangArchParser's setFPUBits will have nothing from the ARM back-end
in it, because its bits will be clang-specific. I know this still
keeps ARM knowledge in Clang, but it moves it into a specific area
that other parts of Clang can access, and will help us leave the
Clang-specific sub-arch knowledge in Clang, and ARM specific option
parsing in LLVM.
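
For anyone who wants to try the layering, here is a compilable reduction of the sketch above, under several assumptions: the return types, the FPUKind enum, the Feature member and the "+neon"/"+vfp3" strings are all invented for the example, and only the FPU half is shown. The factory hands back an owning pointer, since returning the base class by value would slice off the tool-specific part.

```cpp
#include <memory>
#include <string>

enum class FPUKind { Invalid, VFPv3, Neon };

// Generic interface; would live in LLVM's always-built code.
class ArchParser {
public:
  virtual ~ArchParser() = default;
  virtual FPUKind parseFPU(const std::string &Name) = 0;
  virtual void setFPUBits(FPUKind Kind) = 0;
};

// Target-specific string parsing, still tool-agnostic.
class ARMArchParser : public ArchParser {
public:
  FPUKind parseFPU(const std::string &Name) override {
    if (Name == "neon")
      return FPUKind::Neon;
    if (Name == "vfpv3")
      return FPUKind::VFPv3;
    return FPUKind::Invalid;
  }
};

// Tool-specific layer; would live in Clang and use Clang's structures.
template <class BaseArchParser>
class ClangArchParser : public BaseArchParser {
public:
  std::string Feature; // stand-in for the tool's own flag storage

  void setFPUBits(FPUKind Kind) override {
    if (Kind == FPUKind::Neon)
      Feature = "+neon";
    else if (Kind == FPUKind::VFPv3)
      Feature = "+vfp3";
  }
};

// Factory keyed on the target name; unknown targets yield nullptr.
std::unique_ptr<ArchParser> getArchParser(const std::string &TargetName) {
  if (TargetName == "ARM")
    return std::make_unique<ClangArchParser<ARMArchParser>>();
  return nullptr;
}
```

Nothing in ARMArchParser depends on the ARM back-end being built, which is the property the -### use case needs.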

Currently, the behaviour is to allow for all options to work on -###,
including all back-ends that aren't compiled, and that's how Clang
tests behave. To change that would need a major change in the tests.
If we really want to soft-fail -### and relatives when a back end is
not compiled, we'll have to find a solution for it in addition to
changing all the tests. But that's step 2.

cheers,
--renato