Diff to add ARMv6L to Target parser

Hi all.

I’ve been working with Swift on ARMv6 and v7. While working with ARMv6 on linux, I noticed that my arm architecture canonicalization code didn’t produce the expected result. The code that I had been using (within Swift’s Driver.cpp the following:

static llvm::Triple computeTargetTriple(StringRef DefaultTargetTriple) {
  llvm::Triple triple = llvm::Triple(DefaultTargetTriple);

  // Canonicalization of all armv6 sub architectures to armv6
  if (triple.getArch() == llvm::Triple::ArchType::arm) {
    if (triple.getSubArch() == llvm::Triple::SubArchType::ARMSubArch_v6 ||
        triple.getSubArch() == llvm::Triple::SubArchType::ARMSubArch_v6m ||
        triple.getSubArch() == llvm::Triple::SubArchType::ARMSubArch_v6k ||
        triple.getSubArch() == llvm::Triple::SubArchType::ARMSubArch_v6t2) {
      triple.setArchName("armv6");
    }
  }

  // Canonicalization of all armv7 sub architectures to armv7
  if (triple.getArch() == llvm::Triple::ArchType::arm) {
    if (triple.getSubArch() == llvm::Triple::SubArchType::ARMSubArch_v7 ||
        triple.getSubArch() == llvm::Triple::SubArchType::ARMSubArch_v7em ||
        triple.getSubArch() == llvm::Triple::SubArchType::ARMSubArch_v7m ||
        triple.getSubArch() == llvm::Triple::SubArchType::ARMSubArch_v7s ||
        triple.getSubArch() == llvm::Triple::SubArchType::ARMSubArch_v7k) {
      triple.setArchName("armv7");
    }
  }

  return triple;
}
However, because the DefaultTargetTriple is armv6l-unknown-linux-gnueabihf, and llvm didn’t know about v6l, it would fail to match and canonicalize to armv6. I added the notion of v6l to llvm to address this.

Thoughts/comments?
- Will

llvm-diff.txt (1.11 KB)

Hi Will,

ARMv6l was definitely there once. I'm not sure what happened.

I'm copying the ARM folks that did most of the recent changes in hope
they can shed some light.

cheers,
--renato

However, because the DefaultTargetTriple is armv6l-unknown-linux-gnueabihf,
and llvm didn’t know about v6l, it would fail to match and canonicalize to armv6.
I added the notion of v6l to llvm to address this.

ARMv6l was definitely there once. I'm not sure what happened.

I'm copying the ARM folks that did most of the recent changes in hope they can
shed some light.

Going back through SVN history, I cannot find any evidence that ARMv6L ever existed.

There used to be ARMv6HL, until r252903 changed it from an architecture variant into an alias for ARMv6K; i.e. ARMv6HL can still be used in a target triple.

For a reference, http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Support/TargetParser.cpp?revision=238651 is the last version of TargetParser before any of our changes, i.e. authored purely by Renato.
One can see it has v6HL but not v6L.

Oh, my bad!! I was thinking of ARMv7l... :confused:

New year's first brain fart. Sorry about that.

Nevertheless, I'll leave you guys to review this one, as I lost touch
with the parser a while ago.

cheers,
--renato

Going back through SVN history, I cannot find any evidence that ARMv6L ever existed.

Oh, my bad!! I was thinking of ARMv7l... :confused:

Nevertheless, I'll leave you guys to review this one, as I lost touch with the parser a while ago.

Ah, I see: ARMv7L is now an alias for ARMv7A.

So, if William has to add support for ARMv6L, I'd suggest he adds it as an alias, and not as an architecture variant.

However, because the DefaultTargetTriple is armv6l-unknown-linux-gnueabihf

Where is this "armv6l" coming from? Is it a legacy GNU thing? Or is it a newly-coined name?

Going back through SVN history, I cannot find any evidence that ARMv6L ever existed.

Oh, my bad!! I was thinking of ARMv7l… :confused:

Nevertheless, I’ll leave you guys to review this one, as I lost touch with the parser a while ago.

Ah, I see: ARMv7L is now an alias for ARMv7A.

Do you have a reference to where it’s defined as an alias? I agree that would be the better approach. I grep’d the source tree for ‘armv7l’ (case insensitive) and it doesn’t shed any light on the subject:

wdillon@tegra-ubuntu:~/apple-llvm$ grep -RIi “armv7l” *
include/llvm/Support/ARMTargetParser.def:ARM_ARCH(“armv7l”, AK_ARMV7L, “7-L”, “v7l”, ARMBuildAttrs::CPUArch::v7,
include/llvm/Support/ARMTargetParser.def:ARM_CPU_NAME(“cortex-a8”, AK_ARMV7L, FK_NEON, true, AEK_SEC)
lib/Support/Triple.cpp: case ARM::AK_ARMV7L:
lib/Support/TargetParser.cpp: case ARM::AK_ARMV7L:
lib/Support/TargetParser.cpp: case ARM::AK_ARMV7L:
test/CodeGen/ARM/legalize-unaligned-load.ll:; RUN: llc -O3 -code-model=default -relocation-model=default -mtriple=armv7l-unknown-linux-gnueabihf -mcpu=generic %s -o - | FileCheck %s
test/CodeGen/ARM/struct-byval-frame-index.ll:target triple = “armv7l-unknown-linux-gnueabihf”
unittests/ADT/TripleTest.cpp: llvm::Triple Triple("armv7l-linux-gnueabihf”);

Other than the unit test (should I be thinking about adding a unit test?) and codegen these occurrences are what I modeled the ARMv6 patch to match.

So, if William has to add support for ARMv6L, I’d suggest he adds it as an alias, and not as an architecture variant.

However, because the DefaultTargetTriple is armv6l-unknown-linux-gnueabihf

Where is this “armv6l” coming from? Is it a legacy GNU thing? Or is it a newly-coined name?

the armv6l is coming from linux.

wdillon@raspberrypi:~ $ uname -a
Linux raspberrypi 4.1.13+ #826 PREEMPT Fri Nov 13 20:13:22 GMT 2015 armv6l GNU/Linux

The default target triple comes from the logic in the swift compiler. The use case is to canonicalize the arm architecture part of the triple. Because there is so much logic in llvm to make the correct decisions, it makes sense to me to let llvm do the work of decoding the arm architecture, rather than trying to hack something together within swift.

Thanks for the feedback! :slight_smile:

  • Will

Ah, I see: ARMv7L is now an alias for ARMv7A.

Do you have a reference to where it’s defined as an alias?

Yes, see getArchSynonym in TargetParser.cpp

Other than the unit test (should I be thinking about adding a unit test?)

Yes, certainly: that's the only way to ensure that armv6l wouldn't be washed away by future refactorings.

Where is this "armv6l" coming from? Is it a legacy GNU thing? Or is it a newly-coined name?

the armv6l is coming from linux.

wdillon@raspberrypi:~ $ uname -a
Linux raspberrypi 4.1.13+ #826 PREEMPT Fri Nov 13 20:13:22 GMT 2015 armv6l GNU/Linux

OK, fair enough.

The "armv7l" comes directly from the Linux kernel I believe: "armv7" +
"little-endian" (arch/arm/kernel/setup.c:638 ish). I'm not entirely
convinced the "l" belongs in any triple seen by LLVM, but if we do
support it we should probably handle it analogously to the "el"/"eb"
suffixes.

Cheers.

Tim.

Hi,

IMO we should support this, even though if given the option I’d have asked the linux guys not to invent a new triple. It’s in linux now, and uname -a is a very standard way of obtaining the host’s triple.

James

We do support v7l and we should support v6l. I though we did that
already, since RaspberyPi has been supported for years.

Anyhow, Artyom's proposal is the best, IMO, to treat it like an alias
and handle like v7A internally. If we end up needing specific
decisions in the driver, it should stay in the driver.

cheers,
--renato

That's rather a hack, given that the 'l' actually has semantic
meaning, but I suppose I could live with it.

Tim.

Not really.

Today, all platform decisions are in the Triple class, which is known
to be insufficient.

Until the Tuple class replaces the Triple for such uses, we'll have to
continue on the same implementation, which is to set up a triple and a
lot of other independent flags which convey the original idea.

In the Arch side, ARMv7l is *really* ARMv7A, so it should not have any
further knowledge about the rest of the environment.

Ideally, the driver should set up a Tuple object with all the
decisions, using the TargetParser *only* to map from text to enum, and
using those values to get a complete picture, also based on the
environment, etc.

Once Tuple holds all the info, we can have other enum values to mean
all other environmental options, but none of them should be in the
Arch portion.

cheers,
--renato

That's rather a hack, given that the 'l' actually has semantic
meaning, but I suppose I could live with it.

Not really.

I disagree. "armv7l" is created specifically by Linux appending a
little-endian 'l' to an "armv7" base. It's much less common (because
no-one cares about big-endian), but "armv7b" also exists. As do
"armv6l", "armv6b" and probably "armv8" equivalents too if you ran a
32-bit kernel there. Adding random aliasing on an ad-hoc basis doesn't
scale or represent what's really going on.

In the Arch side, ARMv7l is *really* ARMv7A, so it should not have any
further knowledge about the rest of the environment.

What does RTLinux on an R-class CPU present itself as? My guess would
be "armv7l" too. I'm not too concerned about deciding that we only
care about A-class Linux, but I am concerned about taking an
unprincipled approach to the l/b suffix.

Not enough that I can't live with it though. As you say this whole
area's enough of a mess already, one more bit isn't going to break the
world.

Cheers.

Tim.

I disagree. "armv7l" is created specifically by Linux appending a
little-endian 'l' to an "armv7" base. It's much less common (because
no-one cares about big-endian), but "armv7b" also exists.

You assume triples make sense. That's the first mistake everyone does
when thinking about triples. :slight_smile:

Triples are only random strings that mean whatever the original author
wants it to mean. "ARMv7L", so far, has been used to generically
represent "ARMv7A little endian".

AFAIK, "ARMv7B" is only used by HighBank, which is no more. But that,
too, was "ARMv7A big endian".

As do
"armv6l", "armv6b" and probably "armv8" equivalents too if you ran a
32-bit kernel there. Adding random aliasing on an ad-hoc basis doesn't
scale or represent what's really going on.

That is not random, and that's what you're missing. Those are
"Architecture Names", not triples, not sub-architecture, not
environment markers, not vendors choices.

The ARM documents define a list of official architecture names, and
those are the ones supported by the TargetParser. What the
Triple/Tuple supports is completely orthogonal.

So, in the TargetParser, there can be no "ARMv7L" nor "ARMv7B", only
"ARMv7A". The other strings map to ARMv7A because that's what they
generally mean, in the wild, with regards to "architecture names". The
hack is that the Driver/Triple/Tuple are using that to mean everything
else.

The Parser also extracts the sub-architecture string ("v7l" etc), so
the Triple/Tuple classes can reason some more about it on their own,
to change the other sub-architecture and vendor specific options. Some
places do that, others don't.

So, this patch doesn't fix the whole thing, because the fix is *not*
in the TargetParser, but in the Driver, the Triple, the Tuple, etc.
The Parser suggestion by Artyom is correct, the fix for the problem is
not.

What does RTLinux on an R-class CPU present itself as? My guess would
be "armv7l" too.

Nope. GNU triples have a very specific meaning because that's what's
generated from the configure-time options. That means different
choices generate different triples, otherwise, you wouldn't be able to
build packages or sysroots for different environments (multiarch /
multilib). Therefore, different sub-architecture levels end up with
different triple names.

Furthermore, distributions choose different options for the same
triples, and the same options for different triples, and they like it
so much that they're pushing me to add CMake-time default options into
LLVM to simulate that. So, in the end, "ARMv7L" means completely
different things for different Linux distributions. This is *really* a
job for the Tuple, not a simple parser that relies on Table-Gen'd ARCH
names.

cheers,
--renato

You assume triples make sense. That's the first mistake everyone does
when thinking about triples. :slight_smile:

I know they don't make sense in many corner cases, but I think
discarding logic where it *does* exist is a mistake.

AFAIK, "ARMv7B" is only used by HighBank, which is no more. But that,
too, was "ARMv7A big endian".

I believe it's what any Linux kernel will present itself as via "uname
-a" which, as James said, is a vaguely sane way to seed a triple.

The ARM documents define a list of official architecture names, and
those are the ones supported by the TargetParser. What the
Triple/Tuple supports is completely orthogonal.

I do know this, Renato.

So, in the TargetParser, there can be no "ARMv7L" nor "ARMv7B", only
"ARMv7A". The other strings map to ARMv7A because that's what they
generally mean, in the wild, with regards to "architecture names". The
hack is that the Driver/Triple/Tuple are using that to mean everything
else.

Sure. In exactly the same way there's no "armebv7" in TargetParser.
There's principled(ish) parsing of the "eb" when deciding what the
triple actually means though.

What does RTLinux on an R-class CPU present itself as? My guess would
be "armv7l" too.

Nope. GNU triples have a very specific meaning because that's what's
generated from the configure-time options.

I have no idea what you're objecting to here. I wasn't saying anything
about how GNU interprets the armv7l that would be produced, just that
I would expect running "uname" on an R-class Linux to produce armv7l.

Tim.

I believe it's what any Linux kernel will present itself as via "uname
-a" which, as James said, is a vaguely sane way to seed a triple.

Well, not necessarily sane, but almost certainly something people will
do anyway.

Tim.

I know they don't make sense in many corner cases, but I think
discarding logic where it *does* exist is a mistake.

We're not discarding any logic. As I said, the architecture name
cannot be ARMv7L for any reason. It's that simple.

The logic, if it exists, need to be encoded somewhere else.

I do know this, Renato.

I only meant as an opener, not as education. I know you know this...

I have no idea what you're objecting to here. I wasn't saying anything
about how GNU interprets the armv7l that would be produced, just that
I would expect running "uname" on an R-class Linux to produce armv7l.

I cannot separate between A and R if both use the same arch name.
Knowing the sub-arch is as important than knowing its byte-sex, and I
can't trade one for the other.

If the Driver/Triple/Tuple knows which, depending on other
information, then they should be the ones encoding this.

Adding an *Arch Name* ARMv7L will make no progress in understanding
what it is. Encoding the correct fields in the Triple/Tuple will, and
for that, ARMv7A or ARMv7R are the *correct* "arch names".

cheers,
--renato

Hi all,

Thank you so much for this discussion. It has been very helpful and educational. I think that I understand the perspectives of both Tim and Renato. Let me briefly summarize them to ensure that I correctly understand:

Renato’s perspective is that the triple means whatever the author says it does, and that may or not be meaningful. :slight_smile: In the case of armv7l (for example) it has a clear meaning: ARMv7A little endian. Unfortunately, this nomenclature collides with what is used for sub-architectures. ARMv7l, however, is NOT a sub-architecture, and should not be treated as one.

Tim’s perspective is that what Renato says isn’t wrong, but that while the logic currently used to map armv7l into something meaningful is not great, it is certainly not unacceptable. Given that armv7l is already treated in this way, it is not unreasonable to also support armv6l.

How is this for a proposed solution? What if I add logic to llvm::ARM::getCanonicalArchName() that matches the trailing ‘l’ (or ‘b’) and treats it the same way that ‘el’ and ‘eb’ are now? This function would return v7 or v6 in the case of armv7l or armv6l, respectively (or, if it’s preferred, v7a or v6a). It seems like this is closer to the “correct” approach. It does show potentially inappropriate deference to Linux’s choice in the matter, but at least it’s not making sub-architectures that don’t exist.

Thoughts?

  • Will

This is the final "better" solution, but it can't be implemented right
now due to current expectations of what "armv7l" and "armv6l" mean.

If you return "v7" from "armv7l", it'll set the arch to generic ARMv7
which has none of the current ARMv7A attributes that makes code runs
*much* faster. That would be a major regression in performance.

The quick hack is to leave "armv7l" as it is, add "armv6l" to the
expected "armv6?" variant that means RaspberryPi, and add a number of
FIXMEs to the parser:
1. To remove both L/B variants as aliases
2. To implement L/B in getCanonical in the same way as EB/BE
3. But only once Triple/Tuple can replace the "v7l" -> "v7a when
Linux" OR "v7R when RTLinux" logic (I have no idea how this will pan
out).

To know about the progress on Tuple, check with Daniel Sanders and
Eric Christopher (CC'd).

Adding this hack to the Driver now could make the mess even worse at this stage.

cheers,
--renato

Taking the suggestions of the group under consideration, I’ve generated a new diff. The thing to note is that armv6l is now treated identically to armv6hl. I’ve also added a unit test.
This seems to me to be the least invasive method, and holds to existing conventions as closely as possible.

Thoughts?

llvm-armv6l.diff (2.15 KB)