When should ArchSpecs match?

I was puzzled by the behavior of ArchSpec::IsExactMatch() and IsCompatibleMatch() yesterday, so I created a couple of unit tests to document the current behavior. Most of the tests make perfect sense, but a few edge cases really don't behave like I would have expected them to.

  {
    ArchSpec A("arm64-*-*");
    ArchSpec B("arm64-apple-ios");
    ASSERT_FALSE(A.IsExactMatch(B));
    // FIXME: This looks unintuitive and we should investigate whether
    // this is the desired behavior.
    ASSERT_FALSE(A.IsCompatibleMatch(B));
  }
  {
    ArchSpec A("x86_64-*-*");
    ArchSpec B("x86_64-apple-ios-simulator");
    ASSERT_FALSE(A.IsExactMatch(B));
    // FIXME: See above, though the extra environment complicates things.
    ASSERT_FALSE(A.IsCompatibleMatch(B));
  }
  {
    ArchSpec A("x86_64");
    ArchSpec B("x86_64-apple-macosx10.14");
    // FIXME: The exact match also looks unintuitive.
    ASSERT_TRUE(A.IsExactMatch(B));
    ASSERT_TRUE(A.IsCompatibleMatch(B));
  }

Particularly, I believe that:
- ArchSpec("x86_64-*-*") and ArchSpec("x86_64") should behave the same.
- ArchSpec("x86_64").IsExactMatch("x86_64-apple-macosx10.14") should be false.
- ArchSpec("x86_64-*-*").IsCompatibleMatch("x86_64-apple-macosx") should be true.

Does anyone disagree with any of these statements?

I fully understand that changing any of these behaviors will undoubtedly break one or the other edge case, but I think it would be important to build on a foundation that actually makes sense if we want to be able to reason about the architecture matching logic at all.
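To make the three statements above concrete, here is a minimal sketch of the semantics I have in mind. ToyTriple, Parse, and the two free functions are hypothetical stand-ins written for this mail, not lldb's ArchSpec or llvm::Triple, and the parser deliberately ignores versions and environments (everything after the second dash is treated as the OS).

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed triple; NOT lldb's ArchSpec.
// An empty string models a field that was omitted or written as "*".
struct ToyTriple {
  std::string arch;
  std::string vendor;
  std::string os;
};

// Toy parser: splits on the first two dashes and maps "*" to "unspecified".
ToyTriple Parse(const std::string &s) {
  std::vector<std::string> parts;
  std::string::size_type start = 0;
  while (parts.size() < 2) {
    std::string::size_type dash = s.find('-', start);
    if (dash == std::string::npos)
      break;
    parts.push_back(s.substr(start, dash - start));
    start = dash + 1;
  }
  parts.push_back(s.substr(start)); // remainder: OS (plus anything after it)
  parts.resize(3);                  // pad missing fields with ""
  for (std::string &p : parts)
    if (p == "*")
      p.clear();
  return {parts[0], parts[1], parts[2]};
}

// Exact match: every field agrees, including whether it was specified.
bool IsExactMatch(const ToyTriple &a, const ToyTriple &b) {
  return a.arch == b.arch && a.vendor == b.vendor && a.os == b.os;
}

// Two fields are compatible if either is unspecified or both agree.
static bool FieldCompatible(const std::string &x, const std::string &y) {
  return x.empty() || y.empty() || x == y;
}

// Compatible match: arch must agree; unspecified fields act as wildcards.
bool IsCompatibleMatch(const ToyTriple &a, const ToyTriple &b) {
  return a.arch == b.arch && FieldCompatible(a.vendor, b.vendor) &&
         FieldCompatible(a.os, b.os);
}
```

Under this sketch, Parse("x86_64") and Parse("x86_64-*-*") produce the same value, an exact match against "x86_64-apple-macosx10.14" fails, and a compatible match against "x86_64-apple-macosx" succeeds.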

Let me know what you think!
-- adrian

> Particularly, I believe that:
> - ArchSpec("x86_64-*-*") and ArchSpec("x86_64") should behave the same.

Yes.

> - ArchSpec("x86_64").IsExactMatch("x86_64-apple-macosx10.14") should be false.

I'm at a loss trying to understand how this could not be false.
It would mean that when we instantiate a Triple and we don't have an
arch we believe it's a Mach-O binary.
This sounds really wrong, and if there's code in lldb living under the
assumption this is true, we might consider getting rid of it (or
fixing it).

> - ArchSpec("x86_64-*-*").IsCompatibleMatch("x86_64-apple-macosx") should be true.

Yes.

I agree with Davide. Particularly if there’s code that relies on “IsExactMatch” not behaving the way its name says it should, we should straighten that out. Otherwise reasoning about this will be too confusing.

Jim

I think the confusing thing is when "unspecified" means "there is no OS" or "there is no vendor" versus "vendor/OS is unspecified".

Imagine debugging a firmware environment where we have a cpu arch, and we may have a vendor, but we specifically do not have an OS. Say armv7-apple-none (I made up "none"; I don't think that's valid). If lldb is looking for a binary and it finds one with armv7-apple-ios, it should reject that binary, because the two are incompatible.

As opposed to a triple of "armv7-*-*" saying "I know this is an armv7 system target, but I don't know anything about the vendor or the OS" in which case an armv7-apple-ios binary is compatible.

My naive reading of "arm64-*-*" is that vendor & OS are unspecified and it should match anything.

My naive reading of "arm64" is that it is the same as "arm64-*-*".

I don't know what a triple string looks like where we specify "none" for a field. Is it armv7-apple-- ? I know Triple has Unknown enums, but "Unknown" is ambiguous between "I don't know it yet" and "It is not any Vendor/OS".

Some of the confusion is the textual representation of the triples, some of it is the llvm Triple class not having a way to express (afaik) "do not match this field against anything" aka "none".

Is there some reason we can’t define vendors, environments, arches, and oses for all supported use cases? That way “there is no os” would not ever be a thing.

There is genuinely no OS in some cases, like people who debug the software that runs in a keyboard or a mouse. The same goes for higher-level coprocessors in modern phones; the SOCs on all these devices have a cluster of processors, and only some of them are running an identifiable operating system, like iOS or Android.

I'll be honest, it's not often that we'll be debugging an arm64-apple-none target and have to decide whether an arm64-apple-ios binary should be loaded or not. But we need some way to express this kind of environment.

That’s what I mean though, perhaps we could add a value to the OSType enumeration like BareMetal or None to explicitly represent this. The SubArchType enum has NoSubArch, so it’s not without precedent. As long as you can express it in the triple format, the problem goes away.

Oh sorry I missed that. Yes, I think a value added to the OSType for NoOS or something would work. We need to standardize on a textual representation for this in a triple string as well, like 'none'. Then with arm64-- and arm64-*-* as UnknownVendor + UnknownOS we can have these marked as "compatible" with any other value in the case Adrian is looking at.
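A rough sketch of how that three-way distinction could work for a single field. The Field type and its helper functions are hypothetical and written only for this mail; llvm::Triple does not have them, and "none" as a spelling is still just the proposal above.

```cpp
#include <string>

// Hypothetical three-state triple field (vendor or OS).
struct Field {
  enum class Kind { Unspecified, None, Named };
  Kind kind;
  std::string name; // only meaningful when kind == Kind::Named
};

// "*" or an omitted field: the value is not known (yet).
inline Field Unspecified() { return {Field::Kind::Unspecified, ""}; }
// Proposed explicit "none": we know there is no vendor/OS.
inline Field None() { return {Field::Kind::None, ""}; }
// A concrete value such as "apple" or "ios".
inline Field Named(const std::string &n) { return {Field::Kind::Named, n}; }

// Unspecified is compatible with anything; None only with None;
// Named only with the same name.
bool FieldsCompatible(const Field &a, const Field &b) {
  if (a.kind == Field::Kind::Unspecified ||
      b.kind == Field::Kind::Unspecified)
    return true;
  if (a.kind != b.kind)
    return false;
  return a.kind == Field::Kind::None || a.name == b.name;
}
```

With this, the OS field of arm64-*-* (Unspecified) is compatible with "ios", while the OS field of an arm64-apple-none target (None) rejects an arm64-apple-ios binary.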

Sounds good to me.

As another data point, it is usually impossible to tell from looking at an ELF file which OS it is intended to run on. You can tell the architecture because it's right in the ELF header, but that's about it. Some OSs get around this by adding a special section like .this.is.an.android.binary, but not all of them. So in general, we need to be able to say "I have no idea which OS this binary is intended for".

pl

We can already say that with OSType::Unknown. That’s different from “I know that no OS exists”.

We use 2 triples for Hexagon:

hexagon-unknown-elf (which becomes hexagon-unknown-unknown-elf internally), and hexagon-unknown-linux.

We follow the Linux standard and add in magic to the ELF to identify it as a Linux binary. But in the hexagon-unknown-elf case we have no way to distinguish between standalone (no OS, running on our simulator) or QuRT (proprietary OS, could be running on hardware or simulator). In fact, the same shared library that has no OS calls (just standard library calls that go into the appropriate .so) could run under either one.

I think requiring a value for every OS would be a non-starter for us.

“Unknown” is a perfectly fine value for the os though, and I’m not suggesting to change that.

My point is simply that Jason’s situation (baremetal) is one that is not even expressible by the Triple syntax. As long as there’s some enum value that describes the situation (of which unknown is a valid choice), the problem goes away.

We currently use a “specified unknown” (where enum and string are unknown) to mean “none”, which is what we use to specify bare metal (no OS). I am happy to change that though. If we change this, then a few people’s workflows might have to change where they used to say “armv7-apple-unknown” to “armv7-apple-none”. Not a big deal since not many people are using LLDB for bare board debugging right now, but something we will need to document.

Greg

Hexagon uses “hexagon-unknown-elf” as its triple when running standalone (no OS) or with QuRT (our embedded OS), which expands to “hexagon-unknown-unknown-elf” sometimes, or “hexagon-unknown--elf” other times. For Linux we use “hexagon-unknown-linux”.

One issue I’ve seen is the Linux platform will match against “hexagon-unknown--elf”, so I need to make sure the Hexagon platform is in the plugin list before the Linux platform.
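To illustrate why the ordering matters, here is a toy sketch of first-match selection over a plugin list. ToyPlatform, SelectPlatform, and both predicates are made up for this mail and are not lldb's plugin API; in particular, the Linux predicate is deliberately over-permissive (it accepts any triple with an empty OS field) just so that both entries claim the triple. It does not describe why the real Linux platform matches.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical plugin entry: a name plus a predicate over the triple string.
struct ToyPlatform {
  std::string name;
  std::function<bool(const std::string &)> accepts;
};

// Returns the name of the first platform that accepts the triple, or "".
std::string SelectPlatform(const std::vector<ToyPlatform> &plugins,
                           const std::string &triple) {
  for (const ToyPlatform &p : plugins)
    if (p.accepts(triple))
      return p.name;
  return "";
}

static bool Contains(const std::string &s, const std::string &needle) {
  return s.find(needle) != std::string::npos;
}

// Toy Hexagon matcher: hexagon triples that do not name Linux as the OS.
ToyPlatform MakeHexagon() {
  return {"hexagon", [](const std::string &t) {
            return t.compare(0, 8, "hexagon-") == 0 && !Contains(t, "-linux");
          }};
}

// Toy Linux matcher (assumed behavior): accepts Linux triples and, too
// generously, any triple whose OS field is empty ("--").
ToyPlatform MakeLinux() {
  return {"linux", [](const std::string &t) {
            return Contains(t, "-linux") || Contains(t, "--");
          }};
}
```

With Linux listed first, SelectPlatform picks "linux" for “hexagon-unknown--elf”; with Hexagon listed first it picks "hexagon", and “hexagon-unknown-linux” still goes to "linux" in either order.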

Ted