Are UUIDs of all zeros valid UUIDs anywhere?

lldb has two parallel sets of routines for creating UUID objects: setData vs. setOptionalData, and setFromStringRef vs. setFromOptionalStringRef. The difference between the two is that the Optional variant treats a UUID of all zeros as invalid - it actually clears the bytes so this is exactly the same as no UUID. Darwin treats a UUID of all zeros as invalid - it’s the linker’s way of providing a UUID for consistency’s sake but telling the tools not to treat it as a valid UUID.
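To make the difference between the two variants concrete, here is a minimal sketch. SimpleUUID, fromData and fromOptionalData are hypothetical stand-ins for illustration only, not lldb's actual UUID class or its method signatures.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for lldb's UUID class (not the real API).
class SimpleUUID {
public:
  // Non-optional variant: keeps whatever bytes it is given, zeros included.
  static SimpleUUID fromData(const std::vector<uint8_t> &bytes) {
    return SimpleUUID(bytes);
  }

  // Optional variant: an all-zero input is cleared, so the result is
  // indistinguishable from "no UUID at all".
  static SimpleUUID fromOptionalData(const std::vector<uint8_t> &bytes) {
    bool all_zero = std::all_of(bytes.begin(), bytes.end(),
                                [](uint8_t b) { return b == 0; });
    if (all_zero)
      return SimpleUUID(std::vector<uint8_t>{});
    return SimpleUUID(bytes);
  }

  // In this sketch a UUID is valid if it holds any bytes at all.
  bool isValid() const { return !m_bytes.empty(); }

private:
  explicit SimpleUUID(std::vector<uint8_t> bytes) : m_bytes(std::move(bytes)) {}
  std::vector<uint8_t> m_bytes;
};
```

With 16 zero bytes as input, fromData yields a valid object and fromOptionalData yields an invalid one, which is exactly the inconsistency in question when callers mix the two.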

The code in the MachO object file reader and the various Darwin DYLD plugins used these two inconsistently, so I’m fixing that up. The PDB version (handled by fromCvRecord) uses the Optional variant consistently (after doing some byte swapping.)

However, the ELF Object file reader consistently uses the non-optional version.

This matters because there are also a few places (in the cached module deserializer and in the ScriptedProcess among others) where we make UUID’s in generic code. In those cases, we don’t know a priori what to do.

If ELF is treating all 0 UUID’s as valid on purpose, then we’ll have to introduce a Platform/ObjectFile API like AreZeroUUIDsValid for this, so we can do it right in generic code. OTOH if this is just an oversight and actually all 0 UUID’s are invalid everywhere, we can remove the non-optional version (except where we need it in UUID) and make the Optional version the public interface, which would be a lot cleaner.

I left a similar comment in a review somewhere, but according to the RFC the UUID with all bits set to zero is “special” and called the “Nil UUID”.

4.1.7. Nil UUID
The nil UUID is special form of UUID that is specified to have all 128 bits set to zero.

Apart from that, the RFC says that there’s no way to determine whether a UUID is valid:

Apart from determining whether the timestamp portion of the UUID is in the future and therefore not yet assignable, there is no mechanism for determining whether a UUID is ‘valid’.

I would argue that our meaning of a valid UUID doesn’t need to match that of the RFC and that it’s reasonable to treat the nil UUID as special and consider it “invalid” for the purposes of the debugger. At least empirically, there are many places where the nil UUID is used in a similar way.

I agree with Jonas, it does seem useful to have a “I had to put something here but don’t take it seriously” marker, and all zeros is the most reasonable value for that. Plus it would simplify the UUID implementation.

But it looked like the ELF code was using the non-zero-checking variant on purpose, so it would be nice to hear whether that was in fact necessary for some reason.

Note in the SBModuleSpec::SetUUIDBytes we were calling setOptionalData, so you can’t use that API to search for a Module that had a UUID of all 0’s. I don’t think the current treatment is particularly coherent.

I put up a patch that treats UUID’s of all zeros as globally invalid:

The problem with that is that elf “UUIDs” are nothing like the UUIDs from that RFC. In ELF world, there are actually two different mechanisms for representing identities of a module. Gdb has a very good description of how they work, and I’d recommend reading that if you want to know more details, but I’ll give a short summary here.

The first (newer) mechanism is called a “build ID”. This is really just a checksum of the binary taken at a particular point in time (before stripping). There is no predefined meaning given to any of the bits in the identifier (even the size of the ID depends on the choice of hashing function), and modules without a build ID simply do not have the relevant section. A zero build id is perfectly valid, albeit very unlikely. Apart from the size and treatment of zeroes we can use this pretty much like a mach-o UUID.

The second mechanism is called a “debug link”. The “uuid” component of that is a 32-bit CRC checksum (of the file containing the debug info). That might seem weak, but that’s because this checksum wasn’t really meant to be used for positively identifying that file. The debug link also contains a path where the debug info file is supposed to be found, and the CRC is there just to confirm the match. This searching mechanism is supposed to work differently than a regular UUID-based one (where we consider an identical UUID to be sufficient proof of a match), but I believe we are still putting this number into all the UUID fields in lldb. I don’t think that doing that is a good idea, but right now I’m not sure what it would take to change that. Due to the small size, it’s possible that someone somewhere might encounter a zero CRC32.
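For illustration, here is a rough sketch of pulling the two pieces out of a debug link section, following the layout described in the gdb documentation: a NUL-terminated file name, zero padding up to a 4-byte boundary, then the 4-byte CRC. DebugLink and parseDebugLink are made-up names, and the sketch assumes a little-endian file; it is not how lldb's ELF reader is written.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical holder for the two pieces of a debug link.
struct DebugLink {
  std::string file_name; // where the debug info file is expected to be found
  uint32_t crc = 0;      // CRC32 of that file, used only to confirm the match
};

// Parses raw debug link section contents. Returns false on malformed input.
// Assumption: the CRC is stored little-endian.
bool parseDebugLink(const std::vector<uint8_t> &section, DebugLink &out) {
  // Find the NUL that terminates the file name.
  size_t nul = 0;
  while (nul < section.size() && section[nul] != 0)
    ++nul;
  if (nul == section.size())
    return false;

  // The CRC starts at the next 4-byte boundary after the terminator.
  size_t crc_offset = (nul + 1 + 3) & ~static_cast<size_t>(3);
  if (crc_offset + 4 > section.size())
    return false;

  uint32_t crc = 0;
  for (int i = 3; i >= 0; --i)
    crc = (crc << 8) | section[crc_offset + i];

  out.file_name.assign(section.begin(), section.begin() + nul);
  out.crc = crc;
  return true;
}
```

Note that a CRC of zero parses just like any other value here; nothing in the format marks it as "missing", which is why treating it as invalid is a policy choice rather than something the data tells us.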

Windows (COFF), on the other hand, uses a UUID (it calls it a GUID) like the one from the RFC, but attaches an extra “age” field to it – the idea being that the different builds of the same project will have the same GUID, but different ages. Currently, we’re just treating the GUID+age combination as one large UUID, and I think that should be sufficient for our current use cases, but one can imagine advanced uses where the knowledge of this structure might be useful. I haven’t checked, but I suspect both of the ELF formats can also appear in COFF files, when building with the GNU toolchain (but I don’t think LLDB supports that now).

Now if that wasn’t complicated enough, there are also os-independent formats (which LLDB supports) like minidump. This means they need to interoperate with all of the formats (and their UUIDs) above, but (like us), they don’t always get these details right. So you can, for instance, run into old minidump files, which contain an elf build-id in their UUID field, but the build id is truncated to 16 bytes, because the rest wouldn’t fit (just like lldb used to support only 16-byte UUIDs). Or you can have a minidump referencing a Mach-O file with an all-zero (invalid) UUID, even though that same UUID could theoretically be valid if it was referring to an ELF file.
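As a sketch of what the truncation case implies for matching, a comparison along these lines would accept a 16-byte ID that is a prefix of a longer build ID. matchesPossiblyTruncated is a hypothetical helper written for this post, and the 16-byte rule is simply the truncation length mentioned above; I'm not claiming this is what the minidump plugin does.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: does an ID read from a minidump match a module's
// full build ID, allowing for old writers that truncated IDs to 16 bytes?
bool matchesPossiblyTruncated(const std::vector<uint8_t> &minidump_id,
                              const std::vector<uint8_t> &full_build_id) {
  const size_t kTruncatedSize = 16; // assumption: the only truncation length

  if (minidump_id.empty() || minidump_id.size() > full_build_id.size())
    return false;

  // A shorter ID is only accepted if it has exactly the truncated size.
  if (minidump_id.size() != full_build_id.size() &&
      minidump_id.size() != kTruncatedSize)
    return false;

  // Compare the bytes the minidump actually has against the prefix.
  return std::equal(minidump_id.begin(), minidump_id.end(),
                    full_build_id.begin());
}
```

This also shows why the minidump case is awkward: the minidump side cannot tell on its own whether an all-zero 16-byte ID is a Mach-O "no UUID" marker or the (very unlikely) prefix of a real ELF build ID.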

So that’s the theory. Our handling of UUIDs is definitely not consistent, and it’s possible it may not even be done consistently. The fact that we’re using the term UUID to refer to all of these different identifiers definitely does not help.

I was the one who introduced this optional/non-optional duality, but I have come to doubt that decision. Treating zero elf build-ids as invalid still doesn’t feel right, but the chance of those occurring in practice is very small (though I wouldn’t want to be the one debugging that if it happens), and it’s very hard to ensure that all code handles this situation correctly (particularly when you take the code not under our control into account). When I was doing that, I did not realize what a mess the minidump uuids would become, and the reason I kept using the optional version in SBModuleSpec::SetUUIDBytes is that I suspected changing it would break a lot of the existing use cases (but I didn’t want to add a new API just for that).

Overall, I guess I wouldn’t be opposed to going back to treating all zero “UUIDs” as invalid.

From what Pavel said it looks like the “How to treat all zero UUID’s” decision can be made if you know the ObjectFile format that produced the UUID.

Does the Minidump format - though OS Independent - record the OS that produced it? That would be handy generally so I hope it does. The cached Modules are the other place we have to read in “OS Independent” formats, but we definitely know what the ObjectFile format is in that case.

So we should be able to make this “What does all zero’s mean” decision entirely correctly by first figuring out the ObjectFile format being described by the data we’re being handed, then asking that ObjectFile subclass what it means by a UUID of all zeros. To that end, we’d probably want to have the UUID constructor take the object file specifier as well as the data so it can ask how zeros are meant to be handled.

But this seems like quite a bit of work for a pretty small payback.

If we do go that route, and I really hope we can avoid that, I’d rather pass in the policy (e.g. an enum value saying whether the nil UUID is valid) instead of the object file. It seems reasonable to require the caller of the constructor to figure that out beforehand.
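Roughly, the policy-passing shape could look like the sketch below. Everything in it (NilUUIDPolicy, ObjectFormat, policyFor, makeUUIDBytes) is a hypothetical name invented for this post; none of it exists in lldb, and the ELF/Mach-O mapping just restates what was said earlier in the thread.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical policy: is the nil (all-zero) UUID a real identifier?
enum class NilUUIDPolicy { Invalid, Valid };

// Hypothetical tag standing in for "which ObjectFile produced these bytes".
enum class ObjectFormat { MachO, ELF };

// The caller works out the policy beforehand, e.g. from the file format.
inline NilUUIDPolicy policyFor(ObjectFormat format) {
  return format == ObjectFormat::ELF ? NilUUIDPolicy::Valid
                                     : NilUUIDPolicy::Invalid;
}

// Returns the bytes to store in the UUID; an empty vector means "no UUID".
inline std::vector<uint8_t> makeUUIDBytes(const std::vector<uint8_t> &bytes,
                                          NilUUIDPolicy policy) {
  bool all_zero = std::all_of(bytes.begin(), bytes.end(),
                              [](uint8_t b) { return b == 0; });
  if (all_zero && policy == NilUUIDPolicy::Invalid)
    return {};
  return bytes;
}
```

The construction code then only sees the enum and never needs to know about object files, but every generic call site still has to come up with a policy, which is the work I'd rather avoid.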

Greg was okay with treating all zero’s as invalid (he accepted the review I mentioned). I’ll leave this over the weekend in case somebody thinks of some new objection, then if nobody has a strong objection I’ll apply that patch.