The problem with that is that ELF “UUIDs” are nothing like the UUIDs from that RFC. In the ELF world, there are actually two different mechanisms for representing the identity of a module. GDB has a very good description of how they work, and I’d recommend reading that if you want to know more details, but I’ll give a short summary here.
The first (newer) mechanism is called a “build ID”. This is really just a checksum of the binary taken at a particular point in time (before stripping). There is no predefined meaning given to any of the bits in the identifier (even the size of the ID depends on the choice of hashing function), and modules without a build ID simply do not have the relevant section. An all-zero build ID is perfectly valid, albeit very unlikely. Apart from the size and the treatment of zeroes, we can use this pretty much like a Mach-O UUID.
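To make the “missing section vs. zero value” distinction concrete, here is a minimal sketch of pulling a build ID out of raw note-section bytes. It assumes the usual ELF note layout (three 32-bit words for name size, descriptor size and type, followed by the 4-byte-aligned name and descriptor) and the GNU note type 3; the function name `find_build_id` is made up for this illustration and is not an LLDB API.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for build IDs by GNU toolchains


def _align4(n):
    """Round n up to the next multiple of 4 (note fields are 4-byte aligned)."""
    return (n + 3) & ~3


def find_build_id(note_data, little_endian=True):
    """Return the build ID bytes from raw note-section contents, or None.

    None means "no build ID note present", which is distinct from a
    build ID whose bytes all happen to be zero.
    """
    fmt = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(note_data):
        namesz, descsz, ntype = struct.unpack_from(fmt, note_data, offset)
        offset += 12
        name = note_data[offset:offset + namesz]
        offset += _align4(namesz)
        desc = note_data[offset:offset + descsz]
        offset += _align4(descsz)
        if len(desc) < descsz:
            break  # truncated note, give up
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc
    return None
```

Note that the length of the returned value is whatever the note says it is, so a caller cannot assume 16 bytes.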
The second mechanism is called a “debug link”. The “UUID” component of that is a 32-bit CRC checksum (of the file containing the debug info). That might seem weak, but that’s because this checksum wasn’t really meant to be used for positively identifying that file. The debug link also contains a file name saying where the debug info file is supposed to be found, and the CRC is there just to confirm the match. This searching mechanism is supposed to work differently than a regular UUID-based one (where we consider an identical UUID to be sufficient proof of a match), but I believe we are still putting this number into all the UUID fields in LLDB. I don’t think that doing that is a good idea, but right now I’m not sure what it would take to change that. Due to the small size, it’s possible that someone somewhere might encounter a zero CRC32.
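A rough sketch of the “find by name, then confirm by CRC” flow, as opposed to a UUID lookup. It assumes the debug link section holds a NUL-terminated file name, padding up to a 4-byte boundary, and then the 32-bit CRC, and that the checksum is the standard CRC-32 that `zlib.crc32` computes. Both function names are hypothetical, and the byte order is passed in by the caller.

```python
import struct
import zlib


def parse_debug_link(section, little_endian=True):
    """Split raw debug link section contents into (file name, CRC32)."""
    end = section.index(b"\x00")      # the file name is NUL-terminated
    name = section[:end].decode("utf-8")
    crc_offset = (end + 1 + 3) & ~3   # the CRC follows, 4-byte aligned
    fmt = "<I" if little_endian else ">I"
    (crc,) = struct.unpack_from(fmt, section, crc_offset)
    return name, crc


def candidate_matches(expected_crc, candidate_bytes):
    """Confirm that a file located via the name is the right one."""
    return (zlib.crc32(candidate_bytes) & 0xFFFFFFFF) == expected_crc
```

The CRC is only ever checked against a file that was already located by name, which is why 32 bits are enough for this purpose and why it makes a poor standalone identifier.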
Windows (COFF), on the other hand, uses a UUID like the one from the RFC (it calls it a GUID), but attaches an extra “age” field to it, the idea being that different builds of the same project will have the same GUID but different ages. Currently, we’re just treating the GUID+age combination as one large UUID, and I think that should be sufficient for our current use cases, but one can imagine advanced uses where the knowledge of this structure might be useful. I haven’t checked, but I suspect both of the ELF formats can also appear in COFF files when building with the GNU toolchain (but I don’t think LLDB supports that now).
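As an illustration of “one large UUID” versus structure-aware use, here is a small sketch. It assumes a 16-byte GUID and a 32-bit age; the byte order used for the age and both helper names are arbitrary choices for this example, not what LLDB does.

```python
import struct
import uuid


def pdb_identifier(guid, age):
    """Concatenate a 16-byte GUID and a 32-bit age into one 20-byte blob."""
    # Big-endian age is an assumption made for this sketch only.
    return guid.bytes + struct.pack(">I", age)


def same_project(id_a, id_b):
    """Structure-aware comparison: same GUID, possibly different age."""
    return id_a[:16] == id_b[:16]


guid = uuid.UUID("01234567-89ab-cdef-0123-456789abcdef")
first_build = pdb_identifier(guid, 1)
second_build = pdb_identifier(guid, 2)
```

Compared as opaque blobs, `first_build` and `second_build` are simply different; only code that knows about the GUID/age split can tell they belong together.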
Now if that wasn’t complicated enough, there are also OS-independent formats (which LLDB supports) like minidump. This means they need to interoperate with all of the formats (and their UUIDs) above, but (like us) they don’t always get these details right. So you can, for instance, run into old minidump files which contain an ELF build ID in their UUID field, but the build ID is truncated to 16 bytes, because the rest wouldn’t fit (just like LLDB used to support only 16-byte UUIDs). Or you can have a minidump referencing a Mach-O file with an all-zero (invalid) UUID, even though that same UUID could theoretically be valid if it was referring to an ELF file.
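One way a consumer could cope with the truncated case is a prefix comparison. This is only a sketch of the idea with a hypothetical helper name, not a description of what LLDB or any minidump tool actually does.

```python
def ids_may_match(minidump_id, module_id):
    """Compare an ID from a minidump against the ID read from a module.

    An exact match always counts. A 16-byte minidump ID is also accepted
    when it equals the first 16 bytes of a longer module ID, to tolerate
    writers that truncated the build ID.
    """
    if minidump_id == module_id:
        return True
    return (
        len(minidump_id) == 16
        and len(module_id) > 16
        and module_id[:16] == minidump_id
    )
```

The obvious cost is that a prefix match is weaker evidence than a full match, so it trades some confidence for compatibility with old files.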
So that’s the theory. Our handling of UUIDs is definitely not consistent, and it may not even be possible to make it consistent. The fact that we’re using the term UUID to refer to all of these different identifiers definitely does not help.
I was the one who introduced this optional/non-optional duality, but I have come to doubt that decision. Treating zero ELF build IDs as invalid still doesn’t feel right, but the chance of those occurring in practice is very small (though I wouldn’t want to be the one debugging that if it happens), and it’s very hard to ensure that all code handles this situation correctly (particularly when you take the code not under our control into account). When I was doing that, I did not realize what a mess the minidump UUIDs would become, and the reason I kept using the optional version in SBModuleSpec::SetUUIDBytes is that I suspected doing otherwise would break a lot of the existing use cases (but I didn’t want to add a new API just for that).
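For anyone who hasn’t looked at the code, the duality boils down to something like the following sketch. The two function names are invented for this illustration, and `None` stands in for an invalid/empty UUID.

```python
def uuid_from_data(data):
    """Non-optional flavour: any non-empty byte string is a valid ID."""
    return bytes(data) if len(data) > 0 else None


def uuid_from_optional_data(data):
    """Optional flavour: an all-zero value means 'no UUID available'."""
    if not any(data):  # true for empty input and for all-zero input
        return None
    return bytes(data)
```

The two only disagree on all-zero input: the first keeps a zero ELF build ID valid, while the second matches the Mach-O convention where all zeroes means the UUID is missing.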
Overall, I guess I wouldn’t be opposed to going back to treating all zero “UUIDs” as invalid.