Macros for revision number and commit hash

Hello macro enthusiasts!

By default, clang defines a lot of macros that allow developers to figure out things about the compiler that’s building them. I’m proposing that we add two more:

  1. clang_revision_number : an unsigned integer, representing the current branch’s revision number.
  2. clang_commit_hash : a string containing a hexadecimal hash, representing the current checkout’s hash.

These would be set by cmake when building LLVM, and make the most sense in a monorepo on git. There’s been discussion of clang revision number in the context of git migration, captured on the git migration page.

Here are examples of similar numbers we currently offer:

auto major = clang_major; // 7
auto minor = clang_minor; // 0
auto patch = clang_patchlevel; // 0
auto version = clang_version; // "7.0.0 (tags/RELEASE_700/final 342594)”
auto VERSION = VERSION; // "4.2.1 Compatible Clang 7.0.0 (tags/RELEASE_700/final 342594)”
auto gnu_major = GNUC; // 4
auto gnu_minor = GNUC_MINOR; // 2
auto gnu_patch = GNUC_PATCHLEVEL; // 1
auto gxx_abi = __GXX_ABI_VERSION; // 1002

The two new values will be generally useful to tools that automatically handle revisions. For example, we currently have bots that parse clang -v, and that output is unfortunately aimed at humans (bots aren’t human). That’s broken in the past, and having these two values would help reduce the likelihood of breakage because macros have a pretty stable format and are easy to test.

Let us know what you think!

JF

Could be useful sometimes I guess. More generally useful though would be vendor flavors of the clang macros. Many compilers are based on Clang but report their own version as well as the Clang version on which they are based, e.g. XYZ Clang 10 based on Clang 7. So __vendor_major__, __vendor_minor__, etc. Clang trunk would just default these to the clang values, but downstream compilers would make a small change.

-Troy

Hello macro enthusiasts!

By default, clang defines a lot of macros that allow developers to figure out things about the compiler that’s building them. I’m proposing that we add two more:

  1. clang_revision_number : an unsigned integer, representing the current branch’s revision number.
  2. clang_commit_hash : a string containing a hexadecimal hash, representing the current checkout’s hash.

These would be set by cmake when building LLVM, and make the most sense in a monorepo on git. There’s been discussion of clang revision number in the context of git migration, captured on the git migration page.

I’m generally +1 on exposing such metadata. The only downside I see is that a git pull could cause a relink of the entire world (even if otherwise unnecessary) just to update these fields. I suspect that this is going to be more painful in a monorepo world for people who work downstream of clang (e.g. LLDB developers).

Is there an implementation approach that allows us to get the benefit of this extra metadata without reducing dev productivity?

-Chris

Hello macro enthusiasts!

By default, clang defines a lot of macros that allow developers to figure out things about the compiler that’s building them. I’m proposing that we add two more:

__clang_revision_number__ : an unsigned integer, representing the current branch’s revision number.
__clang_commit_hash__ : a string containing a hexadecimal hash, representing the current checkout’s hash.

These would be set by cmake when building LLVM, and make the most sense in a monorepo on git. There’s been discussion of clang revision number in the context of git migration, captured on the git migration page.

I’m generally +1 on exposing such metadata. The only downside I see is that a git pull could cause a relink of the entire world (even if otherwise unnecessary) just to update these fields. I suspect that this is going to be more painful in a monorepo world for people who work downstream of clang (e.g. LLDB developers).

That will indeed cause relinking at best, or even recompilation at worst.

Is there an implementation approach that allows us to get the benefit of this extra metadata without reducing dev productivity?

The somewhat reasonable solution (apart from not doing this at all)
would be to build that
info into into *always*-shared library, so it would be the only one in
the need of relinking.

Though i do believe from practice (from doing that on another project)
that will still
likely require relinking of every other lib/bin that links against
that version-library.
Not sure if that can be fixed too, would love to know how if it can.

-Chris

Roman.

Hello macro enthusiasts!

By default, clang defines a lot of macros that allow developers to figure out things about the compiler that’s building them. I’m proposing that we add two more:

__clang_revision_number__ : an unsigned integer, representing the current branch’s revision number.
__clang_commit_hash__ : a string containing a hexadecimal hash, representing the current checkout’s hash.

These would be set by cmake when building LLVM, and make the most sense in a monorepo on git. There’s been discussion of clang revision number in the context of git migration, captured on the git migration page.

I’m generally +1 on exposing such metadata. The only downside I see is that a git pull could cause a relink of the entire world (even if otherwise unnecessary) just to update these fields. I suspect that this is going to be more painful in a monorepo world for people who work downstream of clang (e.g. LLDB developers).

That will indeed cause relinking at best, or even recompilation at worst.

Is there an implementation approach that allows us to get the benefit of this extra metadata without reducing dev productivity?

The somewhat reasonable solution (apart from not doing this at all)
would be to build that
info into into *always*-shared library, so it would be the only one in
the need of relinking.

Though i do believe from practice (from doing that on another project)
that will still
likely require relinking of every other lib/bin that links against
that version-library.
Not sure if that can be fixed too, would love to know how if it can.

Same problem __DATE__ has. In terms of “bad things can happen” this is pretty low IMO.

>
>>
>>
>>
>>
>> Hello macro enthusiasts!
>>
>> By default, clang defines a lot of macros that allow developers to figure out things about the compiler that’s building them. I’m proposing that we add two more:
>>
>> __clang_revision_number__ : an unsigned integer, representing the current branch’s revision number.
>> __clang_commit_hash__ : a string containing a hexadecimal hash, representing the current checkout’s hash.
>>
>> These would be set by cmake when building LLVM, and make the most sense in a monorepo on git. There’s been discussion of clang revision number in the context of git migration, captured on the git migration page.
>>
>>
>> I’m generally +1 on exposing such metadata. The only downside I see is that a git pull could cause a relink of the entire world (even if otherwise unnecessary) just to update these fields. I suspect that this is going to be more painful in a monorepo world for people who work downstream of clang (e.g. LLDB developers).
>
> That will indeed cause relinking at best, or even recompilation at worst.
>
>> Is there an implementation approach that allows us to get the benefit of this extra metadata without reducing dev productivity?
>
> The somewhat reasonable solution (apart from not doing this at all)
> would be to build that
> info into into *always*-shared library, so it would be the only one in
> the need of relinking.
>
> Though i do believe from practice (from doing that on another project)
> that will still
> likely require relinking of every other lib/bin that links against
> that version-library.
> Not sure if that can be fixed too, would love to know how if it can.

Same problem __DATE__ has. In terms of “bad things can happen” this is pretty low IMO.

Yes, but i'm not sure this is on the same scale of badness.

That macro should change once a day i think, at most?
One usually commits *much* more frequently,
from 1 commit/hour to 1 commit/5minutes i'd guesstimate.

>> -Chris
> Roman.

Roman.

Hello macro enthusiasts!

By default, clang defines a lot of macros that allow developers to figure out things about the compiler that’s building them. I’m proposing that we add two more:

__clang_revision_number__ : an unsigned integer, representing the current branch’s revision number.
__clang_commit_hash__ : a string containing a hexadecimal hash, representing the current checkout’s hash.

These would be set by cmake when building LLVM, and make the most sense in a monorepo on git. There’s been discussion of clang revision number in the context of git migration, captured on the git migration page.

I’m generally +1 on exposing such metadata. The only downside I see is that a git pull could cause a relink of the entire world (even if otherwise unnecessary) just to update these fields. I suspect that this is going to be more painful in a monorepo world for people who work downstream of clang (e.g. LLDB developers).

That will indeed cause relinking at best, or even recompilation at worst.

Is there an implementation approach that allows us to get the benefit of this extra metadata without reducing dev productivity?

The somewhat reasonable solution (apart from not doing this at all)
would be to build that
info into into *always*-shared library, so it would be the only one in
the need of relinking.

Though i do believe from practice (from doing that on another project)
that will still
likely require relinking of every other lib/bin that links against
that version-library.
Not sure if that can be fixed too, would love to know how if it can.

Same problem __DATE__ has. In terms of “bad things can happen” this is pretty low IMO.

Yes, but i'm not sure this is on the same scale of badness.

That macro should change once a day i think, at most?
One usually commits *much* more frequently,
from 1 commit/hour to 1 commit/5minutes i'd guesstimate.

You rebuild llvm and build your codebase using the new llvm that often?

IMO the main source of badness from __DATE__ is the non- reproducibility (for which we have a warning). Here it’s already non-reproducible because you’ve changed your compiler... I’d certainly expect a full rebuild when these new macros change: you changed the compiler. It’s a bug if it doesn’t cause a rebuild.

Hello macro enthusiasts!

By default, clang defines a lot of macros that allow developers to figure out things about the compiler that’s building them. I’m proposing that we add two more:

__clang_revision_number__ : an unsigned integer, representing the current branch’s revision number.
__clang_commit_hash__ : a string containing a hexadecimal hash, representing the current checkout’s hash.

These would be set by cmake when building LLVM, and make the most sense in a monorepo on git. There’s been discussion of clang revision number in the context of git migration, captured on the git migration page.

I’m generally +1 on exposing such metadata. The only downside I see is that a git pull could cause a relink of the entire world (even if otherwise unnecessary) just to update these fields. I suspect that this is going to be more painful in a monorepo world for people who work downstream of clang (e.g. LLDB developers).

That will indeed cause relinking at best, or even recompilation at worst.

Is there an implementation approach that allows us to get the benefit of this extra metadata without reducing dev productivity?

The somewhat reasonable solution (apart from not doing this at all)
would be to build that
info into into *always*-shared library, so it would be the only one in
the need of relinking.

Though i do believe from practice (from doing that on another project)
that will still
likely require relinking of every other lib/bin that links against
that version-library.
Not sure if that can be fixed too, would love to know how if it can.

Same problem __DATE__ has. In terms of “bad things can happen” this is pretty low IMO.

Yes, but i'm not sure this is on the same scale of badness.

That macro should change once a day i think, at most?
One usually commits *much* more frequently,
from 1 commit/hour to 1 commit/5minutes i'd guesstimate.

You rebuild llvm and build your codebase using the new llvm that often?

IMO the main source of badness from __DATE__ is the non- reproducibility (for which we have a warning). Here it’s already non-reproducible because you’ve changed your compiler... I’d certainly expect a full rebuild when these new macros change: you changed the compiler. It’s a bug if it doesn’t cause a rebuild.

Actually I get the concern now! You mean cmake changes cause all of llvm to rebuild, and that would trigger on a llvm dev’s machine! Not as user code.

Sorry I was slow to grok. I’ll try it out and see. Maybe incremental builds shouldn’t have the new commit? Or we might be able to limit the rebuild to Version.cpp. Or maybe the linker should do this? :man_shrugging:

I am not familiar with the Clang code base, but I faced a similar issue recently. We made sure to only trigger a single rebuild when getting a new hash by generating a small CPP file that contained the hash instead of sticking it in a header that was included everywhere. This would only rebuild the generated CPP and relink. Maybe another thing that can be considered is to not refresh the hash when in debug configuration? This would allow you to build/develop locally without rebuilds when you commit or pull.

Let me know if you want me to dig up my CMake incantation I used for this and I can post it to the list.

Thanks,
Tobias.

Why? Can’t this be stored in a text file that lives next to the clang binary, which is loaded iff the macro is referenced?

-Chris

This would basically undo https://reviews.llvm.org/D37272 which also caused lots of incremental build work after every sync (and technically, after every local commit). So if this got added, it should probably be opt-in and only be set for production builds, not for dev builds.

Since you can’t compare git hashes, when do you imagine would clang_commit_hash be useful?

Doesn’t clang_revision_number solve the same problem that __has_feature() and friends solve, only in a worse way?

In practice, I found all the clang_major / clang_minor etc built-ins to be useless because they’re gratuitously different in Xcode’s clang and there isn’t a vendor id define. I suppose that’s what Troy said above.

So overall maybe this isn’t fully thought through yet.

This would basically undo https://reviews.llvm.org/D37272 which also caused lots of incremental build work after every sync (and technically, after every local commit). So if this got added, it should probably be opt-in and only be set for production builds, not for dev builds.

Since you can't compare git hashes, when do you imagine would __clang_commit_hash__ be useful?

As mentioned upthread, I believe this is for automated equality
comparisons, mostly. You can compare git hashes for equality, so I
guess that's useful.

Doesn't __clang_revision_number__ solve the same problem that __has_feature() and friends solve, only in a worse way?

This one is less clear to me, especially given that we're phasing out
svn. What will this macro represent once we've fully switched away
from svn?

In practice, I found all the __clang_major__ / __clang_minor__ etc built-ins to be useless because they're gratuitously different in Xcode's clang and there isn't a vendor id define. I suppose that's what Troy said above.

+infinity (this causes considerable problems for our tools)

~Aaron

This would basically undo https://urldefense.proofpoint.com/v2/url?u=https-3A__reviews.llvm.org_D37272&d=DwIGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=oOJsFtgkFCCcDlfLWKmfrVDtpXLihXJTBGJfSapXCV0&m=NN7eJZPEoSXTgtLbDft08spwp4-8SuqJAEb82D7VQ_g&s=xAZXMyOnpVwmiNzdVbq3IST7I4DrGtNBpcnyXJWSsAI&e= which also caused lots of incremental build work after every sync (and technically, after every local commit). So if this got added, it should probably be opt-in and only be set for production builds, not for dev builds.

Since you can't compare git hashes, when do you imagine would __clang_commit_hash__ be useful?

As mentioned upthread, I believe this is for automated equality
comparisons, mostly. You can compare git hashes for equality, so I
guess that's useful.

Doesn't __clang_revision_number__ solve the same problem that __has_feature() and friends solve, only in a worse way?

This one is less clear to me, especially given that we're phasing out
svn. What will this macro represent once we've fully switched away
from svn?

In practice, I found all the __clang_major__ / __clang_minor__ etc built-ins to be useless because they're gratuitously different in Xcode's clang and there isn't a vendor id define. I suppose that's what Troy said above.

+infinity (this causes considerable problems for our tools)

Likewise. We have to parse the output of 'clang --version' to determine
if the Clang version we're emulating is Apple Clang (or ARM Clang or
Android Clang or ...). It would be great to have a __clang_vendor__ or
similar macro that we could rely on instead.

Tom.

This would basically undo https://reviews.llvm.org/D37272 which also caused lots of incremental build work after every sync (and technically, after every local commit). So if this got added, it should probably be opt-in and only be set for production builds, not for dev builds.

I don’t think so. What you point out was in a header. I’d only put the two new definitions in Version.cpp.

This would basically undo https://urldefense.proofpoint.com/v2/url?u=https-3A__reviews.llvm.org_D37272&d=DwIGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=oOJsFtgkFCCcDlfLWKmfrVDtpXLihXJTBGJfSapXCV0&m=NN7eJZPEoSXTgtLbDft08spwp4-8SuqJAEb82D7VQ_g&s=xAZXMyOnpVwmiNzdVbq3IST7I4DrGtNBpcnyXJWSsAI&e= which also caused lots of incremental build work after every sync (and technically, after every local commit). So if this got added, it should probably be opt-in and only be set for production builds, not for dev builds.

Since you can’t compare git hashes, when do you imagine would clang_commit_hash be useful?

As mentioned upthread, I believe this is for automated equality
comparisons, mostly. You can compare git hashes for equality, so I
guess that’s useful.

Correct, tools that automatically handle revisions. For example, we currently have bots that parse clang -v, and that output is unfortunately aimed at humans (bots aren’t human). That’s broken in the past.

Doesn’t clang_revision_number solve the same problem that __has_feature() and friends solve, only in a worse way?

This one is less clear to me, especially given that we’re phasing out
svn. What will this macro represent once we’ve fully switched away
from svn?

That’s captured on the git migration page.

It’s orthogonal to __has_feature, and indeed __has_feature solves the problems it solves more competently.

In practice, I found all the clang_major / clang_minor etc built-ins to be useless because they’re gratuitously different in Xcode’s clang and there isn’t a vendor id define. I suppose that’s what Troy said above.

+infinity (this causes considerable problems for our tools)

Likewise. We have to parse the output of ‘clang --version’ to determine
if the Clang version we’re emulating is Apple Clang (or ARM Clang or
Android Clang or …). It would be great to have a clang_vendor or
similar macro that we could rely on instead.

I’d support such a proposal, separate from this one.

Apple Clang does set the apple_build_version macro.

This seems mostly harmless – except that I’d be worried that people will actually use them, when they probably shouldn’t. :slight_smile:

The clang_revision_number value especially, being numeric, seems rather amenable for misuse. Having this as a standalone numeric value, I’d be worried that people would compare against a particular value, without realizing that it’s effectively a meaningless number without knowing more about the branch you’re talking about (which is rather non-trivial). At least the git hash is more-obviously not usable for ordering. But, exposing that also has some downsides, with lack of flexibility. If it’s specified that it must be just a hash, it cannot hold additional info, for example that your build contains uncommitted changes.

So, I guess it’s not entirely clear to me what useful purpose these values can be used for, for which clang_version isn’t already sufficient. (You mention that bots are parsing clang -v – but to what end?)

WRT how we parse clang -v, this is potentially an issue for anyone that uses LNT today. The code used to infer information about the compiler always tries to use clang -v.

You can see this here: https://github.com/llvm-mirror/lnt/blob/master/lnt/testing/util/compilers.py

Since this output is intended for human-readability, and not stable machine-consumption, this can cause some issues for people using LNT. For example, if there’s a bug in the human-readable output, or it changes in a way LNT doesn’t expect, you’ll always get a run order of 0. This can, in the worst case, result in an insignificant amount of data loss for regular testing. So, having at least a stable way to grab version information would be useful for some of the code we offer up in open source today.

  • Jessica

I don’t think the relinking that Chris and Nico brought up is a concern because we already embed the version control info in the binary by default. In other words, we already have this problem, and we already have a variable to disable it for developers: LLVM_APPEND_VC_REV. At least, that’s how things are supposed to work.

Separately, I don’t feel like we should add a pre-defined revision identifier macro to clang just because people find it hard to parse the -v output. Instead, we could resolve to make the formatting more stable, or add some --print-version flag that promises a stable format. Of course, this eventually invites a --print-version2 flag. :slight_smile: As we’re moving to git, the identifier won’t be numeric for very long. Surely that will break user assumptions about what our versions look like again, just as changing -v output would.

We should try to bring order to the XCode clang vs. open source clang version numbering, whatever that means. Those are numbers that users might actually want to compare in the pre-processor, so that they don’t have to go to these lengths to extract the version or revision from the -v output in the first place.

I don’t think the relinking that Chris and Nico brought up is a concern because we already embed the version control info in the binary by default. In other words, we already have this problem, and we already have a variable to disable it for developers: LLVM_APPEND_VC_REV. At least, that’s how things are supposed to work.

Separately, I don’t feel like we should add a pre-defined revision identifier macro to clang just because people find it hard to parse the -v output. Instead, we could resolve to make the formatting more stable, or add some --print-version flag that promises a stable format. Of course, this eventually invites a --print-version2 flag. :slight_smile:

Jessica and I initially discussed having a --version-json or some similar flag, which would output all that information in a structured way. That would also solve the specific bot issue, but I worry it won’t be as well maintained. I can do this instead if people aren’t comfortable with the macro (but do notice: many of the perceived build issues are the same with this approach).

As we’re moving to git, the identifier won’t be numeric for very long. Surely that will break user assumptions about what our versions look like again, just as changing -v output would.

From the link I provided: git rev-list --count

We should try to bring order to the XCode clang vs. open source clang version numbering, whatever that means. Those are numbers that users might actually want to compare in the pre-processor, so that they don’t have to go to these lengths to extract the version or revision from the -v output in the first place.

I’m happy to hear ideas on this, but do keep in mind that LLVM is packaged and patched independently by more than these two. It’s also not a “versus” situation :wink: