Configure script breakage with the new -Werror=implicit-function-declaration

Hello,

We’ve noticed some worrying problems resulting from the recent change in Clang that makes implicit function declarations errors by default. It seems that many configure scripts (e.g. those generated by autoconf) were silently relying on the old behavior, and the change is causing them to misfire.

I fundamentally agree with the reasoning behind the change, and I don’t mind regular code failing to compile because of implicit function declarations. Unfortunately, in the case of configure scripts this often goes unnoticed, because the compiler output is normally not printed. What’s worse, it does not necessarily cause software to fail to compile; it can instead cause it to misdetect system features, resulting in missing features or wrong behavior.
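To make the failure mode concrete, here is a minimal Python sketch of how a configure-style probe reaches its answer. It is not autoconf's implementation: `probe`, `lenient_compiler`, `strict_compiler`, and `some_feature` are hypothetical names, and the two "compilers" are stand-ins that only model the exit status, assuming the probed function really exists on the system.

```python
# Hypothetical probe source: it calls a function that was never declared,
# so it only compiles if implicit function declarations are tolerated.
CONFTEST = """\
int main(void) {
    return some_feature(0);  /* no prototype, no #include */
}
"""

def lenient_compiler(source):
    # Stand-in for a compiler that only warns about the implicit declaration.
    return 0  # exit status: success

def strict_compiler(source):
    # Stand-in for a compiler that treats the implicit declaration as an error.
    return 1  # exit status: failure

def probe(compiler, source):
    """Mimic a configure check: only the exit status decides the answer."""
    status = compiler(source)
    return "yes" if status == 0 else "no"

results = {
    "lenient": probe(lenient_compiler, CONFTEST),
    "strict": probe(strict_compiler, CONFTEST),
}
print(results)  # {'lenient': 'yes', 'strict': 'no'}
```

The same probe flips from yes to no without anything visibly failing, which is why the feature ends up silently disabled rather than the build breaking.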

To list just a few packages that we’ve already found broken in different ways due to this:

I think that’s just the tip of the iceberg, because often the only visible difference will be some configure check yielding no instead of yes. Finding all the cases would require a tremendous effort of building everything with and without this behavior and comparing configure results, and even then it won’t be foolproof.
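The comparison described above could look roughly like the following Python sketch. It assumes the usual `checking ... result` lines that configure prints to the terminal; `parse_checks`, `changed_checks`, and the two sample outputs are made up for illustration and are not taken from any real package.

```python
import re

# Matches lines such as "checking for strlcpy... yes".
CHECK_LINE = re.compile(r"^checking (.+?)\.\.\. (.*)$")

def parse_checks(output):
    """Map each check description to its reported result."""
    results = {}
    for line in output.splitlines():
        match = CHECK_LINE.match(line.strip())
        if match:
            results[match.group(1)] = match.group(2).strip()
    return results

def changed_checks(before, after):
    """Return (check, old result, new result) for checks whose result differs."""
    old, new = parse_checks(before), parse_checks(after)
    return sorted(
        (name, old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    )

# Illustrative outputs only: one run with the diagnostic as a warning,
# one run with it as an error.
as_warning = """checking for gcc... clang
checking for strlcpy... yes
checking for working mmap... yes
"""
as_error = """checking for gcc... clang
checking for strlcpy... no
checking for working mmap... yes
"""
print(changed_checks(as_warning, as_error))  # [('for strlcpy', 'yes', 'no')]
```

A diff like this only catches checks that print a result line, so it is a starting point rather than proof that nothing else changed.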

Please also note that we’re dealing with some software whose old (i.e. unfixed) versions will be relevant for a very long time.

While I wholeheartedly agree with working towards fixing these cases to use proper code, I don’t think making the warning an error by default is the right way to do so. This effectively shifts the burden onto regular users, who are going to hit these surprising problems firsthand and have to patch these packages to have a working system.


This is indeed worrying, what are your thoughts @AaronBallman @jyknight @rjmccall

Yep, with FreeBSD ports we see basically the same issue, as I mentioned in 265425 – [exp-run] Against llvm-15-update branch on GitHub:

I would like to solicit some feedback about a few huge (in my opinion at least) behavior changes in clang 15, as compared to clang 14:

  1. In https://github.com/llvm/llvm-project/commit/7d644e1215b376ec5e915df9ea2eeb56e2d94626 (“[C11/C2x] Change the behavior of the implicit function declaration warning”), the -Wimplicit-function-declaration ‘warning’ became an error for C99 and later.

  2. In https://github.com/llvm/llvm-project/commit/2cb2cd242ca08d0bbd2a51a41f1317442e5414fc (“Change the behavior of implicit int diagnostics”), the -Wimplicit-int ‘warning’ became an error for C99 and later.

Of all the errors in Antoine’s logs, -Wimplicit-function-declaration is more than half, and roughly 16% are -Wimplicit-int errors:

% grep error: *.log > errors.txt
% wc -l errors.txt
3257 errors.txt
% grep Wimplicit-function-declaration errors.txt | wc -l
1848
% grep Wimplicit-int errors.txt | wc -l
525
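For logs that are easier to process in a script than with grep, the same tally can be sketched in Python. This assumes Clang's habit of appending the controlling flag in brackets at the end of a diagnostic; `count_error_flags`, `share`, and the sample lines are hypothetical, with the diagnostic text deliberately elided. The last line recomputes the shares from the counts quoted above.

```python
import re
from collections import Counter

# Clang appends the controlling flag in brackets, e.g. "[-Wimplicit-int]".
FLAG = re.compile(r"\[-W([\w-]+)\]")

def count_error_flags(lines):
    """Count error lines per diagnostic flag; warnings are ignored."""
    counts = Counter()
    for line in lines:
        if "error:" not in line:
            continue
        match = FLAG.search(line)
        counts[match.group(1) if match else "(no flag)"] += 1
    return counts

def share(part, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * part / total, 1)

# Made-up sample lines; real logs carry the full diagnostic text.
sample = [
    "a.c:1:1: error: (text elided) [-Wimplicit-function-declaration]",
    "b.c:2:5: error: (text elided) [-Wimplicit-int]",
    "c.c:3:9: warning: (text elided) [-Wunused-variable]",
    "d.c:4:2: error: (text elided)",
]
counts = count_error_flags(sample)
print(counts["implicit-function-declaration"], counts["implicit-int"])  # 1 1

# Shares for the counts reported in the grep session (1848 and 525 of 3257).
print(share(1848, 3257), share(525, 3257))  # 56.7 16.1
```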

I looked through a few ports, and while some of these can be ‘fixed’ by adding USE_CSTD=gnu89 or USE_CSTD=c89, lots of them either don’t respect the USE_CSTD= setting or fail to pick up CFLAGS.

So I’m wondering whether it is worthwhile to partially revert upstream commits 1) and 2), turning the errors back into warnings again.

This would temporarily save us, until C2x becomes the default language dialect, but at some point this technical debt must be paid: e.g. all ports that use “old-style” C (K&R, C89 without prototypes, implicit int, etc.) should explicitly be marked as such, or they have to be patched to use an older compiler.

What’s the general opinion on this in the ports team?

(Btw, going through all these ports with a fine-toothed comb is probably the most future-proof approach, but I simply do not have the time for it.)

So many of these ports do not build (that is at least a clear indicator of problems!), but there are also quite a lot that will ‘silently’ disable features because of autoconf mis-detection. Some of the programs can be made to work correctly by forcing -std=gnu89, but it’s not always easy to figure out.

I think that working around it via -Wno-error=implicit-function-declaration is safer than forcing -std, but still, I wouldn’t rely on all configure checks reliably respecting CFLAGS.


That’s very unfortunate and I’m sorry for the trouble caused! Thank you for raising the issue.

[quote=“mgorny, post:1, topic:65213”]
While I wholeheartedly agree with working towards fixing these cases to use proper code, I don’t think making the warning an error by default is the right way to do so.[/quote]

Despite the breakage, I’m not certain I agree. At some point, distro maintainers need to allow implementations to recapture design space that has been an error for 20+ years and help to improve the ecosystem’s security posture. I’m very sympathetic to the lack of a deprecation period between C89 and C99 basically requiring implementations to give a lot of grace to older code bases, but at some point we need to bite the bullet and maintain these packages or drop them. 20 years is a LOT of grace period. Defaulting the warning to an error is basically the only way forward IMO because everyone who was going to react to a warning has already done so. (NB: we intend to eventually go to a hard error due to the security concerns which caused this feature to be removed from C99 in the first place, so kicking the can down the road isn’t going to help.)

If it forces people to maintain code that’s important to them, it’s also a win IMO. :stuck_out_tongue: That said, we want to limit the disruption as much as we can.

This was already shipped in Clang 15, so the horses have left the barn (this was very much not a regression and so I don’t think it would be appropriate to change the behavior further in 15.0.x – that just makes the landscape even more confusing and difficult for users). However, if distros want to revert that change locally to build their packages, that’s perfectly reasonable. But it doesn’t solve the long-term problem, either.

I generally agree.

In terms of where we go from here, I see a few paths:

  1. We could attempt to work around the configure script nonsense by recognizing when the compiler is executed by a configure script and run in a special “we’re dealing with an extremely low quality configure script” mode (and/or providing a flag so users can force this mode). This would keep configure scripts working while not punishing the rest of the ecosystem. I have no idea how easy or hard it would be, but I believe GCC is doing something similar and so it seems at least plausible.
  2. We could downgrade back to a warning in Clang 16. I don’t think this is a good option. Not only do we end up with inconsistent behavior from release to release, it’s plain worse security posture to assume that the linker will magically make everything work, especially with today’s significantly more complicated linking landscape compared to what existed in 1989. Since we poked the bear with Clang 15, I think we should stay the course rather than retreat.
  3. Do nothing. This leaves distro maintainers in a tough spot (but perhaps no tougher than the spot we’ve already put them in) and makes it harder for folks to build ancient software packages themselves. This may push some folks away from using Clang. It may also start to cause some folks to maintain the unmaintained software, at least to keep it hobbling around.

Perhaps there are other ways forward as well? Personally, I think #1 would be ideal if someone has the ability to do that work, but that may be a big ask.

At least, for a temporary solution, distro maintainers can either build those old packages with Clang 14 or can back the diagnostic changes out of their local Clang 15 branch, so am I correct in understanding we don’t have to rush for a fix here?


I get your point but I still think this could have gone better. I mean, if I look at the clang 15.0.0 release notes, this is just a tiny point in the middle of “bug fixes”. I’m pretty sure this caught a lot of people by surprise. Agreed, even if I had noticed this change beforehand, I wouldn’t have thought it would break configure scripts. Nevertheless, we know that now, and IMHO this shifts it from a “minor diagnostic change” to a “big breaking change”.

I don’t claim that giving people more time would suddenly change the world. Nevertheless, just the few last days prove that people do care. If you asked Gentoo users to test a change like this, I’m sure a fair number of them would actually do that, watch for issues (and I mean really watch for them, not only because something broke in a visible way) and get a fair number fixed before it went live.

I think there’s still time to revert this, especially with the new, faster release schedule. 15.0.0 was just a few days ago, a quick 15.0.1 with a revert wouldn’t hurt IMHO. Sure, it could cause a little confusion but it would also give an opportunity to emphasize the problem and get people better prepared for when it actually happens (e.g. in Clang 16). Again, we now know that it’s not just a matter of some projects failing to build but of software silently getting miscompiled in unpredictable ways.

That said, I don’t think going from error to warning could actually break anything. I mean, even if it could, people have to account for older Clang versions and other compilers that don’t treat this as an error.

Gentoo still defaults to GCC, and it seems that it’s going to remain like that for a long time. Nevertheless, some of our users choose to switch their compiler to clang, and so far we’ve tried to provide reasonably good support for it. I don’t really want to make major changes in behavior contrary to upstream clang development, so I’m afraid the best I can do is to warn users more and discourage them from using clang until things settle a bit.

This sounds like a very bad idea, and one really prone to misfiring. I would strongly advise against that, especially that it could cause even more confusion, and hard-to-debug issues.

In my opinion, the best solution for the time being is to revert the change in 15.0.1 and do it better in a future release. I think there’s a rush to make that decision, and if this idea were to be rejected, I’d really prefer that it be rejected on a technical basis rather than by stalling until it becomes too late to revert it.

Let me emphasize, this is really a big problem. It’s not just a “horses out of the barn” problem. It’s “invisible cattle-eating monsters out of the barn”, and people are completely unprepared to deal with it right now. I mean, stuff’s being miscompiled as we speak, and we (volunteers who are busy with a lot of other work, I should point out) are suddenly forced to figure out a good way to even detect the problem, not to mention patching hundreds, perhaps thousands of packages. And the worst part is, since it’s all so sudden we don’t even have an opportunity to coordinate this, so a lot of people will end up haphazardly patching the same things independently and carrying mebibytes of patches. And you hold the switch to recall the monsters back to the barn and give us time to prepare.

I agree, it’s not too late to revert for Clang 15 – getting it into 15.0.1 is likely sufficiently fast to ameliorate the problem.

Given widespread reports of breakage, I’m inclined to agree we ought to do so – at least for the 15.x release branch, and we can have further discussion on what to do for 16.x.

Especially if you (or anyone!) is willing to volunteer to do some distro-wide rebuilds to discover breakages from this issue, and send patches to the upstream developers to fix it over the next 6 months, I really see no downside from delaying implementation until the Clang 16 release.

However, I do worry that we might never be able to flip this in the default compilation mode (which is currently -std=gnu17). Since C23 mode does not support this misfeature, that would imply that we’d be unable to switch to -std=gnu23 as the default compilation mode, without reintroducing this same problem. I hope it doesn’t come to that.


I’m actively working (mgorny started this discussion after I raised the alarm) on trying to get things fixed within Gentoo and upstreaming the fixes, but it’d take the stress off a lot if I knew the change was coming in 6 months or w/e but without users being at risk like they are now.

I would appreciate it if the change was strongly emphasised in the release notes and folks were advised to check their own configure scripts (both in the 15.0.1 revert & 16) to help awareness as well.

I still think there’s going to be a blast radius from doing this in LLVM 16, but it’s going to be a lot smaller than in the current situation. For all the reasons stated above, while it might be tempting, I would not want it postponed indefinitely. But right now, I’ve not had a chance to build standard package sets and check them for problems, and nor have other distros (nor are they aware that they should), so this extra time buffer until LLVM 16 would help a lot.

If we are going to revert (and I think we should), I’d appreciate it if LLVM used its contacts with distributions to make them aware of the problem and keep an eye out too.


Yeah, we ran an RFC for the strict prototypes changes, but not for implicit int or implicit function declarations. In hindsight, more visibility for this change would have been a good idea rather than just relying on release notes. So definitely agreed this could have gone better.

FWIW, it was never expected to be a minor diagnostic change. We knew going into it that folks would be caught by this. If we didn’t think anyone would be impacted, we’d have made it a hard error.

There’s only so much we can do in these situations to communicate upcoming changes. I should have done more in this case, like posting an RFC. But even then, “I don’t follow the forums” is just as plausible as “I don’t follow the release notes”, so I expect folks would still be caught out. This change was in tree since April, so there’s been approximately five months for people to have brought this concern up for critical situations. Additionally, there were several months of release candidates with which folks were welcome to test the compiler. We rely on people who have strong opinions about the compiler to either do testing of the tree as it progresses or of one of the release candidates.

Moving forward, I think we need to be proactive from both angles. We should communicate these changes better than just a blurb in release notes, but distro maintainers need to come to us before we release the compiler with concerns, not after. (Well, do come to us after! We still want to know about the issues and try to mitigate them. But once we’ve shipped, it’s a bit late to say “please don’t do this change”.)

We can certainly consider it, but that basically just dismisses my concern rather than addressing it. 15.0 is out the door. We can hope that the recent release hasn’t gone out to many people, but we have no way of knowing who has already grabbed it for use (distros may have packaged it up, but also corporations may have already downloaded the release to distribute internally, etc). So the chances for confusion will continue, and unless we pulled the 15.0 release from downloads, apt, etc, that confusion isn’t going to go away even if we react quickly.

The time for this request was before the compiler was released. However, we can probably still do something here to mitigate the situation.

Or encourage them to start fixing these broken ancient repos so that they’re valid C code.

The technical basis is: this isn’t a regression between Clang 14 and Clang 15; it’s an intentional change we made for two primary reasons:

  1. C2x forced our hand; we cannot support implicit function declarations in C2x and so we needed to make changes to that code.
  2. This functionality has not been valid C for over 20 years and the warning has existed since Clang was first released. Given how trivial it is to address the issue in isolation (declare the function, disable the error, disable the warning, use an old compiler), the fact that we knew this would break people was considered acceptable, especially given that the functionality causes security concerns.

Implicit function declarations are not the only such breaking change. We also made significant changes to functions without prototypes, implicit int, and we’re doing more of the same with -Wincompatible-function-pointer-types in Clang 16. There’s a theme here: we don’t wish to continue to support C compatibility features with known poor security posture. Not only is it just a bad practice in general, we’re stepping directly in the design space of the standards committees (and lest you think that’s hyperbole, we almost didn’t adopt the changes to make void f(); mean void f(void); over exactly this sort of situation where compilers happily let users write invalid code by default).

So will changes to implicit function declarations even be sufficient? Or do we also need to change implicit int and K&R C declarations?

“so sudden” depends heavily on perspective. These changes have been in the tree for over five months. People found issues and reported them and we addressed those issues during the release. While I can certainly see why this is a very big problem, it’s hard for me to accept an argument that it was sudden as though there was no opportunity to discover this.

All that said, I think we can live with changing Clang 15.0.1 to warn instead of err by default (so it’s not a revert of the work but does downgrade the problem). But I’d like to know how long you expect us to maintain it in that form – are we talking one release of Clang, or several releases of Clang, or never?


You might consider using the CCC_OVERRIDE_OPTIONS environment variable to provide additional options to Clang in those difficult cases. I certainly wouldn’t recommend doing so on an extended basis, but you might find it helpful as a temporary measure.
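As a rough illustration of that suggestion, the Python sketch below builds an environment for a build step. It assumes the edit syntax documented in the Clang driver source, where a leading `+` appends an argument and `#` silences the driver's report of the edits; check this against the Clang version in use. `clang_env` is a hypothetical helper name.

```python
import os

def clang_env(extra_flags, base_env=None):
    """Return a copy of the environment with CCC_OVERRIDE_OPTIONS set.

    Assumed syntax: space-separated edits, '#' to silence the driver's
    report, and '+FLAG' to append FLAG to every Clang command line.
    """
    env = dict(os.environ if base_env is None else base_env)
    edits = ["#"] + ["+" + flag for flag in extra_flags]
    env["CCC_OVERRIDE_OPTIONS"] = " ".join(edits)
    return env

env = clang_env(["-Wno-error=implicit-function-declaration"], base_env={})
print(env["CCC_OVERRIDE_OPTIONS"])  # # +-Wno-error=implicit-function-declaration
```

The resulting dictionary could be passed as `env=` to `subprocess.run` when invoking a configure script, so the flag reaches the compiler even when the script ignores CFLAGS.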

Asked and answered?

Yup, our posts were both written at roughly the same time, so our streams got crossed.

The Clang 16 time frame sounds very reasonable to me. However, I’m still curious about the scope (are we also talking about implicit int or K&R C functions without prototypes? Do the function pointer conversion changes in Clang 16 have similar timing concerns?).

FreeBSD and Gentoo Linux may use a clang configuration file. Clang Driver: Simplify Gentoo gcc-config detection · Issue #57570 · llvm/llvm-project · GitHub

  • Install a configuration file specifying -Wno-error=implicit-function-declaration. Configuration file specified options don’t trigger -Wunused-command-line-argument warnings.
  • Configure sys-devel/clang with an appropriate CLANG_CONFIG_FILE_SYSTEM_DIR. Change clang to a shell script which invokes ${default_triple}-clang. This is because xxx-clang loads xxx.cfg while clang doesn’t.

At work I dealt with error-by-default -Wimplicit-function-declaration. I ended up adding a lot of -Wno-error=implicit-function-declaration to our build systems (Bazel) for a dozen third-party projects.


I’ve seen many implicit function decl. issues, a handful of prototype bugs, and maybe two implicit int errors in configure tests (I’m not counting codebases here, as I think that’s totally fair game - I’m only concerned with configure tests, where detecting issues takes a fair amount of work for the reasons mentioned above).

If I had my way, I’d probably ask that all 3 categories get punted to Clang 16 to give us time to build all these packages & inspect results. But punting just implicit function decls. and prototypes would be enough to give some breathing room if looking for a compromise.

Thanks for the dialogue on this, all.


But did you predict that it would cause invisible breakage for configure scripts? This is really what makes the big difference. If it just caused stuff to fail to compile or link, I wouldn’t have a problem with that. However, in this instance, before we really start fixing stuff we need to develop tools to detect this kind of breakage (assuming that the call might not yield any immediately visible results), and this is not a trivial task.

I’m sorry but most of us are only volunteers. We need to somehow manage a full-time job and open source, including a lot of other (and more rewarding) work than hunting this kind of bug, and on top of it avoid burnout. I’ve heard that some people even manage to have real life, friends and family, that kind of stuff. I’m trying my best but I barely manage to get LLVM building again before the final release, and I don’t really think it would be fair to expose our users to LLVM prereleases.

You can also reasonably assume that these people will be upgrading to 15.0.1 and successive releases. It’s not like there was ever a “perfect” LLVM release that didn’t involve bugfixes, and people are prepared that they need to upgrade.

Unfortunately, realistically this means that there will be 10 one-commit forks where people independently fix one bug and leave the code to rot again. But that’s beside the point.

I’m entirely happy with delaying it just until 16. After all, we now know what the problem is and this makes all the difference.

I’m afraid I’m not the right person to answer these questions. I’ve given a few examples of what’s been reported so far. People have offered so far to work on better tooling to find changes in configure script results — when the tooling is ready, we should be able to test more. That is, if we know what to test.


With respect, considering the scope of the impact, potentially every autoconf script, and the nature of the issue, silent misbehavior, I feel that a line in the middle of the (clang-only, not LLVM) release notes is not very good notice. Considering that you agree that this change could potentially have major impacts, it seems to me that it should at the least be at the top of the clang release notes. I can’t seem to find any notice of this change on the forums either: the RFC [RFC] Enabling -Wstrict-prototypes by default in C discussed autoconf issues, but from my reading, it was agreed not to use -Werror by default. It’s unclear to me how a distributor would have been aware of this change to test the impacts in a release candidate unless they carefully track every commit message or release note.

edit: A good comparison can be made to when gcc 10 set -fno-common by default, which was highly disruptive to Linux distros because of the sheer range of the breakage. -Werror=implicit-function-declaration likely affects many more packages, and worse, as has been beaten to death but I feel I must reiterate, unlike -fno-common, which almost always caused visible breakage, -Werror=implicit-function-declaration often causes hidden breakage. To be clear, I think these are good changes, but they are very significant changes that deserve far more consideration and announcement than a Phab patch and a single line in the release notes.

This solution sounds effectively similar to simply patching the default back to -Wno-error=implicit-function-declaration, which would cause fragmentation, but I don’t see why it would be any more or less so.


It is not within my abilities, or those of many distributors, to watch every upstream project’s commit history for any possibly disruptive changes. We rely on release notes to emphasise things that are important.

Some venue where distributors can be asked to test particularly disruptive changes in advance might be a good idea. I also wonder: did any LLVM folks try rebuilding their system using Clang 15 with this change?


Moving forward, I don’t like the idea of just pushing it to clang 16 either. However, as has been discussed, I think there are reasonable actions to do in the meantime; it is not simply waiting and hoping that the problem fixes itself. These include:

  1. Modifying autoconf in some way, such as making it add -Wno-error=implicit-function-declaration itself, perhaps based on some conditions (e.g. AC_PREREQ <= 2.71).
  2. Modifying distro build systems in some way, such as adding -Wno-error=implicit-function-declaration to CC, or if 1 is done, forcing autoconf script regeneration in some cases. I am inclined to say that most autoconf scripts likely respect CC, since otherwise multilib would be totally broken, and in practice it usually at least sort of works. Alternatively, linting rules could be applied to check config.log for implicit-function-declaration and if detected, issue a warning which will be seen by the package maintainer.
  3. Modifying clang; it has been suggested, somewhat as a joke, to do if(!strcmp(filename, "conftest.c")) werror=0;. I think this idea as-is is perhaps not the best, but some modification of it might well be practical.
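The linting idea in item 2 could be sketched as below. This is only a rough Python illustration, assuming config.log prefixes each check with a `configure:<line>: checking ...` line and that the compiler diagnostic carries the bracketed flag name; `flagged_checks` and the sample log are hypothetical, with the diagnostic text elided.

```python
import re

# config.log introduces each check with e.g. "configure:3001: checking for strlcpy".
CHECKING = re.compile(r"^configure:\d+: checking (.*)$")
MARKER = "[-Wimplicit-function-declaration]"

def flagged_checks(config_log_text):
    """Return the checks whose compiler output mentions the diagnostic."""
    current = "(before first check)"
    flagged = []
    for line in config_log_text.splitlines():
        match = CHECKING.match(line)
        if match:
            current = match.group(1)
        elif MARKER in line and current not in flagged:
            flagged.append(current)
    return flagged

# Made-up excerpt in the shape of a config.log.
sample_log = """configure:3001: checking for strlcpy
conftest.c:12:3: error: (text elided) [-Wimplicit-function-declaration]
configure:3001: result: no
configure:3050: checking for working mmap
configure:3050: result: yes
"""
print(flagged_checks(sample_log))  # ['for strlcpy']
```

A distro build system could run something like this after the configure phase and surface a warning to the package maintainer, as suggested above, instead of failing the build outright.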

I still plan to do the necessary upstream enablement work for GCC, maybe in time for GCC 14, but I’m running behind. (Some core GNU components, including GCC itself, have already been fixed.) I tried to make a Clang-like change many years ago in Fedora, and the silent autoconf breakage was just too great. Clearly this hasn’t changed. Unfortunately, it’s not just autoconf, but there are other configuration/build systems with similar failure modes. Building twice and diffing config.log is not general enough.

We discussed the way forward at a GNU Tools Cauldron (session, slides). The idea is to log these errors in an impossible-to-suppress way, and fail builds if any such errors occurred, even if messages and compiler driver exit status have been suppressed. The preferred approach is to dump the errors into files in a magic directory and inspect its contents at the end of the build. It can be a bit of a hack because this compiler doesn’t have to be shipped. Likewise for some C2X language changes that have similar adverse impact on autoconf-style scripts.
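The end-of-build inspection step might look something like this Python sketch. The directory layout, file naming, and function names are assumptions made for illustration; the actual mechanism discussed at the Cauldron is not reproduced here.

```python
import tempfile
from pathlib import Path

def collect_records(log_dir):
    """Return {file name: contents} for every non-empty file in log_dir."""
    records = {}
    for path in sorted(Path(log_dir).iterdir()):
        if path.is_file():
            text = path.read_text()
            if text.strip():
                records[path.name] = text
    return records

def build_should_fail(log_dir):
    """The build fails if the instrumented compiler recorded anything at all."""
    return bool(collect_records(log_dir))

# Demonstration with a temporary directory standing in for the magic directory.
with tempfile.TemporaryDirectory() as log_dir:
    clean = build_should_fail(log_dir)
    # Hypothetical record that an instrumented compiler might have written.
    Path(log_dir, "conftest-0001.txt").write_text("implicit function declaration\n")
    dirty = build_should_fail(log_dir)

print(clean, dirty)  # False True
```

Because the check reads files rather than exit statuses or terminal output, it still fires when a configure script has swallowed both.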

And then fix all the active upstream projects that generate the relevant errors, and exchange fixes among distributions for the inactive ones (or adjust compiler flags for them).


I think you’re being unrealistic. At least half of the issues mgorny cited at the start of this thread were found after clang-15.0.0 was released and by a Gentoo user (me). To review the builds of the ~1,000 packages I have installed would take a significant amount of time. If I was feeling particularly optimistic, I’d give myself a year to review all of them. With that estimate, it would take almost 20 years for me to review every package in Gentoo’s repository, and a team of ~40 people to do it within 6 months.
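The estimate can be checked with a few lines of Python. The repository size is not stated in the post; the figure below is only what the "almost 20 years" claim implies, so treat it as an upper bound inferred from the text rather than a real package count.

```python
# Figures taken from the post above.
packages_per_reviewer_year = 1000   # ~1,000 packages reviewed in one year
years_for_one_reviewer = 20         # "almost 20 years", rounded up
deadline_years = 0.5                # 6 months

# Inferred, not stated in the post: the repository size implied by the claim.
implied_repo_size = packages_per_reviewer_year * years_for_one_reviewer

# Total effort is fixed, so a shorter deadline needs proportionally more people.
reviewers_needed = years_for_one_reviewer / deadline_years

print(implied_repo_size, reviewers_needed)  # 20000 40.0
```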

Source-based distros such as Gentoo are the exception rather than the norm, and if distros such as Gentoo are unable to find these types of problems before a compiler release, how likely is it that binary distros, who have far fewer people building packages from source, are going to find them?

The cost to perform an in-depth analysis of the impact a change like this makes within the preferred time-frame is just too high. In my opinion, the LLVM development team is going to have to accept that the fallout from these types of changes just isn’t going to be done until after a compiler release.

P.S. I spent about a week trying to use clang-15 as my system compiler. In that time, I rebuilt 500 packages and only thoroughly reviewed a handful of them. Once I found clang-15: May produce invalid code when -O1 (or higher) is used with -fzero-call-used-regs=all · Issue #57692 · llvm/llvm-project · GitHub, I decided that continuing to use clang-15 was too risky and abandoned the effort.
