LLVM bug 11944

Hello,
I took a glimpse at a PR: LLVM and/or Clang (i.e. the library C++ code) have global variables with non-trivial constructors, which means the objects get constructed in an unpredictable order and add nonzero CPU/time overhead at startup.

In one of our rather large embedded projects, for 4G eNodeB and later 5G base stations, we had a similar issue. Global variables with non-trivial constructors were forbidden there.

We could imagine the Clang syntax checker having an option to detect and track these early, so we would know how many are still left, if any. I would discourage running it on the test suite, though, as many false positives were reported there.

Happy if that helps at all.

Best regards,
Pawel Kunio

[It might be helpful if you used a few more words & fully explained
what you're referring to - I'm having a hard time following your
emails in this abbreviated writing style]

You looked at an existing bug (do you have a link to it, or bug number
on the bugs.llvm.org database) related to the use of global
constructors in the LLVM codebase? Or in another codebase using LLVM?

You had similar issues on another codebase you worked on where you
weren't allowed to use any non-trivial global constructors?

& you're proposing creating a checker or other tool to help find these
cases? Or proposing using an existing tool for finding such things?

Clang does have a warning for this already, I believe:
-Wglobal-constructors. But, yes, the LLVM codebase isn't remotely
ready for that and it's not been a high enough priority for anyone to
really clean it up - mostly because the main use of global
constructors is in the LLVM command line argument handling code - so
it's a non-trivial design/redesign/refactoring effort to figure out
the right new design for that and make all the changes necessary to
migrate to such a design. (after that there'd probably be a bunch of
smaller more incremental changes to cleanup global constructors and
get the codebase to have no -Wglobal-constructors warnings, then we
could turn on the warning to ensure we didn't regress)
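
For readers unfamiliar with the warning, here is a minimal sketch of the kind of declaration -Wglobal-constructors reports and one common way to avoid it. The names registeredNames, nameAt and DefaultThreshold are made up for illustration and do not come from the LLVM codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A namespace-scope object with a non-trivial constructor runs code before
// main(). Clang's -Wglobal-constructors reports a declaration like this one:
//
//   static std::vector<std::string> RegisteredNames = {"alpha", "beta"};
//
// Wrapping the object in a function defers construction to the first call,
// so the point of initialization is well defined.
std::vector<std::string> &registeredNames() {
  static std::vector<std::string> Names = {"alpha", "beta"};
  return Names;
}

// Accessor that goes through the function above instead of a global.
const std::string &nameAt(std::size_t Index) {
  assert(Index < registeredNames().size() && "index out of range");
  return registeredNames()[Index];
}

// Constant-initialized data needs no constructor call at startup at all.
constexpr int DefaultThreshold = 225;
```

The function-local static trades the startup cost for a guarded check on each call, and its destructor still runs at exit, so it is only one of several possible approaches.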

- Dave

Hello,
Sorry about my bad English and poor communication. I meant bug 11944 from the LLVM bug database. As for our system, we had a similar issue on an embedded project around LTE and 5G base stations. The architects finally decided that global objects with non-trivial constructors are forbidden due to startup overhead and unpredictable call order.

Thank you for mentioning the flag that enables the warning. That should help me pinpoint all the interesting uses of this construct.

I will try to look up how these global constructors are done in the LLVM libraries. I generally like what I saw of the general guidelines along which the command-line options component is built. I don't yet fully see where the issue is in moving it to runtime. I think we could try a pattern like the command-line option tables from GCC, but I am not sure yet whether to keep much in globals or to store it within the classes of specific apps/tools.

In one of the first phases I would think of checking Clang's build flags for which targets etc. are enabled, registering all supported commands, and maybe prescanning the command line for further flags that enable or disable other flags.

I will try to build the LLVM libraries with this warning enabled to see all the option uses and learn the API etc. I first need to understand completely the logic, macros and flags that determine which blocks of flags should be registered and later kept or disabled, in case we need to scan the command line in multiple passes, etc.

I'll try to learn all I can from the code and the gathered warnings first. If questions arise I'll get back with them; if not, I'll try to propose a solution after fully testing it on the current test suite and all targets etc.
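
To make the table-based idea above concrete, here is a rough sketch of options described by a constant table, with parsed values stored in an object owned by the tool rather than in globals. Everything here (OptionDesc, OptionTable, ParsedOptions, parseArgs and the flag names) is hypothetical; it is neither LLVM's existing command-line API nor GCC's actual table format.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical option descriptor: plain data, so the table below is
// constant-initialized and needs no constructor call before main().
struct OptionDesc {
  const char *Name;  // flag spelling, e.g. "-verbose"
  const char *Help;  // one-line description
  bool DefaultValue; // value used when the flag is absent
};

constexpr OptionDesc OptionTable[] = {
    {"-verbose", "Print extra diagnostics", false},
    {"-stats", "Print statistics at exit", false},
    {"-color", "Use colored output", true},
};
constexpr int NumOptions = sizeof(OptionTable) / sizeof(OptionTable[0]);

// Parsed values live in an object owned by the tool, not in globals.
struct ParsedOptions {
  bool Values[NumOptions];

  ParsedOptions() {
    for (int I = 0; I < NumOptions; ++I)
      Values[I] = OptionTable[I].DefaultValue;
  }

  bool get(const char *Name) const {
    for (int I = 0; I < NumOptions; ++I)
      if (std::strcmp(OptionTable[I].Name, Name) == 0)
        return Values[I];
    assert(false && "unknown option name");
    return false;
  }
};

// Sets every recognized flag to true; returns false on the first
// argument that is not in the table.
bool parseArgs(int Argc, const char *const *Argv, ParsedOptions &Out) {
  for (int A = 1; A < Argc; ++A) {
    bool Known = false;
    for (int I = 0; I < NumOptions; ++I) {
      if (std::strcmp(OptionTable[I].Name, Argv[A]) == 0) {
        Out.Values[I] = true;
        Known = true;
        break;
      }
    }
    if (!Known)
      return false;
  }
  return true;
}
```

A tool would construct a ParsedOptions in main(), call parseArgs(argc, argv, Opts), and pass Opts to whatever needs it. This sketch leaves open the hard part discussed in this thread, namely how libraries contribute their own options without registering them from global constructors.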

Best regards,
Pawel Kunio

On Fri, Apr 23, 2021 at 02:30, David Blaikie <dblaikie@gmail.com> wrote:

Hello,
Sorry about my bad English and poor communication. I meant bug 11944 from the LLVM bug database. As for our system, we had a similar issue on an embedded project around LTE and 5G base stations. The architects finally decided that global objects with non-trivial constructors are forbidden due to startup overhead and unpredictable call order.

Thank you for mentioning the flag that enables the warning. That should help me pinpoint all the interesting uses of this construct.

Sure thing - oh, the other thing you might find interesting, regarding
the issues with global constructor ordering problems - there's a
dynamic tool to help find those:
https://github.com/google/sanitizers/wiki/AddressSanitizerInitializationOrderFiasco

I will try to look up how these global constructors are done in the LLVM libraries. I generally like what I saw of the general guidelines along which the command-line options component is built. I don't yet fully see where the issue is in moving it to runtime. I think we could try a pattern like the command-line option tables from GCC, but I am not sure yet whether to keep much in globals or to store it within the classes of specific apps/tools.

In one of the first phases I would think of checking Clang's build flags for which targets etc. are enabled, registering all supported commands, and maybe prescanning the command line for further flags that enable or disable other flags.

I will try to build the LLVM libraries with this warning enabled to see all the option uses and learn the API etc. I first need to understand completely the logic, macros and flags that determine which blocks of flags should be registered and later kept or disabled, in case we need to scan the command line in multiple passes, etc.

I will say I wouldn't readily suggest this as the /best/ place to
start in LLVM - if you're interested in getting into the project
generally, without any particular preference for which part - given
this has been thought about a fair bit by folks throughout the
project's life, the amount of design work/time involved by core
developers will probably be non-trivial to figure out the right path
before the mechanical work can begin. Fair warning.

Usually I suggest starting with clang warnings (or
clang-format/clang-tidy these days) bugs, since they can be fairly
narrow/sometimes relatively isolated to fix. Source location bugs or
fixits (warnings that point to the wrong part of the code, or errors
that don't offer a fixit hint when they could) can be interesting
learning experiences about the Clang Abstract Syntax Tree/source
location information.

But if you're more interested in LLVM proper - probably middle-end
misoptimizations (probably easier to fix bugs/miscompiles than fixing
missed optimizations - the latter are probably more likely to be
deeply involved work/changes to make an optimization more powerful for
some reason).

I'll try to learn all I can from the code and the gathered warnings first. If questions arise I'll get back with them; if not, I'll try to propose a solution after fully testing it on the current test suite and all targets etc.

Yep, always happy to answer questions/provide pointers.

- Dave

Hello,
Thank you for the kind, friendly warning. Just to get it right: it might be too tough, or too pointless for someone to integrate? Of the options you proposed, I should stick either to bugs where a warning points to the wrong location, or to something in the analyzer or clang-tidy. As for complex algorithms connected with missed optimization opportunities etc., I could try to catch them and measure or benchmark them, or maybe write tests for them, but for hardcore C++ development I may need some time to get back in full shape.

I was thinking of another side project that should be doable. I could try to take a snapshot of the libc API, parse it, and use it as a config script describing what kinds of issues to look for in the analyzer, thus emulating the planned C++ feature of a precondition/postcondition checker.
If I recover my rolodex, I might be able to get in touch with a few white hats who could provide data on what kinds of holes via which API calls are most common, and generally what kinds of API-misuse attacks are possible, if that would help with this one.

Best regards,
Pawel Kunio

On Fri, Apr 23, 2021 at 04:59, David Blaikie <dblaikie@gmail.com> wrote:

Hello,
Thank you for the kind, friendly warning. Just to get it right: it might be too tough, or too pointless for someone to integrate?

Yeah, my rough guess is that it's a challenging problem - one that
many LLVM developers care about (many uses of LLVM as a library where
the use of global constructors adds to the application's startup time)
and have tried to fix to varying degrees at various times, but haven't
managed it.

Of the options you proposed, I should stick either to bugs where a warning points to the wrong location, or to something in the analyzer or clang-tidy. As for complex algorithms connected with missed optimization opportunities etc., I could try to catch them and measure or benchmark them, or maybe write tests for them, but for hardcore C++ development I may need some time to get back in full shape.

Actually middle end optimizations can sometimes be some of the simpler
code to fix/play with - so I wouldn't rule out playing with/modifying
them, if it interests you.

I was thinking of another side project that should be doable. I could try to take a snapshot of the libc API, parse it, and use it as a config script describing what kinds of issues to look for in the analyzer.

Not sure I'm following here - are you referring to GNU libc, or LLVM's
recent libc project? & which analyzer?

Clang's Static Analyzer ( https://clang-analyzer.llvm.org/ ) - if
you're talking about that, and thinking of finding bugs /in/ the
analyzer (rather than using the analyzer to find bugs in other code)
by running it on some existing codebase (like libc, GNU or LLVM's) and
reporting what you find - yes, that could be useful. (though realize
there's probably lots of already known bugs, so sometimes bug finding
isn't the bottleneck/where there's a problem)

Thus emulating the planned C++ feature of a precondition/postcondition checker.
If I recover my rolodex, I might be able to get in touch with a few white hats who could provide data on what kinds of holes via which API calls are most common, and generally what kinds of API-misuse attacks are possible, if that would help with this one.

Yeah, if there are new static analysis checks you've got a use for,
could try implementing them as either Clang warnings, clang-tidy
checks, or Clang Static Analyzer checks.

- Dave

Hello,
To clarify on stdlib/libc: I like the way MS has annotations in the comments of their stdlib/libc. They are used by the compiler as a config script for something similar to a static analyzer/warnings module, in both release and debug builds, and as a base for generating runtime checks in debug builds only.

It helps to catch suspicious or wrong calls to stdlib/libc.

If you were interested in something similar, I'd be willing to help here. You wouldn't have to write a new subchecker/subanalyzer per libc function.

Best regards,
Pawel Kunio

On Fri, Apr 23, 2021 at 21:11, David Blaikie <dblaikie@gmail.com> wrote:

Hello,
To clarify on stdlib/libc.

For what it's worth, there are several implementations of libc - GNU
libc and LLVM libc (well, the latter is relatively new/incomplete)
would be

I like the way MS has annotations in the comments of their stdlib/libc.

Ah, you're talking about SAL? (
https://docs.microsoft.com/en-us/cpp/c-runtime-library/sal-annotations?view=msvc-160
)

They are used by the compiler as a config script for something similar to a static analyzer/warnings module, in both release and debug builds, and as a base for generating runtime checks in debug builds only.

It helps to catch suspicious or wrong calls to stdlib/libc.

If you were interested in something similar, I'd be willing to help here. You wouldn't have to write a new subchecker/subanalyzer per libc function.

Yeah, there's some general attribute annotations (like for
printf-style functions, for instance). Not sure if there are
attributes for general array+arraybound parameters, for instance.
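
As an illustration of the printf-style annotation mentioned above, here is a small sketch using the format attribute that GCC and Clang support. The wrapper formatInto is a made-up name for this example, not a function from any libc.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// GCC/Clang extension: parameter 3 is a printf-style format string and the
// arguments to check against it start at parameter 4. With -Wformat, a call
// such as formatInto(Buf, sizeof(Buf), "%d", "text") is diagnosed at
// compile time.
__attribute__((format(printf, 3, 4)))
int formatInto(char *Buf, std::size_t Size, const char *Fmt, ...) {
  assert(Fmt && "format string must not be null");
  va_list Args;
  va_start(Args, Fmt);
  int Written = std::vsnprintf(Buf, Size, Fmt, Args);
  va_end(Args);
  return Written;
}
```

A call like formatInto(Buf, sizeof(Buf), "%s=%d", "x", 42) compiles cleanly because the argument types match the format string.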

- Dave

Hello,
Yep, I meant SAL. I think we could have printf-like, SAL-like annotations added to LLVM libc, at least to emulate the planned C++ extension for precondition/postcondition constraints that must hold for calls into the library not to produce warnings or errors. That way we could try to catch common risks, if not errors/holes, early.

I'm not sure I'm 100% right here, but if preconditions/postconditions can be moved fully to compile time, we could try to get rid of many checks in the libc code and, for example, move them to asserts in debug builds only, etc.
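
A minimal sketch of what "checks as debug-only asserts" could look like is below. The PRECONDITION and POSTCONDITION macros and the boundedCopy function are hypothetical names invented for this example; they are not part of any libc or of standard C++, and they are checked at run time only, not at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical contract macros built on assert(), so they compile away
// when NDEBUG is defined (typical release builds).
#define PRECONDITION(Cond) assert((Cond) && "precondition violated")
#define POSTCONDITION(Cond) assert((Cond) && "postcondition violated")

// Copies Src into Dst, truncating if needed, and always NUL-terminates.
// Returns the number of characters copied, not counting the terminator.
std::size_t boundedCopy(char *Dst, std::size_t DstSize, const char *Src) {
  PRECONDITION(Dst != nullptr);
  PRECONDITION(Src != nullptr);
  PRECONDITION(DstSize > 0);

  std::size_t Len = std::strlen(Src);
  if (Len >= DstSize)
    Len = DstSize - 1;
  std::memcpy(Dst, Src, Len);
  Dst[Len] = '\0';

  POSTCONDITION(std::strlen(Dst) < DstSize);
  return Len;
}
```

Note that in a release build the precondition checks disappear entirely, so callers that pass a null pointer get undefined behavior instead of a diagnostic.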

You may have seen by now that I'm an enthusiast of software development moving in a somewhat Rust-like direction, as the Linux kernel plans to.

Another side question: do we have, or could we use, solution(s) like those the big Redmond supplier has for catching many

pointer access / memory errors / leaks / use-before-alloc / use-after-free / global and stack variables used before init

kinds of errors with a 100% rate, assuming all paths are taken? These are runtime checks only, though, and might need extending the interfaces of malloc, free, new and delete, either via a syntax extension or without one if assisted by the compiler.

Best regards,
Pawel Kunio

On Mon, Apr 26, 2021 at 18:30, David Blaikie <dblaikie@gmail.com> wrote:

Hello,
Yep, I meant SAL. I think we could have printf-like, SAL-like annotations added to LLVM libc, at least to emulate the planned C++ extension for precondition/postcondition constraints that must hold for calls into the library not to produce warnings or errors. That way we could try to catch common risks, if not errors/holes, early.

Possibly - as you say, it's something Microsoft's found useful to
have, though hasn't really been picked up in the unix-y world.

I'm not sure I'm 100% right here, but if preconditions/postconditions can be moved fully to compile time, we could try to get rid of many checks in the libc code and, for example, move them to asserts in debug builds only, etc.

Maybe - though being able to build existing code that might rely on
these checks is pretty important, so that might not be a feasible
direction.

You may have seen by now that I'm an enthusiast of software development moving in a somewhat Rust-like direction, as the Linux kernel plans to.

Another side question: do we have, or could we use, solution(s) like those the big Redmond supplier has for catching many

pointer access / memory errors / leaks / use-before-alloc / use-after-free / global and stack variables used before init

kinds of errors with a 100% rate, assuming all paths are taken? These are runtime checks only, though, and might need extending the interfaces of malloc, free, new and delete, either via a syntax extension or without one if assisted by the compiler.

You're asking if the LLVM project has tools for finding those kinds of
bugs? The sanitizers catch some of these at runtime (ASan for
accessing memory that you don't have access to (buffer overrun, etc),
MSan for accessing uninitialized memory, LSan for leaks, etc) and some
can be caught at compile time (Clang has a check for variables used
before they are initialized, for instance).

- Dave

Hi,
As for adopting their MS extensions here, I'd guess that, as much as I hate them, they're in the right here and the *nix world is at fault.

As for eliminating deep asserts, I don't know.

As for runtime checks, I am aware of them, and if you can mix and match them and they work 100%, no problem here.

Best regards,
Pawel Kunio

On Tue, Apr 27, 2021 at 02:01, David Blaikie <dblaikie@gmail.com> wrote:

Hi,
As for adopting their MS extensions here, I'd guess that, as much as I hate them, they're in the right here and the *nix world is at fault.

That's probably a fairly uphill battle to get something like SAL
adopted outside of MSVC. Maybe the folks working on Windows
compatibility in Clang might have some interest/thoughts on the
relative merits, though.

We are using SAL annotations in the FreeBSD system call master definitions. They’re currently parsed by some evil sed scripts. I’d love to have them properly supported by compiler tooling so that we could replace this with a libclang-based tool.

The implementation in the Visual Studio compiler involves a header that is very tightly coupled to some compiler internals and so it’s not very easy to make portable. A clang version would probably have to reimplement the header as well.

David