unusual use of getenv

Hi folks

I noticed something interesting when debugging a program that uses llvm
for JIT compilation.

Running `ltrace` surfaced a number of `getenv("bar")` calls coming from
llvm. It turns out these are not "real" `getenv` calls, but are an
optimization "do nothing" escape hatch which have been in
`llvm/include/llvm/LinkAllPasses.h` [for over 15years](1) - apparently
as a way to prevent the compiler eliminating symbol references to
optimization pass initialization functions. I took a look at the code
and couldn't really work out what issue is being solved as the commit
messages from 2005 have something to be desired :wink:

I removed the whole function body from my local tree and `ninja check`
was happy in release mode (amd64-linux-gcc-10.2). Given its age, and the
fact that it's been through several iterations, I guess I've stumbled
upon a Chesterton Fence and would appreciate some input on whether this
is still needed. I see the original commit was Windows only, and was
then updated to use `getenv` as a way to support this behaviour
cross-platform.

It's more weird than pernicious given that nothing is done with the
result, but to me it feels dirty and confusing to query the process
environment in this way. As such, I wonder 3 things:

    1. Is this still needed? I don't know enough about the original
       toolchains affected to know if the problem still exists, but my
       limited testing shows that it doesn't seem to affect Linux
       builds.
    2. If 1: Is there a better way e.g. define our own function that
       can't be eliminated instead of `getenv` or use features of newer
       language standards and toolchains introduced since 2005 that might
       make the original problem go away on its own (I don't know what
       these might be).
    3. If 1 and not 2: could we make it more obvious that this comes
       from LLVM for those in my situation e.g.
       `getenv("LLVM_IGNORE_THIS_GETENV")` or similar instead of the
       unhelpful "bar" variable?

If it's no longer needed in any case, I can post a removal patch.

Any input is appreciated.

All the Best

Luke

[1] https://github.com/llvm/llvm-project/commit/00d5508496c1e#diff-7206f3725623127339dd17671577a6888ee3402d2e667ae9dd1457ea3600f4e7R3

Reid/Hans - you two happen to have any ideas what this device might’ve been introduced to address on Windows 15 years ago? (possible that it was a real issue back then, or even a misunderstanding that might’ve been common/you might be aware of?)

I am pretty sure this has nothing to do with Windows, but with static linking.

When building an executable (opt, clang) we need to ensure that all
the symbols are available in the artifact to ensure that a loaded
plugin uses them. Otherwise the linker may discard object files from
.a libraries that are not used by the executable itself, which only
uses a subset of the functionality. In particular, one wants to ensure
that all passes are available in the opt executable, even though no
default pass pipeline does not reference a pass but can be added using
the cl::opt mechanism.

Michael

Last time I removed the getenv calls, as an Aprils fool joke btw [1],
I ended up with a compiler that was unable to do a stage 2 build, IIRC.

It might depend on the kind of build you do, e.g., static vs dynamic libraries.

~ Johannes

[1] https://lists.llvm.org/pipermail/llvm-dev/2019-April/131466.html

Oh, right, this stuff. I guess the non-windows solution might’ve been a volatile read, for instance? So maybe not so much that the general machinery isn’t needed, but that perhaps MSVC does something interesting with a volatile read or whatever other solution might’ve been used.

Hmm, not sure why the whole file was added only when MSVC support was added - if it is a “static library object file selection” issue. Wouldn’t that have turned up on other platforms before that moment?

FWIW, Raymond Chen recommended a slight variant of this technique as of 2015, using SetLastError: https://devblogs.microsoft.com/oldnewthing/20150102-00/?p=43233

—Owen

Oh, right, this stuff. I guess the non-windows solution might've been a
volatile read, for instance? So maybe not so much that the general
machinery isn't needed, but that perhaps MSVC does something interesting
with a volatile read or whatever other solution might've been used.

I'm thinking in ELF terms, but perhaps something like this could help:

static volatile int doNotRemoveFlag;
int void __attribute__((weak)) doNotRemove() {
    return doNotRemoveFlag;
}

...

if (!doNotRemove()) {
    return;
}

AFAIU, the linker is not allowed to remove the call to `doNotRemove`
because it might be interposed at runtime. This probably isn't
guaranteed for statically linked binaries however (I investigated this
at one point but have since forgotten the rules for static libraries and
weak symbols). Additionally, for GNU style linkers we could also provide
a list of `KEEP()` symbols, which would mean this would survive LTO of
static binaries too. I don't know if this will be sufficient or doable
on other formats like PE-COFF or MachO

Hmm, not sure why the whole file was added only when MSVC support was
added
- if it is a "static library object file selection" issue. Wouldn't that
have turned up on other platforms before that moment?

Indeed. I'm still curious about this.

Wonderful! When I asked about this internally, a colleague said they were going
to create a platform where `getenv` can return `-1`.

Replacing the getenv with this ought to be sufficient:
volatile int opaque_flag = 0;
if (!opaque_flag) return;

Right. Like a sort of static `-rdynamic` hack.
I'm starting to suspect it's not going to be straightforward to remove
this cleanly.

Thanks for the insight.

This seems like a good approach.

-Chris

It did turn up on non-Windows platforms in
https://reviews.llvm.org/D61446. The name of the file where this trick
is used "LinkAllPasses.h" should give a hint what it is used for.
Looking into the entire history of the file, I have no idea how one
would have concluded this whas a Windows thing. It first occurred in
1c5b428ff8234cef705bf57bc1418deb4db25c83 (SVN r23921) when Windows
support was not yet a thing.

LLVM tools themselves, like opt, call some function for each LLVM
component directly (such as initializeCore()) to achieve the same
thing. Howwer clang does include LinkAllPasses.h (in cc1_main.cpp).
All that matters is that there is a strong reference to at least one
symbol to each object file, otherwise the linker will not include it
when.

Michael

Hi Michael

> Hmm, not sure why the whole file was added only when MSVC support was added - if it is a "static library object file selection" issue. Wouldn't that have turned up on other platforms before that moment?

It did turn up on non-Windows platforms in
https://reviews.llvm.org/D61446. The name of the file where this trick
is used "LinkAllPasses.h" should give a hint what it is used for.
Looking into the entire history of the file, I have no idea how one
would have concluded this whas a Windows thing. It first occurred in
1c5b428ff8234cef705bf57bc1418deb4db25c83 (SVN r23921) when Windows
support was not yet a thing.

I concluded this was a windows thing because the origin of the code
seems to be in 00d5508496c1ec0540da5714b0ed66e64b623df5 which uses the
windows only `GetCurrentProcess` instead of `getenv`. Chris Lattner then
changed it to use `getenv` in 63e504ff437d88f0d4bdb0cd50b045655dea6ab1
(svn r23915).
00d550849 is svn r19307 and predates 1c5b428ff823 by ~9 months. The
"new header" patch you identified as the origin of this code seems to be a copy
of the original. Reid Spencer then consolidated the two in the subsequent patch
a3223665010b9a.

That's as far as my archaeology went on this. It's possible that
00d550849 includes code copied from elsewhere, but to me it looks like
00d550849 is the canonical origin of this code.

All the Best

Luke

Thanks for the heads-up. I didn't know there ever was a Visual Studio
project file in the repository.

Looking at opt at that time
(https://github.com/llvm/llvm-project/blob/00d5508496c1ec0540da5714b0ed66e64b623df5/llvm/tools/opt/opt.cpp),
there were no calls to initializeCore etc. That is, according to
static linking library rules
(http://www.lurklurk.org/linkers/linkers.html#staticlibs), opt would
not pass that are not added by a default pass manager level.

That is, unless the build system adds a flag
(https://github.com/llvm/llvm-project/blob/00d5508496c1ec0540da5714b0ed66e64b623df5/llvm/Makefile.rules#L302)
that forces all object files to be included. I don't know whether
-export-dynamic does that.

This potentially could also have been solved with the /WHOLEARCHIVE
command line flag for the Microsoft linker.

Michael