Should hard links be distinct for #pragma once purposes?

We have a remote build system that uses content-addressed storage, in which all files with the same content are hard linked. I noticed that our output object files differed when they were compiled locally vs. remotely, and it turns out that #pragma once considers hard links to be identical, regardless of their path. Our local builds don’t have the hard linking but our remote builds do, so the include tree is different between local and remote builds (because of the #pragma once behavior difference), which causes debug info to differ.

Is this behavior desirable? You can construct a convoluted example that works when you copy a file and fails when you hard link it:

$ cat evil.h
#pragma once
#ifdef EVIL_ONE
#define EVIL_TWO
#endif
#define EVIL_ONE

$ cp evil.h evil2.h

$ cat evil.c
#include "evil.h"
#include "evil2.h"
#ifndef EVIL_TWO
#error Not evil enough
#endif

$ clang -c evil.c
(compiles successfully)

$ rm evil2.h
$ ln evil.h evil2.h
$ clang -c evil.c
evil.c:4:2: error: Not evil enough
#error Not evil enough
 ^
1 error generated.

gcc behaves the same as clang in this case, for what it’s worth. https://bugs.llvm.org/show_bug.cgi?id=26579 looks related, although my question is specifically about #pragma once’s intended behavior.

This is long-standing intended behavior and I doubt it could reasonably be changed now. The scenario where you actually want to include the same #pragma once file-content twice, under two different filenames, is nearly nonexistent IME.

I think the critical problem this behavior solves is that directory softlinks in the include paths can easily result in multiple paths pointing to the same file. These must be resolved as equivalent, or you’ll get spurious multiple-include errors, in real-world scenarios. It may also have been workable to just resolve all symlinks in the path to get a canonical path first, and compare those canonical paths to determine file-equivalence, instead of using device/inode of the kernel-resolved file. But I doubt if it’s reasonable to change that now – I expect it will break things for some people, and there doesn’t seem to be enough of a corresponding upside to compensate for that breakage.
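To make the two notions of file-equivalence above concrete, here is a minimal POSIX sketch in C. It is not Clang's or GCC's actual code; the function names same_file_by_inode and same_file_by_canonical_path are made up for illustration. The first compares the device/inode pair the kernel reports, the second compares symlink-resolved canonical paths.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Illustrative only: two paths name the same file if the kernel resolves
 * them to the same device and inode. stat() follows symlinks, so both
 * symlinks and hard links compare equal to the original here. */
bool same_file_by_inode(const char *a, const char *b) {
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return false;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Hypothetical alternative: resolve every symlink in each path and compare
 * the canonical paths. A hard link keeps its own canonical path, so it
 * compares unequal to the original. */
bool same_file_by_canonical_path(const char *a, const char *b) {
    char *ra = realpath(a, NULL); /* malloc'd result, or NULL on error */
    char *rb = realpath(b, NULL);
    bool same = ra != NULL && rb != NULL && strcmp(ra, rb) == 0;
    free(ra);
    free(rb);
    return same;
}
```

With evil.h and a hard-linked evil2.h, the inode comparison reports the same file and the canonical-path comparison reports two files. With a symlinked evil2.h, both report the same file. That is why the canonical-path approach would still cover the symlinked-include-directory case while treating hard links as distinct.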

That said, I have seen problems arise from CAS header differences – but from a different area: #include_next. If you have two headers named “foo.h” in different directories that say:
#include_next "foo.h"

they have the same content, but they are not at all equivalent-behaving. In order to determine what “next” means, clang needs to know which include-path the header was found from. And I forget the exact details of what goes wrong, but IIRC, clang doesn’t do this in the straightforward obvious way of simply remembering where in the search it found the current header. Instead, it reconstructs it from data cached off to the side, which gets confused by content-addressed storage, because each header can only be cached as being in a single parent directory, or something like that. Anyhow, the one time this problem came up, I didn’t end up figuring out how to do a proper fix, and just worked around the problem by adding a comment to one of the instances of the files, making it no longer have the same hash as the other file.
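For reference, the "straightforward obvious way" mentioned above can be sketched as follows. This is a toy model in C, not Clang's implementation; find_header and find_header_next are hypothetical names, and the search path is just an array of directory strings. The point is that #include_next needs the index of the directory where the current header was found, which the file's content (or inode) alone cannot supply.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <unistd.h>

/* Toy model: return the index of the first directory in dirs[start..ndirs)
 * that contains a readable file called name, writing its path to out.
 * Returns -1 if no directory in that range has it. */
int find_header(const char *const *dirs, int ndirs, int start,
                const char *name, char *out, size_t outlen) {
    for (int i = start; i < ndirs; i++) {
        int n = snprintf(out, outlen, "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= outlen)
            continue; /* path does not fit in the buffer; skip this directory */
        if (access(out, R_OK) == 0)
            return i;
    }
    return -1;
}

/* #include_next resumes the search just after the directory in which the
 * including header was found; found_in is that directory's index. */
int find_header_next(const char *const *dirs, int ndirs, int found_in,
                     const char *name, char *out, size_t outlen) {
    return find_header(dirs, ndirs, found_in + 1, name, out, outlen);
}
```

If two directories each hold an identical foo.h, the copy found at index 0 must continue the search at index 1, while the copy found at index 1 must continue at index 2. Once both copies are the same hard-linked file, a cache keyed on file identity can no longer tell which of the two indices applies.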

> We have a remote build system that uses content-addressed storage, in which all files with the same content are hard linked. I noticed that our output object files differed when they were compiled locally vs. remotely, and it turns out that #pragma once considers hard links to be identical, regardless of their path. Our local builds don’t have the hard linking but our remote builds do, so the include tree is different between local and remote builds (because of the #pragma once behavior difference), which causes debug info to differ.

I have heard of such things in theory, but this is the most explicitly real-world-ish example I’ve seen yet. :)

> Is this behavior desirable? You can construct a convoluted example that works when you copy a file and fails when you hard link it:

Yes, that makes sense to me. “#pragma once” means roughly “Process this file just once, even if you see this file being included multiple times.” Obviously if you take this file and copy it, then you have two files: namely, this file, and the copy you just made of it. Whereas, if you don’t copy the file, then there’s only one file: this file.
So Clang’s behavior here seems teachable.

> gcc behaves the same as clang in this case, for what it’s worth.

That’s a very good pragmatic reason to keep Clang’s behavior the same. It is kind of surprising that your “evil.h” example changes behavior depending on whether you hardlink-or-symlink the one header file, or copy it so that there are two header files. However, it would be vastly, vastly worse if your “evil.h” example changed behavior depending merely on which compiler you used to compile it!
So Clang’s behavior here seems good for the ecosystem.

my $.02,
Arthur

> We have a remote build system that uses content-addressed storage, in which all files with the same content are hard linked. I noticed that our output object files differed when they were compiled locally vs. remotely, and it turns out that #pragma once considers hard links to be identical, regardless of their path. Our local builds don’t have the hard linking but our remote builds do, so the include tree is different between local and remote builds (because of the #pragma once behavior difference), which causes debug info to differ.

the behaviour you describe is the design intent of #pragma once (and #import and friends). 'The same file' is defined by the content of the file, not by the name you use to refer to it. Yes, this surprised me when I first met it.

feel free to reach out at work, if you need more help.

nathan