Optimization of string.h calls

Hello all. Is there a way to get llvm/clang to evaluate a string.h call at build time, so that only the resulting string is saved in the binary? For instance, for a statement like...

strrchr(__FILE__, '/') + 1

…and where clang is called on this code’s source file with “clang /some/long/path/file.c”, can I end up with only “file.c” stored in the binary on disk? I think you can see that I am attempting to avoid full paths from my machine ending up in the program. My IDE, Xcode, always passes full paths to clang when building. Thanks.
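For reference, the idiom in the question can be wrapped so that it also copes with a path containing no '/' at all. This is only a sketch: the names `file_basename` and `THIS_FILE` are hypothetical, not part of clang or string.h. The trimming happens at run time unless the optimizer happens to fold the call, so on its own it does not guarantee that the full path is absent from the binary.

```c
#include <string.h>

/* Hypothetical helper: return the part of `path` after the last '/'.
   If `path` contains no '/', return it unchanged. */
static const char *file_basename(const char *path) {
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Hypothetical convenience macro for the current source file. */
#define THIS_FILE file_basename(__FILE__)
```

Something like `printf("%s\n", THIS_FILE);` would then print only the file name, whatever path the IDE passed to clang.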

I imagine that would be difficult. If you pass the resulting pointer out to unknown code/functions, there’s nothing to stop that code from walking backwards from the pointer and, with perfectly well-defined behavior, observing the prefix you tried to hide.

Essentially the compiler would have to be able to see and analyze all uses of that pointer in one go before it could optimize away the prefix.

Yes, well that is a good point w.r.t. strings as typically passed to string.h functions. I guess in this case I had figured that the use of __FILE__ could easily be optimized since it becomes a string constant. Am I wrong? Is the issue that __FILE__ might occur multiple times in a source file and thus become a merged constant? Perhaps there is simply not much use for a string-optimizing feature since reducing program size isn’t a goal for most developers these days?

Then, more to the point, I wonder if there shouldn’t be a macro in clang/llvm for returning only the current source file name, e.g. __FILE_NAME__. I don’t know what the feeling is on introducing macros that do not exist in gcc, since it seems that up until now the approach has been quite conservative. Does no one else have a desire to easily suppress the full hard drive paths that will show up in their binaries with the use of __FILE__? Or perhaps everyone simply has their own existing solutions to this, such as a build system that can pass clang the files by relative path, or placing the source code on its own volume?
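As far as I know, a macro with exactly this name did eventually appear: Clang 9 and later (and, I believe, GCC 12 and later) predefine `__FILE_NAME__` as the last path component of the file being compiled. A minimal sketch that uses it where available and falls back to the strrchr idiom otherwise might look like this. `SOURCE_NAME` and `source_name` are hypothetical names, and the compiler versions are my recollection rather than something stated in this thread.

```c
#include <string.h>

#ifdef __FILE_NAME__
/* Compiler provides the bare file name directly. */
#define SOURCE_NAME __FILE_NAME__
#else
/* Fallback: trim at run time; the full path may still be in the binary. */
#define SOURCE_NAME \
    (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__)
#endif

/* Hypothetical accessor so callers do not depend on the macro. */
static const char *source_name(void) {
    return SOURCE_NAME;
}
```

Only the `#ifdef` branch keeps the long path out of the binary; the fallback merely hides it from the output.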

I'm not sure I follow.

Imagine the following:

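First file:
  #include <stdio.h>

  void f1(const char* c) {
    puts(c - 4);  /* steps back over the 4-byte prefix */
  }

Second file:
  void f1(const char* c);  /* defined in the first file */

  void f2(void) {
    f1("the name" + 4);  /* passes a pointer to "name" */
  }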

The compiler can't optimize away the 'the ' prefix while compiling the
second file because it can't know whether the users of that pointer might
subtract from it, walking into well defined memory they should be able to
examine/print/etc.
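The same point can be checked in a single translation unit. The sketch below is only an illustration: `tail_of_literal`, `step_back` and `demo` are hypothetical names, with `step_back` standing in for code the compiler cannot see. Stepping back 4 bytes is well defined here because the result still points into the same string literal.

```c
#include <stdio.h>

/* Returns a pointer 4 bytes into the literal, i.e. at "name". */
static const char *tail_of_literal(void) {
    return "the name" + 4;
}

/* Stands in for unknown code: walks back to the start of the literal. */
static const char *step_back(const char *c) {
    return c - 4;
}

/* Prints the trimmed view, then the full literal recovered from it. */
static void demo(void) {
    const char *p = tail_of_literal();
    puts(p);
    puts(step_back(p));
}
```

If the compiler had stored only "name", the second `puts` would read memory that is no longer there, which is why the prefix has to be kept.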

Yeah, I don't recall how we keep full paths out of our binaries in-house, but I'm sure we have a way. Perhaps someone else will chime in.

I see what you mean. To me, this falls under the same category as other odd coding practices that will break if certain optimizations are applied to them. The developer should be picking build settings cautiously to make sure their code is treated as desired.

I could picture a string-trimming optimization falling under a “small binary” optimization setting. I thought at first of “-Os”, except that in clang this is apparently just -O2 with some extra code-size reductions. Alternatively, an optional argument such as a hypothetical -fprune-strings could be made available, with the understanding that back-tracking from a string pointer is not going to work under this setting.

Anyway, I won’t harp on this subject anymore, but if anyone would like to share their approach for avoiding full machine paths in the binary from the use of __FILE__, I’d appreciate it. I have written a script which copies the project to a RAM disk and compiles it from there, eliminating the paths above the level of the project, but there’s probably a better way.
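One approach that may avoid the RAM disk, assuming a compiler new enough to support it: GCC 8 and later and Clang 10 and later (to the best of my knowledge) accept `-fmacro-prefix-map=OLD=NEW`, which rewrites a leading OLD in the expansion of `__FILE__` to NEW. The sketch below shows the idea. The path in the comment is the placeholder from the original question, and `where` is a hypothetical function name.

```c
#include <stdio.h>

/* Assumed build command (the path is a placeholder):
 *
 *   clang -fmacro-prefix-map=/some/long/path/= -c /some/long/path/file.c
 *
 * With a compiler that supports the flag, __FILE__ below should expand to
 * "file.c" instead of "/some/long/path/file.c". Without the flag, it
 * expands to whatever path was passed on the command line. */
static const char *where(void) {
    return __FILE__;
}
```

In Xcode, a flag like this would presumably go under "Other C Flags", with the project directory as OLD. Debug information records paths separately, so that would need its own handling.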