New warning for mismatched include case

The attached (incomplete proof-of-concept) patch provides a new warning that fires on case-insensitive file systems when the case of the file name in the #include directive doesn't match the case of the file name on disk. The idea is to avoid broken commits that fail to build on case-sensitive file systems. For example:

jason$ ls
test.cpp test.h
jason$ cat test.cpp
#include "Test.h"
jason$ clang++ -fsyntax-only test.cpp
test.cpp:1:10: warning: include case does not match the case of the file on disk
#include "Test.h"
         ^
1 warning generated.

What do you think? Is this desirable? Does the direction look right?

Jason

include_case.diff (2.13 KB)

There are times when I would REALLY love to have something like this.
People in my group never take this into consideration and so I have to go
back through our entire source code and modify things to get them to build
on Linux.

slide

Sounds like a good idea to me. This should also probably have a fixit.

There is also the unusual case of two headers differing by case and location, and expecting the case to choose them rather than the search path order:

/include/foo.h
/usr/include/Foo.h

#include <Foo.h>
#include <foo.h>
// ^ these will find the same header

I think the fixit is still acceptable here, especially since there is no obvious way to solve this on a case-insensitive filesystem.

Jordan

I think this is a great idea (and could definitely use a FixIt).

~Aaron

Hi Jason,

Any idea what is the performance impact of this change? My concern that this is potentially expensive, and we shouldn't be performing the check on a case-sensitve file system (if we can help it). That said, many people have requested this kind of checking, so I'm happy to see you taking a stab at it.

Ted

Yes, performance is definitely an issue. I see about 30% degradation on Cocoa.h with this naive implementation. That's probably not going to fly under any circumstance. I'll take a closer look...

Jason

There would need to be a whitelist mechanism for some headers, I think. I'm pretty sure the WinSDK headers have changed casing over time, so I would want to suppress this warning for windows.h. I would not want to have to change code just because I installed a new SDK.

Sebastian

Okay, I added caching which improved performance some. I also changed it to only check in user code, which made the Cocoa.h test case moot. I've run the attached patch over real code from LLVM and I don't see any measurable performance difference. It still may be worth disabling the check on case-sensitive file systems, although I'm not entirely sure how to do that beyond "#if defined(__APPLE__) || defined(_WIN32)".

If this looks like the right direction then I'll try to address some of the other concerns.

Jason

include_case.diff (3.86 KB)

This updated patch adds a fixit and a test. The test fails on case-sensitive systems. Is it possible to test for an error on some platforms and a warning on others?

Jason

include_case.diff (5.28 KB)

This updated patch adds a fixit and a test. The test fails on case-sensitive systems. Is it possible to test for an error on some platforms and a warning on others?

Jason

There is a preprocessor directive to test for the presence of include files: __has_include I think.

#if __has_include()

include

#endif

should make it so that the #include is only processed on file systems where was found (which excludes case-sensitives file systems).

– Matthieu

Thanks Matthieu! Revised patch attached.

Jason

include-case.diff (5.32 KB)

Hi Sebastian,

I apologize for the delay. I think that it should be fairly trivial to detect which headers come from the SDK by looking in the registry for the path to the SDK. Maybe clang already supports this somewhere? I'm not very Windows savvy myself but I'm willing to take a crack at this if nobody else steps forward with more specific information.

Jason

Please make sure that Michael Spencer (CC'd) has reviewed this before
it goes in. He has tried to implement this before and it turned out to
be nontrivial to do correctly.

--Sean Silva

Very cool, it would be really great to get this warning. However, there is high risks of slowing down compile times here, have you done any analysis of the compile-time cost of this?

-Chris

I have done some performance testing. I see less than 1% slowdown over parts of LLVM. For example, I build the LLVMSupport library repeatedly until the build time becomes more-or-less consistent. Then do the same thing with the warning enabled. I don't know if there's a better way to test this.

Jason

This is with the warning enabled, right? I haven't looked closely at the implementation yet, but you're doing entire directory scans to resolve the original case. I'm not aware of a better way to find the original case, but this is very delicate. It is not uncommon to include hundreds of headers in dozens of different directories.

-Chris

Also make sure you are testing the -fsyntax-only speed. This matters
quite a bit for IDE integration.

I attempted to implement this feature a while ago but ran into the
problem of finding a sane way to get the original case including
parent directories. Most of the time you have to recursively iterate
directories, which you almost do. On Windows you can actually do
GetLongName(GetShortName(path)). This is a lot faster and will work
most of the time, but it doesn't work everywhere.

For my case I actually needed it specifically on Linux as I keep my
source code on Windows and compile on both platforms. NTFS mounted on
Linux is still case insensitive. I've actually broken the build a few
times because of this :(.

- Michael Spencer

This updated patch adds a fixit and a test. The test fails on case-sensitive systems. Is it possible to test for an error on some platforms and a warning on others?

Very cool, it would be really great to get this warning. However, there is high risks of slowing down compile times here, have you done any analysis of the compile-time cost of this?

I have done some performance testing. I see less than 1% slowdown over parts of LLVM. For example, I build the LLVMSupport library repeatedly until the build time becomes more-or-less consistent. Then do the same thing with the warning enabled. I don't know if there's a better way to test this.

This is with the warning enabled, right?

Yes, I'm testing both with the warning enabled and without and comparing the difference. Maybe I should create a sort of worst-case-scenario test case instead?

I haven't looked closely at the implementation yet, but you're doing entire directory scans to resolve the original case. I'm not aware of a better way to find the original case, but this is very delicate. It is not uncommon to include hundreds of headers in dozens of different directories.

Correct, the implementation uses the directory iterator from PathV2 which uses readdir on unix. The real case name is cached, which helps some, but I suspect that it may be better to cache the actual directory listing into a sorted list instead. Then the directory is read just once and subsequent lookups are really fast. I can try this.

Jason

Also make sure you are testing the -fsyntax-only speed. This matters
quite a bit for IDE integration.

This is how I've tested individual test cases but not larger scale real world samples. I'll try to make a worst-case test case that isolates the performance issues related to this warning.

I attempted to implement this feature a while ago but ran into the
problem of finding a sane way to get the original case including
parent directories. Most of the time you have to recursively iterate
directories, which you almost do. On Windows you can actually do
GetLongName(GetShortName(path)). This is a lot faster and will work
most of the time, but it doesn't work everywhere.

My current patch doesn't actually warn for incorrect case of directory names, just the file name. This would probably have to be added before the patch could be accepted.

For my case I actually needed it specifically on Linux as I keep my
source code on Windows and compile on both platforms. NTFS mounted on
Linux is still case insensitive. I've actually broken the build a few
times because of this :(.

Okay, this is interesting to know. It wouldn't just benefit Mac and Windows users.

Jason

FWIW, it's possible to make NTFS case sensitive when you install SFU.