Relative Paths in Compilation Database

I am calling runClangTidy using a JSONCompilationDatabase that I generate from a string. Everything works perfectly until I have a relative path.

With the following database I use "D:\CMakeTest\bld\..\src\main.cpp" as the file to open:
[
{
"directory": "D:\CMakeTest\bld\",
"command" : "D:/llvm/build/Debug/bin/clang.exe -I"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include" -I"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\atlmfc\include" -I"C:\Program Files (x86)\Windows Kits\8.1\Include\um" -I"C:\Program Files (x86)\Windows Kits\8.1\Include\shared" -I"C:\Program Files (x86)\Windows Kits\8.1\Include\winrt" -I"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include" -I"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\atlmfc\include" -I"C:\Program Files (x86)\Windows Kits\8.1\Include\um" -I"C:\Program Files (x86)\Windows Kits\8.1\Include\shared" -I"C:\Program Files (x86)\Windows Kits\8.1\Include\winrt" -DWIN32 -D_WINDOWS -D_DEBUG -DCMAKE_INTDIR="Debug" ..\src\main.cpp",
"file" : "..\src\main.cpp"
},
]

This gives me an output of "Error while processing D:\CMakeTest\bld\..\src\main.cpp."

When I use the following database with "D:\CMakeTest\src\main.cpp" as the file to open, everything works.

[
{
"directory": "D:\CMakeTest\src\",
"command" : "D:/llvm/build/Debug/bin/clang.exe -I"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include" -I"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\atlmfc\include" -I"C:\Program Files (x86)\Windows Kits\8.1\Include\um" -I"C:\Program Files (x86)\Windows Kits\8.1\Include\shared" -I"C:\Program Files (x86)\Windows Kits\8.1\Include\winrt" -I"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include" -I"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\atlmfc\include" -I"C:\Program Files (x86)\Windows Kits\8.1\Include\um" -I"C:\Program Files (x86)\Windows Kits\8.1\Include\shared" -I"C:\Program Files (x86)\Windows Kits\8.1\Include\winrt" -DWIN32 -D_WINDOWS -D_DEBUG -DCMAKE_INTDIR="Debug" main.cpp",
"file" : "main.cpp"
},
]

Is there something that I am doing wrong? The only changes between the two compilation databases are the directory and file entries. Both should refer to the same absolute path.
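
To make the "same absolute path" expectation concrete, here is a minimal sketch in Python. It assumes the relative entry is meant to climb one level out of the build directory (`..\src\main.cpp`), and it uses the standard `ntpath` module, which applies Windows path rules on any host. The helper name `resolve` is made up for this illustration; the sketch only shows the path arithmetic and says nothing about how clang itself resolves paths.

```python
import ntpath  # Windows path semantics, usable on any host OS


def resolve(directory: str, file: str) -> str:
    """Join a database "directory" and "file" entry and normalise the result."""
    return ntpath.normpath(ntpath.join(directory, file))


# First database: relative path that climbs out of the build directory.
first = resolve("D:\\CMakeTest\\bld\\", "..\\src\\main.cpp")
# Second database: the file sits directly in the listed directory.
second = resolve("D:\\CMakeTest\\src\\", "main.cpp")

print(first)
print(second)
```

If the two entries really are equivalent, both calls should print the same normalised path.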

I think this is a known problem. Patches very welcome :)

What appears to be the exact problem? Are relative paths just not properly implemented, or is the problem more specific than that?

I haven’t had time to look.

I finally had the opportunity to look into this.

CommandLineArgumentParser in JSONCompilationDatabase.cpp deals with escapes by discarding all backslash characters. I imagine that this works pretty well on Linux, not perfectly, but well enough.

Windows uses backslash as a separator in file paths, so this breaks most file paths on Windows.
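
The effect can be reproduced with a small Python sketch. This is a simplified model of the behaviour described above (a backslash is dropped and the character after it is kept), not the actual CommandLineArgumentParser code; the function name `drop_escapes` is made up for the illustration.

```python
def drop_escapes(text: str) -> str:
    """Simplified model: remove each backslash and keep the next character as-is."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            i += 1  # skip the backslash itself
        out.append(text[i])
        i += 1
    return "".join(out)


unix_style = drop_escapes(r"path\ to\ binary")    # escaped spaces survive
windows_style = drop_escapes(r"..\src\main.cpp")  # path separators vanish

print(unix_style)
print(windows_style)
```

Under this model the Unix-style escaped spaces come out as intended, while the Windows path loses its separators entirely.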

How would one go about fixing this so it works on both Windows and Linux? Obviously the different escape sequences need to be implemented. What I really mean is, how do you make it so that it uses the correct set of escapes depending on platform?

In Windows, forward slashes are usable in source files:

include '/a/b/....' (relative paths such as ../../../ can be used as well)

(I have been using this form for at least ten years, because I compile
the same program sources on Windows and Unix without any change, in
Fortran, Pascal, and C.)

But in the console, only backslashes are accepted:

dir \a\b\... /S
ren \a\b\d\g.c h.c

because forward slashes are used for command-line options.

I did not try forward slashes in quoted form. You may try it (I do not
have any Windows machine at present):

dir "/a/b/..." /S

Thank you very much.

Mehmet Erol Sanliturk

You are correct. Forward slashes can be substituted for backslashes in many cases, but many compilation databases will be generated with backslashes. Either we mandate that only forward slashes can be used, or CommandLineArgumentParser must be fixed.

I am willing and able to make the fixes, but I don’t know how to test for the platform so that the correct set of escapes is used.
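
As a rough illustration of choosing the escape rules by platform, here is a Python sketch. It is only an analogue: inside clang the check would be done in C++ (for example with a preprocessor test such as `#ifdef _WIN32`), and Python's `shlex` stands in for the real splitter. The function name `split_command` and its `windows` parameter are assumptions made for this example.

```python
import os
import shlex


def split_command(command: str, windows: bool = (os.name == "nt")) -> list[str]:
    """Split a command line, treating backslash as an escape only off Windows."""
    # posix=False leaves backslashes alone (and keeps surrounding quotes in tokens).
    return shlex.split(command, posix=not windows)


cmd = r"clang.exe -c ..\src\main.cpp"
print(split_command(cmd, windows=True))   # backslashes kept as separators
print(split_command(cmd, windows=False))  # backslashes consumed as escapes
```

The `windows` flag defaults to the host platform but can be passed explicitly, which makes both rule sets testable on one machine.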

In the Unix world, the backslash is used as the escape character.
In Windows, both backslashes and forward slashes are usable in file names.
Their intersection is the forward slash.

My opinion is that enforcing forward slashes in file names is more suitable.
Otherwise it will always be necessary to escape the backslashes in Windows
file names, which is not a convenient form (on Windows it will always be
necessary to strip the additional backslashes before passing the name to
operating system routines).

Fixing CommandLineArgumentParser may also be a very convenient option: for
file names, forward slashes and backslashes may be treated as equivalent,
without using an escaping backslash.
When file names written with backslashes are used in Unix environments,
they may be converted directly to forward slashes.

Lazarus and Free Pascal have used this form since, I think, their
beginning. When this form is used (forward slashes or backslashes), having
the compiler convert them on Unix is the most convenient way (on Windows
there is no need for such a conversion in sources).

I have not used Visual Studio, but on Unix (FreeBSD, Linux) I compile
sources that are also compilable in Visual Studio, without any trouble from
forward slashes (which means Visual Studio is able to accept them, but
please test this before making a final decision).

Mehmet Erol Sanliturk
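
The conversion suggested in this message can be sketched in a few lines of Python. The function name `to_forward_slashes` is hypothetical, and the sketch assumes the input is already known to be a file name; the second call shows what a blanket conversion does to a string where the backslash was an escape rather than a separator.

```python
def to_forward_slashes(path: str) -> str:
    """Replace every backslash with a forward slash."""
    return path.replace("\\", "/")


as_path = to_forward_slashes(r"..\src\main.cpp")     # a real Windows path
as_escape = to_forward_slashes(r"path\ to\ binary")  # backslash used as an escape

print(as_path)
print(as_escape)
```

The first result is a usable path on both platforms; the second is not what the author of the escaped string intended.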

Unfortunately this approach can be a significant burden on Windows, since it isn’t always obvious from programmatic inspection what is and isn’t a file path.

Therefore, enforcing forward slashes in file names in every environment is
more suitable.

Mehmet Erol Sanliturk

How do you do that? Generating the compilation database could be extremely difficult, since you often will have mixed forward and back slashes in the system used to generate it.

Theoretically it may seem difficult, but in reality it is not,
because each program is compiled on some operating system, and compilers
define some variables, or they may be defined by command-line
parameters, such as the name of the operating system (Unix, Linux, Windows,
etc.) and the bit size of the operating system (32, 64, etc.).

If these are not defined, which I see in some situations, they are
missing parts that only cause inconvenience, and they need to be fixed.

Assume that the above parameters are properly defined.

By using ifdef statements, it is possible to select the suitable program
segments.

For file names, whether a name is a file name or not is
apparent from its context.

As I said previously, on Unix forward slashes are always used, therefore
there is no problem on Unix.

On Windows, forward slashes can be used. Assume that backslashes are
used. Again, this is not a problem, because a routine can check
the characters of a file name and make conversions if necessary. For file
names, it is not necessary to use escape characters for the backslashes.

The different situation is console-mode applications on Windows.
When file names are enclosed in " " marks, forward slashes CAN be used:

dir "../../*.c" /S

is a VALID console-mode statement. If quotation marks are NOT used, this
statement can only be written as

dir ..\..\*.c /S

Since quotation marks can enclose file names on Windows (this is
compulsory for file names containing blank characters), there is no
problem in enforcing forward slashes everywhere.

Actually, the problem is caused by the programming techniques used.
For example, I see "configure" scripts on Linux that select
32-bit include directories
without checking the bit size of the operating system.

It is very simple. In a shell script, the only statements needed are

uname -s

which gives Linux,

uname -m

which gives the bit size (x86_64), and

uname -a

which gives everything.

I could not understand what the difficulty is. If you explain it more
extensively, perhaps it will be possible to suggest a more useful solution.

Mehmet Erol Sanliturk

Why not escape the backslashes when writing the compilation db?

So, escape the backslashes once for JSON and a second time for Clang?

If this is going to be necessary on Windows, it probably ought to be added to the documentation for the compilation database file format.
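
A short Python sketch of what the double escaping looks like in practice. It assumes the consumer unescapes the command with POSIX shell rules, and uses `shlex` as a stand-in for that step; the command and path are illustrative only.

```python
import json
import shlex

path = r"..\src\main.cpp"

# First level: escape for the shell-like splitter (each backslash is doubled).
shell_command = "clang.exe -c " + path.replace("\\", "\\\\")

# Second level: json.dumps escapes every backslash again for the JSON string.
json_text = json.dumps({"command": shell_command})
print(json_text)

# Reading it back undoes both levels in reverse order.
decoded = json.loads(json_text)["command"]
args = shlex.split(decoded)  # POSIX rules: backslash escapes the next character
print(args)
```

Each separator ends up as four backslashes in the file on disk, which is the cost the question above is pointing at.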

Well, a second time for “shell”.
A different idea that has come up was to just add (optional) ‘arg’ fields to the json, instead of the shell escaping, which seems much more pleasant to work with across platforms, like this:
{
directory: '/my/path'
command: 'just the exec'
argument: '-c'
argument: 'filename'
}
Patches for that would be welcome :)

I like that a lot better. I will look at what that is going to take.

If it goes that direction, will it be necessary to also support the older format? I know there are tools, like CMake, that can produce this file format.

Yes, I think it’s easy enough to make compatible.

Also, I think we want to introduce a new keyword for the binary.
So both of these should work:

{
directory: "/my/path"
binary: "path to binary"
argument: "-c"
argument: "filename"
}
or the shell-escaped command inside the json escaped string:

{
directory: "/my/path"
command: "path\ to\ binary -c filename"
}
The thing to note is that both shell escaping and shell unescaping are sufficiently hard on non-unix platforms, and even on unix platforms it’s just not nice to go from one to the other.

Cheers,
/Manuel

hi there,

i’m also writing a tool (Bear) which generates the compilation database, and i had a few bugs in my code around escaping the command. if it’s not too late, i would also like to propose a format for the problem. ;)

{
directory: "/my/path",
cmd: ["path to binary", "-c", "filename"]
}

which would reformat the command field as a JSON array. it would still keep the command readable and preserve the order of the arguments. (i’m not sure that the suggested repeated “argument” key would be correct JSON.)
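
a consumer that accepts both forms could look roughly like the Python sketch below. the `cmd` key is the hypothetical array field proposed in this message, `command_args` is a made-up helper name, and `shlex` stands in for whatever unescaping the real tool would do on the older string form.

```python
import json
import shlex


def command_args(entry: dict) -> list[str]:
    """Return the argument list for one database entry."""
    if "cmd" in entry:                    # hypothetical array form from this thread
        return list(entry["cmd"])
    return shlex.split(entry["command"])  # older shell-escaped string form


new_style = json.loads(
    '{"directory": "/my/path", "cmd": ["path to binary", "-c", "filename"]}'
)
old_style = json.loads(
    r'{"directory": "/my/path", "command": "path\\ to\\ binary -c filename"}'
)

print(command_args(new_style))
print(command_args(old_style))
```

with the array form no shell unescaping is needed at all; the string form still has to go through the splitter.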

how does that sound to you?

regards,
Laszlo

Hey all,

This sounds like a good development, and I'd like to throw in my 2 cents.

I looked at a related problem before, and I think it boils down to the
same challenge -- Manuel and I discussed it on the bug here:
http://llvm.org/bugs/show_bug.cgi?id=19687

I ran out of steam before I got around to any real implementation, but
I think the primary challenge is that CMake and Ninja (or other
compdb-producers) don't necessarily have the arguments in list form
either. They usually seem to just have a command-line.

So the complexity of splitting the command-line into arguments can
either go in all the world's generators, or in the single consumer in
LibTooling.

- Kim