Backslashes in command arguments

One thing causing many tests to fail on Windows is the presence of backslashes in argument. Until now, I’ve worked around this in many cases by making sure that arguments with backslashes are always quoted.

But there are some cases where this is not easy to guarantee and now I’m leaning towards (at least on Windows) allowing backslashes in argument strings. The code in question comes from the function void SetCommandString (const char *command) in the file Args.cpp

In particular, it implements special handling of whitespace, single quotes, double quotes, and backslashes. For the case of backslashes it removes them from the string.

What would be the implications of removing backslash handling from this function for all platforms? I would prefer to keep platform specific code out of generic code, but I think this needs to be changed on Windows.

FWIW, Removing backslash handling from this function (essentially ignoring backslashes in command arguments) fixes about 12-15 tests on Windows with no other changes. I see there’s some logic in Args for encoding and decoding escape sequences, but this only happens if a particular command option is set, and that command only only appears to be set for the “prompt” command. I’m not sure if removing the backslash logic from SetCommandString would interfere with this command in any way, but it doesn’t seem like it would interfere with any other commands, unless I’m missing something.

If I have a file called foo"bar'baz, how what would I put in the place of <AWKWARD NAME> for

(lldb) file <AWKWARD NAME>

Right now, I can do:

(lldb) file foo\"bar\'baz
Current executable set to 'foo"bar'baz' (x86_64).

Jim

On Windows, that’s not a valid file name, and in fact any file name that has an escaped character is not a valid filename. I see what you’re getting at though, that for non-Windows it’s necessary. Can I wrap the backslash handling in #ifndef _WIN32 then?

Actually ’ is a valid filename character in Windows, but not ". That being said, it’s so rare, and having a backslash is so common that I would prefer the common case to be easy, and the rare case being either unsupported or difficult is ok with me. So I’d still prefer to disable this backslash handling on Windows for now, and then fix ’ on Windows separately in the future.

It's not just file names, you would also have to make sure there's nothing else that might be in command argument or option value that has a single & a double quote, since without the backslashes this would be impossible.

If you really want to do this, I think it would be better to add a debugger setting for the "protection character". Then this would be set to something else (maybe "#") on Windows, and backslash on all other systems. That way you could probably always find a protection character that you could use for whatever gnarly string you ended up having to pass through the command interpreter.

Jim

There doesn’t seem to be a good way to get access to the Debugger object from the Args class. I tried changing Args to take a Debugger to its constructor, but it seems this isn’t always possible, for example with anything having to do with OptionValue.

I could provide a default value of NULL for the Debugger, and if NULL it would fall back to the default escape character, but this seems awkward.

Since I’ve declared this as a global property, why should I even need a Debugger instance? Shouldn’t the global settings be static on the Debugger so that they can be accessed without an instance?

There doesn't seem to be a good way to get access to the Debugger object from the Args class. I tried changing Args to take a Debugger to its constructor, but it seems this isn't always possible, for example with anything having to do with OptionValue.
I could provide a default value of NULL for the Debugger, and if NULL it would fall back to the default escape character, but this seems awkward.

Since I've declared this as a global property, why should I even need a Debugger instance? Shouldn't the global settings be static on the Debugger so that they can be accessed without an instance?

This is the problem. See how process does it:

class Process {
    static const ProcessPropertiesSP &
    GetGlobalProperties();
}

The debugger should do the same thing. It should also add static accessors for all of the settings that are truly global settings. Looking at the global settings, they all seem to be set to be global values which probably isn't what we want. Why? Xcode creates a new Debugger for each debugging window that it creates and if someone types "setting set auto-confirm false" in one command interpreter, it shouldn't affect the other.

So I would say we need to switch all debugger settings to be non-global (third initialized in each of the PropertyDefinition values should be set to "false" in g_properties in Debugger.cpp) and the only one that should remain global is your new escape character. Then Debugger should get a new static accessor:

class Debugger
{
    static char
    GetEscapeCharacter();
};

This in turn will access the global properties and return it to outside folks.

Check out the ProcessProperties class in Process.h and check out its constructor. We will need to do something similar for the debugger's settings. Right now Debugger inherits directly from Properties, but we will need to change it to be like the process where there is a DebuggerProperties class. The global version gets constructed with a "bool is_global" set to false, and the instance ones get constructed with that set to true.

The Process class is the model we will need to follow if you want to make this change and have a true global property that can be accessed without a Debugger instance.

Thanks for the response. By the way, is this setting more appropriate on the CommandInterpreter instead of the debugger?

Thanks for the response. By the way, is this setting more appropriate on the CommandInterpreter instead of the debugger?

Yes, the command interpreter makes more sense.

Jim

Probably the command interpreter, but we have all the command interpreter settings on the debugger right now anyway, so don't change anything on that end..

Actually we do have command interpreter settings as Jim just pointed out to me, so feel free to add it there and hook the global settings up correctly for that with a static function on CommandInterpreter to get the escape character.

Greg

The way Process does it is a little confusing to me. It has this ProcessProperties() class which takes a boolean, is_global. If this value is true, it appends some sub-properties to it, and if it’s false, it just initializes itself by calling itself again with true. So false and true make no difference, it’s always the same instance (?)

Then, to further confuse matters, there’s a single PropertyDefinition array, where some of them are specified as global and some not via the third argument. When you call GetGlobalProperties(), it then just returns the whole array. There’s not much choice it has, because it’s all just one array, so it can’t really return a filtered array with only the items where global==true.

I feel like the correct way to do this is to have two separate arrays. One for global properties and one for instance properties. Then, instead of inheriting from Properties, just have two instances of it in the class. An instance instance and a static instance. This way you don’t even need the third argument to PropertyDefinition at all. And the global properties are added as a subcategory, such as interpreter.global. This also makes it clear to the user when they run “settings list” which settings are global and which are not. It might be clearer with an example, so I’ll upload this patch so you can see what I’m talking about. If you don’t like the way I’ve done it let me know.

The way Process does it is a little confusing to me. It has this ProcessProperties() class which takes a boolean, is_global. If this value is true, it appends some sub-properties to it, and if it's false, it just initializes itself by calling itself again with true. So false and true make no difference, it's always the same instance (?)

No. is_global is set to true if this is the initial "seed" options for "process" under "target". This is why when "is_global" is set to true that it appends the settings to the "target" global settings, and it doesn't when not.

If you think about it you need a set of options that can be editable before any processes exist. So the "is_global" means we are creating this one set of settings, it has nothing to do with the fact that individual settings can be global or instance based. When a lldb_private::Process is made, it inherits from ProcessProperties, and these are the settings that get modified and any global properties are shared with the global versions, and any instance variables copy the current global settings to be the basis for the initial settings in Process.

So lets say ProcessProperties has two settings "a" which is global and "b" which is not (set by the bool in each property definition). When we first start out we create the global properties which contain default values for "a" and "b":

Global ProcessProperties:
a = true
b = false

Now we run some commands:

(lldb) settings set process.a false

So now we have:

Global ProcessProperties:
a = false
b = false

Now we create a process whose pid is 123:

(lldb) file /bin/ls
(lldb) b malloc
(lldB) r

Now a new process gets created and it makes a copy of "Global ProcessProperties":

Process 123 ProcessProperties:
a = false
b = false

Now someone says:

(lldb) settings set process.a true

Since "a" is global we get:

Global ProcessProperties:
a = true
b = false

Process 123 ProcessProperties:
a = true
b = false

But if we do:

(lldb) settings set process.b true

The Global properties remain unchanged, and only Process 123's settings get changed:

Global ProcessProperties:
a = true
b = false

Process 123 ProcessProperties:
a = true
b = true

Then, to further confuse matters, there's a single PropertyDefinition array, where some of them are specified as global and some not via the third argument. When you call GetGlobalProperties(), it then just returns the whole array. There's not much choice it has, because it's all just one array, so it can't really return a filtered array with only the items where global==true.

I feel like the correct way to do this is to have two separate arrays. One for global properties and one for instance properties. Then, instead of inheriting from Properties, just have two instances of it in the class. An instance instance and a static instance. This way you don't even need the third argument to PropertyDefinition at all. And the global properties are added as a subcategory, such as interpreter.global. This also makes it clear to the user when they run "settings list" which settings are global and which are not. It might be clearer with an example, so I'll upload this patch so you can see what I'm talking about. If you don't like the way I've done it let me know.

Read what I typed above carefully and make sure that:
1 - we need to be able to set non-global properties without any instances so there needs to be global settings that contain all global and all instance based properties so we can set these values before we have any instances around
2 - when a new instance is made, it needs to get a copy of the global pref's global and non-global option values
3 - if you modify a value that is non-global, there needs to be a way to find the right instance so you can set the settings as the code currently does

I would rather you not change this code as it currently works and is relatively bug free. If you do still feel you need to change the code do NOT break anything. This means a lot of testing to make sure it behaves the exact way it used to. Messing with this can really hose things up, so again, don't break anything if you still feel you need to change the code.

Greg

I definitely need to be able to specify paths with backslashes unquoted, but whether this is the best approach I’m still undecided.

To be honest, I’m ok for now just disabling backslash processing with an #ifndef _WIN32. This might cause other issues down the line for us, but they will be fairly uncommon, and it will fix the common case now. And we can do a more proper fix if/when the problem resurfaces.

For example, one idea i had to fix this in a deeper way was to audit all settings and options. For each one that’s a path, make sure the option type is eOptionTypeFileSpec. Then sink the escaping logic into FileSpec. The added benefit of this is that it easily gives you access to the native syntax of a remote platform’s file system by way of the PathSyntax enum i added to FileSpec some time ago. So if you’re running some command that takes a path in the context of a remote host, you can use that hosts escaping rules. Plus this only changes the behavior for paths, which is the only place we need it, so doesn’t break anything else.

This is a bigger change though, and not needed immediately, which is why I would prefer postponing.

Thoughts about this approach?

I don't like this. Having two arguments to the same command with different syntaxes seems pretty ugly to me.

I didn't completely follow why having the protection character be settable ended up being annoying to do, but short of that, I think the best option is to have the protection character be a host setting, leave it at "\" for everything but Windows, and pick some likely character for Windows. Then we'll have to document this somewhere that the average debugger user is likely to find it.

Jim

It wouldn’t be two arguments to the same command having a different syntax, it would be one argument to a command having a different syntax depending on how you were using the command.

If I’m debugging Windows remotely from a Linux machine, and I need to specify a remote executable, it’s more natural to use a Windows syntax to specify it. Of course the default would be the host’s default.

The reason I’m not crazy about the protection character is because any character you use is going to be awkward. Escaping characters with # signs just isn’t very intuitive to anyone on any platform. On the other hand, escaping characters in filenames just isn’t a thing that anyone needs to do on Windows, because characters that require escaping are not valid filename characters. I guess ultimately I’m looking for a way so that each platform can have the most natural way of doing things on that platform, so nobody feels like a second class citizen. Using # characters (or really any character other than ) for escaping is going to be unnatural, but even more importantly unnecessary since we only need this for file paths.

That’s what led me to the aforementioned idea. We already have the ability to mark option values as file paths by using eOptionTypeFileSpec. So why not just put the logic there, and then there’s no setting anywhere, and everyone can use the native path syntax for the platform they’re working with.

I take back my first sentence. It would be two different syntaxes if one argument is a file path and one argument is a non file-path string. Like for example:

(lldb) somecommand testarg c:\foobar

In this case yes, the two arguments would have different rules. But a) they would only have different rules on Windows, and b) I think that’s ok, people whose primary operating system is Windows have the \ so engrained in their heads that this will be the most natural behavior.

I still don't like this, but unless I'm missing something, I don't see how it is possible. Suppose I have:

(lldb) foo bar\ --build

What is that on Windows? Is it a filename argument "bar\" and then the option --build? Or is it the filename "bar\ --build". Breaking the line into tokens has to happen before you know what the tokens mean. That's at odds with your plan to have the protection character be a protection character or not depending on what that word means in the already tokenized line. I don't see how you're going to do that without really complicating command line parsing.

Jim