Testing through the API vs. commands

The past few weeks I’ve spent a lot of time xfailing the rest of the failing tests on Windows so we can enable tests to run on the bots.

One thing I ran into more frequently than I would have liked is that the tests were failing not because the functionality is broken, but because the substrings being grepped for in the output had a slightly different format on Windows. The pattern for tests is frequently something like:

    self.runCmd("some command")
    self.expect("some other command", substrs=["expected output"])

A good example of this is that when you do a backtrace, on Windows you might see a fully demangled function name such as a.out`void foo(int x), whereas on other platforms you might just see a.out`foo.

I saw the reverse situation as well, where a test was passing but shouldn’t have because the functionality was actually broken, but due to the imprecision of grepping output the grep was succeeding. Specifically, this was happening on a test that verified function parameters. It launched a program with 3 arguments, and then looked for “(int)argc=3” in the frame info. It was broken on Windows because argc was pointing to junk memory, so it was actually printing “(int)argc=3248902” in the output. The test was passing, even though the functionality was broken.

Rather than make the regexes more complicated, I think the right fix here is to stop using the command system and grepping to write tests. Just go through the API for everything, including verifying the result. In the second case, for example, you launch the process, set the breakpoint, wait for it to stop, find the argument named argc, and verify that its value is 3.
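As a rough illustration of what that could look like (a sketch only, assuming the standard test-suite setup where self.dbg, lldbutil, and os are available; the executable name, arguments, and breakpoint placement are placeholders):

    # Sketch only: "a.out", the arguments, and the breakpoint are placeholders;
    # self.dbg and lldbutil come from the standard lldb test setup.
    target = self.dbg.CreateTarget("a.out")
    bkpt = target.BreakpointCreateByName("main")
    process = target.LaunchSimple(["one", "two"], None, os.getcwd())
    thread = lldbutil.get_stopped_thread(process, lldb.eStopReasonBreakpoint)
    frame = thread.GetFrameAtIndex(0)
    argc = frame.FindVariable("argc")
    self.assertEqual(argc.GetValueAsSigned(), 3)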

I don’t want to propose going back and rewriting every single test to do this, but I want to see how people feel about moving toward this model going forward as the default method of writing tests.

I do still think we need some tests that verify commands run, but those tests should focus not on doing complicated interactions with the debugger, but on verifying that things parse correctly and the command is configured correctly, with the underlying functionality being tested by the API tests.

Thoughts?

I have held from the beginning that the only tests that should be written using HandleCommand are those that explicitly test command behavior, and if it is possible to write a test using the SB API you should always do it that way for the very reasons you cite. Not everybody agreed with me at first, so we ended up with a bunch of tests that do complex things using HandleCommand where they really ought not to. I'm not sure it is worth the time to go rewrite all those tests, but we shouldn't write any new tests that way.

Jim

We’ll probably rewrite tests that we find are failing specifically as a result of issues like this, but I agree it’s not worth re-writing everything else except on an as-needed basis.

To make the distinction explicit and enforce it at an organizational level, would it be worth creating folders under lldb/test like lldb/test/commands, and recommending that all HandleCommand tests go there?

Possibly unrelated question: but in regards to this API test vs. HandleCommand test situation, is that the purpose of the @python_api_test decorator (and is the decorator even still useful)?

> Possibly unrelated question: but in regards to this API test vs. HandleCommand test situation, is that the purpose of the @python_api_test decorator (and is the decorator even still useful)?

I think it might have had some utility back when the SB API's were new and we were shuffling them around a lot, though I never found a use for it. I don't see any reason for it to exist any more.

Jim

> To make the distinction explicit and enforce it at an organizational level, would it be worth creating folders under lldb/test like lldb/test/commands, and recommending that all HandleCommand tests go there?

Yes, that might help. Note that the test/README-testsuite explicitly states this policy:

o Writing test cases:

We strongly prefer writing test cases using the SB API's rather than the runCmd & expect.
Unless you are actually testing some feature of the command line, please don't write
command based tests. For historical reasons there are plenty of examples of tests in the
test suite that use runCmd where they shouldn't, but don't copy them, copy the plenty that
do use the SB API's instead.
etc...

Anything that will make that policy more explicit is to the good.

Jim

Sounds good. I think perhaps one reason a lot of tests are written using the commands is because not all functionality available through commands is available through the SB API. Adrian is working on creating some tests for core dump debugging on Windows right now, and there’s no way to access the functionality of “target create -c” through the API without using runCmd. I’m sure there are hundreds of examples like this, so I think in a lot of cases people just don’t want to take the time to make the necessary improvements to the SB API to expose the functionality they need.

In any case, now that I know we agree, I will keep an eye out for new tests that go in like this and push back against them in an effort to get more API tests in.

> Sounds good. I think perhaps one reason a lot of tests are written using the commands is because not all functionality available through commands is available through the SB API. Adrian is working on creating some tests for core dump debugging on Windows right now, and there's no way to access the functionality of "target create -c" through the API without using runCmd. I'm sure there are hundreds of examples like this, so I think in a lot of cases people just don't want to take the time to make the necessary improvements to the SB API to expose the functionality they need.

Yes, one of the reasons I try to insist on SB API tests is so that folks will be less inclined to add functionality to the command line that isn't also made available through the SB API's. OTOH, I'm sure I am guilty of this as well...

But I'm pretty sure you can create core file targets with the SB API's. You make an SBTarget (either an empty one if you don't know what the core file contains or with the core file's backing executable if you know it) then call SBTarget::LoadCore. That returns you the SBProcess represented by the core.
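For what it's worth, a minimal sketch of that route (the paths here are placeholders):

    # Minimal sketch of the SBTarget::LoadCore route described above;
    # "a.out" and "core" are placeholder paths.
    target = self.dbg.CreateTarget("a.out")
    process = target.LoadCore("core")
    self.assertTrue(process.IsValid())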

> In any case, now that I know we agree, I will keep an eye out for new tests that go in like this and push back against them in an effort to get more API tests in.

Cool, thanks.

Jim

> > > I do still think we need some tests that verify commands run, but those tests should focus not on doing complicated interactions with the debugger, but on verifying that things parse correctly and the command is configured correctly, with the underlying functionality being tested by the API tests.
> > >
> > > Thoughts?

I agree that we should have both testing methods - SB API *and* commands,
because we should be testing the user command interface too!

Instead, to fix the Windows vs. other issue, I would suggest writing a
sub-class that won't expect the missing params based on platform.

In any case, there's a lot I never could figure out how to do in the SB
API that I could only do via commands. For example, how do you test
that a trailing space at the end of the expr --language option's argument
is trimmed in the SB API?

-Dawn

I agree that we should test the command interface, but

a) they should be explicitly marked as interface tests.
b) there should be MUCH fewer of them.
c) They should only verify that typing a particular command maps to the right core sequence of public / private API calls, not that the debugger functionality works as expected (since that is already tested through the API tests).
d) Results of these interface tests should also not be verified by the use of self.expect, but through the API itself. (Relying on the text being formatted a specific way is obviously problematic, as opposed to just verifying the actual state of the debugger.)

c is probably the hardest, because it’s hard to verify that a command does what you think it does without actually having it do the thing. It’s possible with some refactoring, but not something that can be whipped up in a day or two.

> > I do still think we need some tests that verify commands run, but those tests should focus not on doing complicated interactions with the debugger, but on verifying that things parse correctly and the command is configured correctly, with the underlying functionality being tested by the API tests.
> >
> > Thoughts?
>
> I agree that we should have both testing methods - SB API *and* commands,
> because we should be testing the user command interface too!
>
> Instead, to fix the Windows vs. other issue, I would suggest writing a
> sub-class that won't expect the missing params based on platform.
>
> In any case, there's a lot I never could figure out how to do in the SB
> API that I could only do via commands. For example, how do you test
> that a trailing space at the end of the expr --language option's argument
> is trimmed in the SB API?

I'm not quite sure I understand what you mean by this example? It sounds like you are asking how to test peculiarities of the
Command Line language name option parser through the SB API. Not sure that makes sense.

But if there's anything you can do with the lldb command line that you can't do with the SB API's that is a bug. Please file bugs (or maybe ask "how would I do that" here first then file a bug if you get no answer.)

Note, you might have to do some work to marshal information in a way that looks like what some of the complex commands produce. But even some of the complex printers like the frame & thread Status are available through the SB API's, and anything else useful like that should also be exposed at some point.

If we had much more time we might have built the Command Line on top of the SB API's, but that would have made it really hard to bootstrap the thing. In theory, somebody could go back & re-implement the lldb command line on top of the SB API's - much like what was done with the MI commands. In actuality that would be a lot of effort that could be better spent somewhere else. But not doing it that way means we have to be careful not to add stuff to the command line without adding some way to get at it with the API's.

Jim

> a) they should be explicitly marked as interface tests.

What's the decorator for this?

> d) Results of these interface tests should also not be *verified* by the
> use of self.expect, but through the API itself. (Relying on the text being
> formatted a specific way is obviously problematic, as opposed to just
> verifying the actual state of the debugger.)

How do you rely "on the text to be formatted a specific way"?

> > a) they should be explicitly marked as interface tests.
>
> What's the decorator for this?

There’s not one currently.

> > d) Results of these interface tests should also not be verified by the
> > use of self.expect, but through the API itself. (Relying on the text being
> > formatted a specific way is obviously problematic, as opposed to just
> > verifying the actual state of the debugger.)
>
> How do you rely “on the text to be formatted a specific way”?

Quick example: One of the most common patterns in the tests is to do some things, print a backtrace, and run a regex against the output of the backtrace. I’ve found at least 3 different examples of how this causes things to fail on Windows:

  1. When we demangle a function, we print the full signature, like foo.exe`void func(int). On Linux/Mac, it would print foo`func. So when the regex is built as <module>`<function name>, this will fail on Windows, because our demangled names look different. The proper way to do this is to get an SBFrame, get the instruction pointer for it, get an SBSymbol for the instruction pointer, then get the short name of the symbol and compare it against ‘func’ or whatever your function name is (see the sketch after this list).

  2. Another example is that there are some tests that verify argument passing on the command line to the inferior. Again, it does this by printing a backtrace, and grepping for “argc=3” anywhere in the backtrace. This is an incorrect check for any number of reasons. On Windows it’s incorrect in that it succeeds when it should fail, because argc is actually pointing to junk memory, and its value is 3290823098 or something that happens to start with a 3. It could be incorrect in other ways as well.

  3. A third example is that when we encounter an exception on Windows (access violation, etc.) the debugger will break. A backtrace will show stop reason = exception, as it should. But we would like to print the exception code as well, because this is important on Windows. This differs from the situation on other platforms, where you might print the name of a signal, for example. So you have to carefully construct a regex to make sure it doesn’t mess up platforms you don’t even work on, which is problematic.
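A minimal sketch of the symbol-based check from example 1 (‘func’ is a placeholder, and GetName() stands in for “the short name” here):

    # Sketch of the SBFrame -> PC address -> SBSymbol chain from example 1;
    # 'func' is a placeholder function name.
    frame = thread.GetFrameAtIndex(0)
    symbol = frame.GetPCAddress().GetSymbol()
    self.assertEqual(symbol.GetName(), "func")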

This problem will probably become worse as time goes on, because people who are accustomed to debugging on Windows (and Linux, and MacOSX, etc) have certain expectations of how things should look, and to present the best user experience we would like to match those expectations wherever it is reasonable to do so. But since many tests are basically tests of the formatted command output, it makes it almost impossible to veer from the current format without breaking tests.

So ultimately, core functionality testing should be independent of command testing.

> In any case, there's a lot I never could figure out how to do in the SB
> API that I could only do via commands. For example, how do you test
> that a trailing space at the end of the expr --language option's argument
> is trimmed in the SB API?

> I'm not quite sure I understand what you mean by this example? It sounds like you are asking how to test peculiarities of the
> Command Line language name option parser through the SB API. Not sure that makes sense.

See the test test_settings_with_trailing_whitespace in test/settings/TestSettings.py.
A test I added there looks like:
        # language
        self.runCmd ("settings set target.language c89") # Set to known value
        self.runCmd ("settings set target.language pascal ") # Set to new value with trailing whitespace
        self.expect ("settings show target.language", SETTING_MSG("target.language"),
            startstr = "target.language (language) = pascal")
        self.runCmd("settings clear target.language", check=False)

How would you write this test using the Python API?

> Note, you might have to do some work to marshal information in a way that looks like what some of the complex commands produce. But even some of the complex printers like the frame & thread Status are available through the SB API's, and anything else useful like that should also be exposed at some point.

I don't think anyone is against using the Python API, but it needs to be easier
to learn and use before folks will adopt it over self.expect. self.runCmd and
self.expect are straightforward, and it's easy to quickly write tests with them.
If you make it too hard to write tests, you'll deter folks from writing tests for
the features they add.

> In theory, somebody could go back & re-implement the lldb command line on top of the SB API's

That would be a cool project to give to some poor college students working on a
branch :)

-Dawn

I don’t have an answer about how or if it is possible to do that using the Python API. But remember that the Python API can be extended however you see fit. If there is no mechanism to set or retrieve the value of settings, you can add those methods to the SB API.
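For illustration, one rough sketch of the shape such a test could take, assuming SBDebugger's internal-variable accessors (SetInternalVariable / GetInternalVariableValue) are a fit for settings like target.language; whether they trim trailing whitespace the same way the command line does is exactly the kind of thing the test would pin down:

    # Rough sketch; SBDebugger.SetInternalVariable / GetInternalVariableValue
    # are used here as a possible stand-in for a settings API.
    instance = self.dbg.GetInstanceName()
    err = lldb.SBDebugger.SetInternalVariable("target.language", "pascal ", instance)
    self.assertTrue(err.Success())
    values = lldb.SBDebugger.GetInternalVariableValue("target.language", instance)
    self.assertEqual(values.GetStringAtIndex(0), "pascal")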

> > a) they should be explicitly marked as interface tests.
> What's the decorator for this?

There's not one currently.

Will there be?

> > d) Results of these interface tests should also not be *verified* by the
> > use of self.expect, but through the API itself. (Relying on the text being
> > formatted a specific way is obviously problematic, as opposed to just
> > verifying the actual state of the debugger.)
>
> How do you rely "on the text to be formatted a specific way"?

> Quick example: One of the most common patterns in the tests is to do some
> things, print a backtrace, and run a regex against the output of the
> backtrace. I've found at least 3 different examples of how this causes
> things to fail on Windows:

Oh, I misunderstood your point. You're saying the existing tests rely
"on the text to be formatted a specific way". Right.

I was asking how you *could* make sure the output was formatted a
specific way using the Python API.

> a) they should be explicitly marked as interface tests.
>
> What's the decorator for this?
>
> There's not one currently.
>
> Will there be?

Whenever someone adds one :) If you want to start doing this, it should be fairly easy. Even if the decorator doesn’t do anything useful yet, you can just put it in and start putting it on interface tests if you want to start writing some. I think earlier in the thread Jim agreed that it would be a good idea to separate command and API tests at the file system level as well. So that’s also something to think about if you plan to write some tests like this.
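A possible starting point (the name and mechanism here are made up): even a no-op marker lets tests be tagged now, with behavior attached to the tag later.

    # Hypothetical no-op marker decorator; the name is made up.
    def interface_test(func):
        """Mark a test as exercising the command interface, not the SB API."""
        func.interface_test = True
        return func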

> d) Results of these interface tests should also not be verified by the
> use of self.expect, but through the API itself. (Relying on the text being
> formatted a specific way is obviously problematic, as opposed to just
> verifying the actual state of the debugger.)
>
> How do you rely “on the text to be formatted a specific way”?
>
> Quick example: One of the most common patterns in the tests is to do some
> things, print a backtrace, and run a regex against the output of the
> backtrace. I’ve found at least 3 different examples of how this causes
> things to fail on Windows:
>
> Oh, I misunderstood your point. You’re saying the existing tests rely
> “on the text to be formatted a specific way”. Right.
>
> I was asking how you could make sure the output was formatted a
> specific way using the Python API.

I’m actually not that convinced it’s super useful. Other people might disagree, but I’ll let them chime in. Personally I think it’s much more useful to verify that typing a command with certain options affects the state of the debugger in the expected way. Right now the way we verify this is by looking at the formatted output. But another way (albeit much more work to get going) would be to have the command objects and the SB classes both be backed by the same classes that implement the commands (I admit I’m handwaving a little here).

This way your core functionality tests uses the python api and verifies the state of the debugger using the SB API, and your command tests simply process a command and get back some sort of configuration structure, and all you have to do is verify that the configuration structure is filled out correctly. Since the core functionality tests already verified that the functionality works if passed an appropriate configuration structure, you don’t need the command tests to actually do any work or manipulate the debugger.
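A toy illustration of that brainstorm (not real lldb code; all the names here are made up): the command layer only parses text into an options object, the command test asserts on that object, and the API tests cover the functionality behind it.

    # Toy illustration: a command test would only check that text parses
    # into the right configuration structure, never touching the debugger.
    class BreakpointOptions(object):
        def __init__(self):
            self.file = None
            self.line = None

    def parse_break_command(args):
        opts = BreakpointOptions()
        i = 0
        while i < len(args):
            if args[i] in ("-f", "--file"):
                opts.file = args[i + 1]
                i += 2
            elif args[i] in ("-l", "--line"):
                opts.line = int(args[i + 1])
                i += 2
            else:
                i += 1
        return opts

    opts = parse_break_command(["-f", "main.c", "-l", "12"])
    assert opts.file == "main.c" and opts.line == 12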

Anyway, this is just a high level idea that probably needs refinement. I’m mostly just brainstorming.

I would like to revive this thread, because there doesn’t seem to be consensus that this is the way to go. I’ve suggested on a couple of reviews recently that people put new command API tests under a new top-level folder under test, and so far the responses I’ve gotten have not indicated that people are willing to do this.

Nobody chimed in on this thread with a disagreement, which indicates to me that we are ok with moving this forward. So I’m reviving this in hopes that we can come to agreement. With that in mind, my goal is:

  1. Begin enforcing this on new CLs that go in. We need to maintain a consistent message and direction for the project, and if this is a “good idea”, then it should be applied and enforced consistently. Command API tests should be the exception, not the norm.

  2. Begin rejecting or reverting changes that go in without tests. I understand there are some situations where tests are difficult. Core dumps and unwinding come to mind. There are probably others. But this is the exception, and not the norm. Almost every change should go in with tests.

  3. If a CL cannot be tested without a command API test due to limitations of the SB API, require new changes to go in with a corresponding SB API change. I know that people just want to get their stuff done, but I don’t feel that is an excuse for having a subpar testing situation. For the record, I’m not singling anyone out. Everyone is guilty, including me. I’m offering to do my part, and I would like to be able to enforce this at the project level. As with #2, there are times when an SB API isn’t appropriate or doesn’t make sense. We can figure that out when we come to it. But I believe a large majority of these command API tests go in the way they do because there is no corresponding SB API yet. And I think the change should not go in without designing the appropriate SB API at the same time.

Zach

IMHO that all sounds reasonable.

FWIW - I wrote some tests for the test system changes I put in (for the pure-python impl of timeout support), and in the process, I discovered a race condition in using a python facility that there really is no way I would have found anywhere near as reasonably without having added the tests. (For those of you who are test-centric, this is not a surprising outcome, but I’m adding this for those who may be inclined to think of it as an afterthought).

-Todd

I didn’t follow those threads, can you summarize what the race condition was? Is it likely to crop up in the normal test suite? i.e. not the test suite that tests the test suite.

> I didn’t follow those threads, can you summarize what the race condition was? Is it likely to crop up in the normal test suite? i.e. not the test suite that tests the test suite.

Sure. At least on OS X and Linux, the impl of subprocess.Popen.wait() and subprocess.Popen.poll() are written such that they are not thread safe. And subprocess.Popen.communicate() calls wait() internally. In order to implement a timeout in Python, we essentially need to run the communicate() part on a separate thread, and have another thread timing out on that first one not finishing up within the timeout. Two threads can then potentially touch a wait() or poll() call. If and when they race, the actual subprocess.Popen.returncode (the return value of the executed program) can get mis-set; i.e., the return value of the called program can come back incorrectly. This is catastrophically wrong in the common case of the race, which is that it gets written as 0 when in fact there was an error in the underlying process (e.g. signal/exceptional process exit).
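A condensed sketch of the pattern in question (not the actual fix that went in as r249182; "./a.out" and the timeout are placeholders):

    import subprocess
    import threading

    # Condensed sketch of the racy pattern: communicate() (which calls wait()
    # internally) runs on a worker thread while the main thread enforces the
    # timeout. If the main thread also calls poll()/wait() on the same Popen
    # while the worker is inside wait(), the two can race and Popen.returncode
    # can be recorded incorrectly.
    proc = subprocess.Popen(["./a.out"])

    worker = threading.Thread(target=proc.communicate)
    worker.start()
    worker.join(timeout=30)        # the Python-level timeout
    if worker.is_alive():
        proc.kill()                # timed out: kill the inferior...
        worker.join()              # ...and let communicate() reap it
    status = proc.returncode       # this is the value the race can corrupt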

It was bugged here:
https://llvm.org/bugs/show_bug.cgi?id=25019

and fixed here:
r249182

-Todd