More tests

Yesterday a change went up for review to add support for mips64 in the unwinder. In the review, I proposed adding some tests. There was some discussion in the thread, but I don’t want to pollute someone’s review thread with a long disucssion, and I thought maybe more people would be interested. So I’m moving it over here.

For some context, I proposed the idea of testing this type of thing via unit tests. I certainly thing that do a full live-system debugging test would be useful for this type of thing, but I think this is a great example of where a unit test would provide a lot of value. For starters, the full scale tests are very heavyweight. There’s a lot of setup involved, they’re slow, they’re platform specific.

I think for tests like this we can write them as UnitTests in a way that they’re faster, lighter weight, more targeted, and platform independent. One of the issues brought up in the commits thread linked above is that the test would only be able to run on mips hardware. I disagree. The test could come with a file containing hand-written assembly. We can use the LLVM assembler to target mips, which will generate mips bytecode no matter which platform you’re on. Then we can just feed the bytes to the unwinder.

Of course, you need a RegisterContext. We would probably have to make some changes to the unwinder so that the RegisterContext could be abstracted out. This way, a test could provide its own mock register context and target memory reader.

You still need a full-scale integration test to confirm that unwinding actually works in context. But I think we should consider unit tests for things like this. It’s difficult to write unit tests right now because the code in many places is not well testable (for example, not being able to abstract the registercontext so that a test could provide its own).

But I want to just toss this out there and see how much pushback I get. If it’s something people are interested in, then it’s something we could work towards gradually. I think we can all agree that checking in code with 0 tests sucks, so hopefully we can work towards something that gets us into a better state. Getting more full-scale public API tests is one angle we can approach it from. But I think unit tests would be some nice additional ammo we could throw at the problem, if we work towards better abstractions and more testable code.


For the unwinder we could just test that certain assembly code produces the expected unwind rows. This would involve exposing the unwinder through the lldb::SB API. This would help test that the correct rows are being produced for exact code.

I think exposing unwinding info through the lldb::SB API is a great idea, though we really need to make sure the right API is exposed.

Core files are really the best way to test if unwinding works as expected. I am guessing it wouldn't be too hard to make small core files that test one specific unwinding scenario. Maybe a python script could created that takes an assembly file and a triple and it produces an ELF or MachO core file with a specified or default register context? As long as the core files are small they could be checked in as tests. Full blown actual core files are way too big and will cause major problems in a GIT repository if they get changed, but small targeted core files could be small enough that they don't require changes.

Thanks for raising this outside the scope of that one patch review.

There are different things we may want to test related to the Unwinder subsystem:

1. Testing of assembly instruction profile generated unwind instructions
   (looking at the assembly instructions, spotting the insns relevant to unwind, emitting unwind instructions for them.)

2. Testing of eh_frame parsing & the generated unwind instructions

3. Testing of compact_unwind {Darwin specific format} parsing & the generated unwind instructions. This one is arch specific so testing i386/x86_64/arm64 are all valuable.

4. Unwind behavior correctness. Do we pick the correct unwind plan (eh_frame? compact unwind? assembly language profiling? Augmented eh_frame? The ABI arch default unwind plan? Do we backtrace correctly if the program jumps to 0? The answers may differ depending on whether we're looking at frame 0 or frame 1+) Does the unwinder follow the unwind plan instructions correctly to retrieve saved register values on the stack?

For 1-3, static analysis -- without a live running process on the correct architecture -- is possible and is a worthwhile goal. The unwind plan CREATORS will need some help to work without a live RegisterContext to get register information. (this dependency for this layer is almost exclusively of the form, "I know that rbx is register 2 in the machine's register numbering scheme, what is it in lldb's native register numbering scheme?")

Whether it's a blob of bytes or a .s file that we use clang to build for the appropriate architecture, whatever, if the clang that we build has those targets built in reliably we can do it at testsuite run time.

There's the other side of this that we'll need to come up with an SBUnwindPlan type class that can be queried for all of the unwind details for a function.

For #4, we'll need live processes and/or core files - I don't see any way around that.

The problem with core files is that the test coverage is reduced. If a particular platform doesn’t understand a core file, the test won’t get run. You could argue that that’s fine, because unwinding varies between platforms too, but broader test coverage is always a good thing. For example, what if there’s no buildbot set up for a platform, then you have 0 automated test coverage for that codepath.

If we use LLVM’s assembler to just produce an object file with the correct backend, then I believe this could tested without a core file. Broader coverage. Of course, exposing unwind information through the SB API is definitely a great idea, and would open up a lot more possibilities for heavier tests. We should definitely do that also. But I don’t think it’s a pure substitute. Having broader test coverage means that, for example, if 3 bots are red because of some unrelated breaks, but 2 bots are green, any of those 2 can pick up failures even if the path being tested only ever occurs in practice on a specific architecture.

So in short: I agree that core files are the best way to test the full stack, but I think it would be great if we can come up with a way to break this down into separate components and figure out really what absolutely can’t be tested without running the bits of code on a particular architecture, and what can we abstract away to produce smaller, lighter weight tests that can give us broader coverage.

Looks like we crossed paths and have been thinking about the same things. I like that treating 1-3 separately raises the possibility of testing them in an architecture agnostic way.

As for #4, this might be a moonshot, but what about qEMU or something?