Tool for easy test seam injection in legacy C code

Hello all,

I want to create a tool that facilitates easy test seam injection in legacy C code.
The reason is that many colleagues (myself included) find it very tedious and maintenance-intensive to create stubs/mocks for our legacy C code base. As a result, we write fewer unit tests than we would like to. We currently use a routing mechanism: a .h file of #defines that reroutes functions to stubs/mocks and is included when we compile for unit testing.
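Below is a minimal sketch of that kind of #define routing. The names (foo_stub, caller) are hypothetical, and everything is collapsed into one file so it compiles on its own. In a real build the routing header would be included only in the caller's translation unit, and only for test builds. The sketch also shows the maintenance cost mentioned above: the redirection is fixed per translation unit at preprocessing time, and every stub is a separate hand-written function.

```c
/* Production implementation (normally in implementation.c). */
int foo(void)
{
    return 2;
}

/* Hand-written stub (normally in a separate stub file). */
int foo_stub(void)
{
    return 5;
}

/* Routing header, included only when compiling for unit testing.
 * From here on, the preprocessor turns every foo() in this
 * translation unit into foo_stub(). */
#define foo foo_stub

/* Code under test (normally in another .c file that includes the header). */
int caller(void)
{
    return foo() + 1;
}
```

Once the macro is active, that translation unit can no longer reach the real foo() at all. A test that needs the real implementation in one case and a stub in another therefore needs a separate build.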

I have been researching several ways of creating test seams and have settled on a method that leverages function pointers. The article at http://meekrosoft.wordpress.com/2012/07/20/test-seams-in-c-function-pointers-vs-preprocessor-hash-defines-vs-link-time-substitution/ was a great help.
But to make it truly easy, I need the compiler to do it for me. This is where Clang/LLVM comes in.

What I want is the following. Given these two files:

// interface.h
__attribute__((easy_test_seam))
int foo(void);

// implementation.c
int foo(void)
{
    return 2;
}

I actually want it to be compiled as:

// interface.h
extern int (*foo)(void);

// implementation.c
int foo_orig(void)
{
    return 2;
}

int (*foo)(void) = foo_orig;
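This is why the function-pointer form is convenient. Below is a minimal single-file sketch of the transformed code, plus a test helper that swaps in a stub at run time. The header and implementation are merged for brevity. The names caller, foo_stub and caller_with_stub are hypothetical and only for illustration.

```c
/* What interface.h would declare after the transformation. */
extern int (*foo)(void);

/* What implementation.c would contain after the transformation. */
int foo_orig(void)
{
    return 2;
}

int (*foo)(void) = foo_orig;

/* Hypothetical production code that calls foo() through the seam. */
int caller(void)
{
    return foo() + 1;
}

/* Hypothetical stub a test wants to use instead of foo_orig(). */
static int foo_stub(void)
{
    return 5;
}

/* Hypothetical test helper: run caller() with the stub installed,
 * then restore the original implementation. */
int caller_with_stub(void)
{
    int (*saved)(void) = foo;
    foo = foo_stub;
    int result = caller();
    foo = saved;
    return result;
}
```

Because foo is an ordinary variable, each test can install its own stub and restore foo_orig afterwards, with no recompiling or relinking.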

At a later stage we could also add standard stub generation, perhaps with an attribute like:

__attribute__((easy_test_seam(5)))
int foo(void);

which would also generate a standard stub that returns 5.
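Until a compiler does this automatically, the code that easy_test_seam(5) might produce for a zero-argument function can be sketched by hand as a macro. EASY_TEST_SEAM_0 and the foo_stub naming are hypothetical. The sketch covers only the unit-test build; in a production build the macro would instead expand to a plain function definition.

```c
/* Hypothetical stand-in for what easy_test_seam(VALUE) might generate
 * for a zero-argument function in a unit-test build:
 *   NAME_orig  the original body, renamed
 *   NAME_stub  a default stub returning VALUE
 *   NAME       a function pointer initially pointing to NAME_orig
 * The macro must be followed by the function body. */
#define EASY_TEST_SEAM_0(ret, name, value)      \
    ret name##_orig(void);                      \
    ret name##_stub(void) { return (value); }   \
    ret (*name)(void) = name##_orig;            \
    ret name##_orig(void)

/* What interface.h would declare in a unit-test build. */
extern int (*foo)(void);

/* implementation.c: the body is written once, the macro adds the seam. */
EASY_TEST_SEAM_0(int, foo, 5)
{
    return 2;
}
```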

I see this method as a huge relief. It’s maintenance-friendly: we only have to add the attribute to the legacy interfaces, and we will only act on the attribute when we compile for unit testing. Since the production code is then left unchanged, this won’t interfere with static analysis of production code and won’t fool the IDE’s jump-to-definition functionality.

The last few days I have been researching how to do this.

I have seen many people suggest source-to-source transformation using the Clang Rewriter class. This is not my preferred choice: I don’t need to edit the *.c and *.h files, I just want them compiled differently. I would also have to extend our build system to clone my *.c and *.h files, run them through the source-to-source transformation tool, and then compile the resulting .c and .h files. This is not insurmountable, but not trivial either.

I’ve seen some suggestions against changing the clang AST, because it’s supposed to be immutable and is hard.

I’ve also seen suggestions to write LLVM IR transformation passes, but I haven’t dug deep enough into LLVM IR to determine whether this can work.

I’m very new to Clang and LLVM; can you guys help me determine which of these three approaches is the way to go? Or is there another way?

Thanks in advance!

Regards,

Ivan

I've seen some suggestions against changing the clang AST, because it's
supposed to be immutable and is hard.

As an "outsider", I think there is a clear need for injecting things into the AST.

Maybe one should collect all the projected use cases in one place to
evaluate the effort. From the point of view of AST visitors and matchers
(as also stated in the video "The Clang AST - a Tutorial"), I understand
that the AST should be immutable. But what about allowing such AST
injection after the AST has been built and before any AST matching,
visiting or rewriting phase? Re-running semantic analysis after such an
injection would obviously be a must.
Of course, one has to be careful about what one is doing and must not mix
such an injection phase with AST matching.

Source-to-source rewriting is pretty much the same as editing the AST directly. You can do all the source editing in memory in a single tool and run the compile step on the changed source from that same tool; there is no need to write anything to disk (in fact, the rewriting tools all hold the source in memory before they write the edited files to disk).

All right, so if I don’t call Rewriter::overwriteChangedFiles(), the files on disk won’t be changed.

I’ve been getting a bit lost in the Clang API forest. If I want to do the above with my own build of clang-cl.exe, I guess I’ll have to insert a second CompileJobAction after the action list is built by TheDriver.BuildCompilation(argv) in clang\tools\driver\driver.cpp?
How would this second CompileJobAction use the input source file that is still in memory after the first compile action has finished?

Thanks,

Ivan

I think you don’t want to use the driver in that case; you want to write your own little tool that does both the rewriting and the compilation and otherwise looks just like the driver. That way you control the buffers for all the files. libTooling has some facilities to make life easier there.

Hello everyone,

Excuse me for digging up this almost two-year-old thread.
I actually created the functionality mentioned in my opening post at the end of 2014.
At my company we have been using it for nearly a year now with great satisfaction.
Together with FFF (Fake Function Framework) and some preprocessor metaprogramming, it becomes a really powerful way to make legacy C code testable.
We run almost 900 tests with Google Test and easy-test-seam on both Windows and an embedded Linux platform every time one of our engineers commits new code.
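The following shows how a function-pointer seam combines with a fake. It is a hand-rolled fake in plain C that loosely mirrors the foo_fake.call_count and foo_fake.return_val fields that FFF generates. It is not FFF itself, and twice_foo, install_foo_fake and restore_foo are hypothetical names.

```c
/* Seam as easy-test-seam would produce it for int foo(void). */
int foo_orig(void)
{
    return 2;
}

int (*foo)(void) = foo_orig;

/* Hand-rolled fake, modelled loosely on FFF's naming (not FFF itself). */
typedef struct {
    unsigned call_count;
    int return_val;
} foo_fake_t;

static foo_fake_t foo_fake;

static int foo_fake_impl(void)
{
    foo_fake.call_count++;
    return foo_fake.return_val;
}

/* Hypothetical code under test. */
int twice_foo(void)
{
    return foo() + foo();
}

/* Reset the fake, set its return value and route foo() to it. */
void install_foo_fake(int return_val)
{
    foo_fake.call_count = 0;
    foo_fake.return_val = return_val;
    foo = foo_fake_impl;
}

/* Route foo() back to the original implementation. */
void restore_foo(void)
{
    foo = foo_orig;
}
```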

I ended up using the Clang driver and mimicking the FixItRecompile functionality with the rewriters. The functionality is enabled by adding -inject-test-seams on the command line.
The implementation can be found at https://github.com/ivankoster/clang/commit/e56887a20c3cd076db27b377c92eaa0096eaefcd (I extended Clang 3.7 and Clang 3.8 with it).

Unfortunately, since I based it on FixItRecompile, easy-test-seam generates the AST twice in one compiler invocation.
In my benchmarks on our code base, this increases running time by about 66%.
That is not surprising once you know that the AST is generated, a number of files are rewritten and copied for the next AST generation to use, and then the AST is generated a second time.
Lately this performance loss has started to bug me a bit. I’d like to save as much of my own and my colleagues’ time as possible :slight_smile:

I’m wondering whether something like easy-test-seam is also possible with just one AST generation, by generating the AST I want in the first place, without source rewriting.
I imagine this being a lot more complex than how it works right now.
I also have no clue where to start. Creating easy-test-seam in the first place wasn’t that easy, since I knew nothing about the Clang codebase :slight_smile:

So, does anyone have an idea whether this is remotely possible in one AST generation, and if so, can you point me in the right direction?
Or maybe someone is interested in this idea and wants to help create it?

Kind regards,

Ivan Koster