Splitting C/C++ code into pure and side-effecting code

Hello folks,

I’m not a compiler expert or subscribed to this mailing list, but I have a unique problem. I need to split a large piece of C/C++ code into two separate libraries: one library that only has pure code (i.e., code that doesn’t require operating system interactions) and another library that can have both pure code and side-effecting code.

I was wondering whether this could be achieved by adding something like __attribute__((annotate("pure"))) and __attribute__((annotate("call_impure"))), and how much effort it would take to add such functionality. Basically, I want the “pure” attribute to be sticky, in the sense that a function marked pure can only call pure code unless a call inside it is explicitly marked as impure. For example, the following code should produce one library containing the pure code and a driver containing all the impure code.

#include <stdint.h>
#include <stdio.h>

__attribute__((annotate("pure")))
int increment(int x) {
    return x + 1;
}

__attribute__((annotate("pure")))
int add(uint32_t a, uint32_t b) {
    int c = a;

    /* addition via Peano arithmetic */
    while (b != 0) {
        c = increment(c); // Okay to call pure code directly.
        b--;
    }

    printf("%d + %d = %d\n", a, b, c) __attribute__((annotate("call_impure")));
    // calling impure code requires explicit annotation
    // all the arguments to the function are copied
    // and don't share the same stack as pure code
    return c;
}

int main(int argc, char *argv[]) {
    int a = 1;
    int b = 2;
    int c = add(a, b);
    printf("%d + %d = %d\n", a, b, c); // okay to call impure from impure
    return 0;
}

The call to printf in pure code should generate a call to a stub function, printf_impure_call(), which in the impure library simply calls printf.
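To make the stub idea concrete, here is a minimal sketch of what the two sides might look like if written by hand. Only the name printf_impure_call comes from the description above; the helper add_with_log and the split into "pure side" and "impure side" are assumptions for illustration, not something Clang generates today.

```c
#include <stdarg.h>
#include <stdio.h>

/* Impure library: the only place that touches stdio.
 * printf_impure_call is the hypothetical stub name from the text above. */
int printf_impure_call(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int written = vprintf(fmt, args);  /* forward to the real printf family */
    va_end(args);
    return written;
}

/* Pure library: would see only the declaration of the stub, never <stdio.h>.
 * add_with_log is a hypothetical example function. */
int add_with_log(int a, int b)
{
    int c = a + b;
    printf_impure_call("%d + %d = %d\n", a, b, c);  /* explicit impure call */
    return c;
}
```

In a real split, the pure library would be linked against a declaration of the stub only, so any direct reference to printf would show up as an unresolved symbol.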

I know very little about compilers (mathematician by training), but I would really appreciate it if someone could comment on the feasibility of this.

Thanks
Suman

PS: I’m not subscribed to LLVM mailing list so please reply-all.

This is probably a better question for the Clang developer list, cfe-dev.

Hi Suman,

I think you can determine purity automatically by leveraging the compiler, instead of manually attaching an attribute to each function and call site. Impurity seems like it should be a transitive property, which would conflict with the example below.

__attribute__((annotate("pure")))
int add(uint32_t a, uint32_t b) { // impure by calling printf…

printf("%d + %d = %d\n", a, b, c) __attribute__((annotate("call_impure")));

Finding the purity of functions should be possible by parsing each function across files and building up a call graph. You can traverse it bottom-up, marking a function as impure whenever one of its callees is impure, according to your definition of pureness. From there you could collect the two sets of functions, put them into export files, and use those to link the two libraries the way you want.
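As a rough illustration of the propagation step, here is a small sketch in C. The struct call_graph layout, MAX_FUNCS, and propagate_impurity are all hypothetical names made up for this example; a real tool would get the call edges from the compiler rather than a hand-filled matrix. It uses a fixed-point loop instead of a strict bottom-up walk so that recursive functions (cycles in the graph) are handled too.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 8

/* Hypothetical call-graph representation: calls[i][j] is true when
 * function i calls function j. */
struct call_graph {
    size_t count;
    bool calls[MAX_FUNCS][MAX_FUNCS];
    bool impure[MAX_FUNCS];   /* seeded with known-impure leaves, e.g. printf */
};

/* Propagate impurity from callees to callers until nothing changes. */
void propagate_impurity(struct call_graph *g)
{
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t caller = 0; caller < g->count; caller++) {
            if (g->impure[caller])
                continue;             /* already known to be impure */
            for (size_t callee = 0; callee < g->count; callee++) {
                if (g->calls[caller][callee] && g->impure[callee]) {
                    g->impure[caller] = true;   /* impurity is transitive */
                    changed = true;
                    break;
                }
            }
        }
    }
}
```

Applied to the example from the original mail, with printf seeded as impure, increment stays pure while add and main end up impure, which is exactly the conflict with tagging add as pure.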

Regards,
Kevin

Hi Craig,

Please let me know how to move this topic to cfe-dev.

Hi Kevin,

Basically, we didn’t want Haskell-like semantics: people are so used to logging things in the middle of their pure code that they would never use it (also, we have some legacy code that would require massive refactoring). The “call_impure” annotation is not absolutely necessary; it’s mainly there to prevent accidental calls to impure code from pure code. Most of our code is crypto code, and impure calls need some form of prior encryption/authentication, so we would like to avoid accidental impure calls.

Given the size of the LLVM+Clang code base, I don’t have a good sense of how much effort it would take to add functionality like this, and I would really appreciate feedback.

Thanks in advance.
Best Regards
Suman

Please let me know how to move this topic to cfe-dev.

Mailing cfe-dev@lists.llvm.org will do.

Regards,
chenwj