Intermodule Program Analysis

Hi,
Whole-program analyses at the IR level are typically implemented as module passes, which requires linking the modules before the analysis runs.
In some rare cases, the analysis needs to cover all of the user-level code. Suppose that the bitcode files for the program and all of its shared libraries are available, but the libraries cannot be linked statically. Is it possible to run an analysis (e.g., taint analysis or constant propagation) on the whole user-level software stack at the IR level? If not, is there a better approach?
Regards.

Hi Ahmad,

Maybe gllvm would work for this use case? There was a similar thread in 2019: https://lists.llvm.org/pipermail/llvm-dev/2019-January/129587.html.

-Jakub

Hi Jakub,
Thanks! IIUC, both gllvm and wllvm work on statically linked objects, i.e., they work when everything is contained in a single linked bitcode file. So they probably won’t solve the problem?
Regards.

I haven’t used gllvm with shared objects and don’t know the details. But at a high level, I think that even if your build system produces multiple binaries/libraries, you should be able to extract the bitcode from each of them and link it manually afterwards. This obviously won’t work for libraries that you can’t build yourself.

So, is static linking the only solution? In some cases, static linking is difficult!

Hi Ahmad,

I just tried a toy shared library example and it seems to work fine.

$ cat test_lib.c

int foo(int a, int b) { return a * b; }

$ cat CMakeLists.txt
project(Ahmad)

add_library(ahmad SHARED test_lib.c)

$ mkdir gbuild && cd gbuild
$ cmake … -GNinja -DCMAKE_C_COMPILER=$(readlink -f ~/go/bin/gclang)
$ ninja
$ ~/go/bin/get-bc libahmad.so
$ llvm-dis-11 libahmad.so.bc -o=-
; ModuleID = 'libahmad.so.bc'
source_filename = "llvm-link"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

; Function Attrs: noinline nounwind optnone uwtable
define i32 @foo(i32 %0, i32 %1) #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  store i32 %0, i32* %3, align 4
  store i32 %1, i32* %4, align 4
  %5 = load i32, i32* %3, align 4
  %6 = load i32, i32* %4, align 4
  %7 = mul nsw i32 %5, %6
  ret i32 %7
}

-Jakub

Thanks, Jakub. A single shared library is OK. But is it possible to analyze multiple dependent shared libraries at the same time?
For example, suppose that the main program calls something like the following:

int main(…)
{
GtkWidget *window;


gtk_window_set_default_size(GTK_WINDOW(window), 300, 200);


}

Suppose that I want to do alias analysis, i.e., I want to know all pointers that point to the address of “window”. There is one in the main binary. There will probably be some aliases in the GTK library code as well (created through the “gtk_window_set_default_size()” library call). I have to analyze both binaries (i.e., main and gtk.so) in a single pass. How is it possible?
Regards.

Hello,
Do you have the sources for all the libraries that will be distributed as DLLs?

Best regards,
Pawel Kunio

On Fri, 23.04.2021, 22:02, Ahmad Nouralizadeh Khorrami via llvm-dev <llvm-dev@lists.llvm.org> wrote:

Hi Pawel,
Yes.

I have to analyze both binaries (i.e., main and gtk.so) in a single pass. How is it possible?

You can compile each with gllvm, extract bitcode, and link those bitcode files together with llvm-link.
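
As a rough sketch of that workflow, the helper below extracts the bitcode from each gllvm-built binary with get-bc and merges the results with llvm-link. The function names run and link_bitcode, the DRY_RUN switch, and the file names in the usage note are made up for illustration. It assumes that get-bc and llvm-link are on PATH and that get-bc writes <binary>.bc next to its input, as in the libahmad.so example earlier in the thread.

```shell
# Sketch only, not a tested build recipe.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# link_bitcode OUT BINARY...
# Extracts bitcode from every BINARY (built with gclang) and links the
# resulting .bc files into the single module OUT.
link_bitcode() {
    out=$1
    shift
    for bin in "$@"; do
        run get-bc "$bin" || return 1
        # Swap the binary name for its .bc name in the argument list.
        set -- "$@" "$bin.bc"
        shift
    done
    run llvm-link "$@" -o "$out"
}
```

For instance, `link_bitcode whole.bc main libgtk.so` (placeholder names) would leave one module in whole.bc that a module pass can then analyze. llvm-link can still reject the inputs, e.g. when two of them define the same symbol.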

Agreed. Compile everything to bitcode and run the analysis interprocedurally (IPO) on it. Link the libraries the way you want them in the last stage.

Br,
Pk

On Sat, 24.04.2021, 04:01, Jakub (Kuba) Kuderski <kubakuderski@gmail.com> wrote:

Thank you both for the answers, I will try!
Regards.

Hi Ahmad,

llvm-link, as Jakub suggested, is the best option if it works. If it doesn’t work, maybe because the binaries call dlopen or depend on the way dynamic linking works, you might be able to use a research tool I developed last year that helps analyze/optimize dynamically linked code as if it were statically linked. More info here: https://github.com/yotann/bcdb/tree/master/docs/guided-linking
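
One quick, hedged way to see whether the dlopen caveat might apply is to look for an undefined reference to dlopen in a binary's dynamic symbol table. The helper name references_dlopen is made up for illustration. It only filters text in the format printed by GNU nm, so it cannot say whether the call is ever reached at run time.

```shell
# Sketch only: succeeds if the symbol listing on stdin mentions dlopen,
# either plain or with a version suffix such as dlopen@GLIBC_2.2.5.
references_dlopen() {
    grep -Eq '(^|[[:space:]])dlopen(@|$)'
}
```

A possible use, with main as a placeholder binary name, is `nm -D --undefined-only main | references_dlopen && echo "main references dlopen"`.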

Sean

Hello,
Very interesting, all of them: Nix, BCDB, and guided linking. Nix really should be the standard for avoiding dependency hell. I hope, Sean, that your workflow and tools will help Ahmad.

Best regards,
Pawel Kunio

On Sun, 25.04.2021, 04:29, Sean Bartell via llvm-dev <llvm-dev@lists.llvm.org> wrote:

Hi Sean,
The paper looks very interesting!
Thank you all!