LLVM Help on input sanitization


I’m new to LLVM and am experimenting with whether function passes are a viable way to sanitize the contents of C strings.

This is an excerpt from the IR of some code that performs a strcpy between two char arrays, buf1 and buf2.

if.else:                                          ; preds = %entry
  %1 = load i8** %buf1, align 8
  %2 = load i8** %buf2, align 8
  %call2 = call i8* @strcpy(i8* %1, i8* %2) #3
  br label %if.end

I would like to know if there is some way to retrieve:

  1. The sizes of buffers buf1 and buf2
  2. The actual strings in buf1 and buf2

I’ve tried applying dyn_cast<LoadInst> to the first two instructions and then printing the result of getOperand(0) on each LoadInst. However, all I see are the alloca instructions that define the two pointer slots:

%buf1 = alloca i8*, align 8

%buf2 = alloca i8*, align 8

I hope to receive some advice. Thanks in advance!

Roy A.

Hi Ando,

If the allocas buf1 and buf2 are promotable allocas [1] then you
should be able to run the mem2reg pass to remove the allocas and
transform the strcpy to work directly on SSA values. After that the
IR should be easier to work with. If the two allocas are not
promotable, then you'll have to write some sort of data flow analysis
pass to determine what possible values reach a given load from one of
these allocas.

[1]: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Utils/PromoteMemoryToRegister.cpp#L60

-- Sanjoy
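
As a rough illustration of the data flow idea mentioned above, here is a minimal sketch in plain C++ that does not use the LLVM API at all. The `Inst` type, the `reachingStore` and `exampleBlock` helpers, and the `%dst` / `%src` value names are all hypothetical and exist only for this example. It walks backwards from a load to the nearest store into the same alloca, within a single basic block. In the excerpt from the question the stores presumably live in a predecessor block such as %entry, so a real analysis would also have to follow predecessors and merge the values it finds; that part is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for a few instructions in one basic block.
// These types are hypothetical and are NOT part of the LLVM API.
enum class Op { Alloca, Store, Load };

struct Inst {
    Op op;
    std::string slot;   // name of the alloca this instruction defines or accesses
    std::string value;  // for Store: the value being written; empty otherwise
};

// Walk backwards from the load at index loadIdx and return the value written
// by the nearest preceding store to the same slot. Returns an empty string
// when no store in this block reaches the load; a real pass would then have
// to continue the search in the predecessor blocks.
std::string reachingStore(const std::vector<Inst>& block, std::size_t loadIdx) {
    assert(loadIdx < block.size() && block[loadIdx].op == Op::Load);
    const std::string& slot = block[loadIdx].slot;
    for (std::size_t i = loadIdx; i-- > 0;) {
        const Inst& inst = block[i];
        if (inst.op == Op::Store && inst.slot == slot) {
            return inst.value;  // nearest store wins in straight-line code
        }
        if (inst.op == Op::Alloca && inst.slot == slot) {
            break;  // reached the slot's definition without seeing a store
        }
    }
    return "";
}

// A block shaped like the question's IR, with the stores placed in the same
// block for simplicity. "%dst" and "%src" are made-up value names.
std::vector<Inst> exampleBlock() {
    return {
        {Op::Alloca, "buf1", ""},
        {Op::Alloca, "buf2", ""},
        {Op::Store, "buf1", "%dst"},
        {Op::Store, "buf2", "%src"},
        {Op::Load, "buf1", ""},
        {Op::Load, "buf2", ""},
    };
}
```

Calling `reachingStore(exampleBlock(), 4)` follows the load of buf1 back to the store of `%dst`. In an actual pass the same walk would be expressed with LoadInst::getPointerOperand() and StoreInst::getValueOperand() rather than string comparisons.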