How do I find all memory allocations in an llvm ir code file?

I tried to compile this snippet of C++ code:

void FuncTest() {
    int* a = new int;
    int* b = new int[2];
}

using:

clang test.cpp -S -emit-llvm -o - > test.llvm

and obtained this:

define void @_Z8FuncTestv() {
entry:
%a = alloca i32*, align 4
%b = alloca i32*, align 4
%call = call noalias i8* @_Znwj(i32 4)
%0 = bitcast i8* %call to i32*
store i32* %0, i32** %a, align 4
%call1 = call noalias i8* @_Znaj(i32 8)
%1 = bitcast i8* %call1 to i32*
store i32* %1, i32** %b, align 4
ret void
}

declare noalias i8* @_Znwj(i32)
declare noalias i8* @_Znaj(i32)

What I am wondering now is: where do the _Znwj and _Znaj symbols come
from? Are they just randomly assigned or is there a system to it? I
would like to be able to tell that the lines

%call = call noalias i8* @_Znwj(i32 4)

and

%call1 = call noalias i8* @_Znaj(i32 8)

perform memory allocations, but that does not look very promising.
Is there an LLVM expert here who has an idea?

Hi,

_Znwj and friends are the C++-name-mangled versions of operator new. Because operator new is so common, the IA64 C++ ABI provided a shorthand for it.

It can be parsed as follows:

_Z: Prefix to all C++ mangled names.
nw: operator new(). The other version is "na": operator new[](). ("na"->new array).
j: unsigned int.

All of these can be found in the C++ ABI: http://www.codesourcery.com/public/cxx-abi/abi.html#mangling

You can run an identifier through the g++ tool "c++filt" to get a human-readable representation:

$ c++filt _Znwj
operator new(unsigned int)
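
The same demangling can be done from inside a program. Below is a minimal sketch, assuming a GCC or Clang toolchain that ships <cxxabi.h>; the helper name demangle is made up for illustration and is not part of LLVM.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (available with GCC and Clang toolchains)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI mangled
// name, or the input unchanged if the demangler rejects it.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // __cxa_demangle hands back a malloc-allocated buffer
    return result;
}
```

With this, demangle("_Znwj") should give "operator new(unsigned int)" and demangle("_Znaj") should give "operator new[](unsigned int)". On a target where size_t is unsigned long, the size parameter is encoded as "m" instead of "j", so the symbols become _Znwm and _Znam.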

Cheers,

James

echo "_Znwj" | c++filt
=> operator new(unsigned int)

echo "_Znaj" | c++filt
=> operator new[](unsigned int)

So yes, they are memory allocators. Names are just mangled.

Olivier.

Hi Theresia,

I am no LLVM expert, but c++filt indicates that _Znwj and _Znaj are the mangled
names for the new and new[] operators respectively:

$ c++filt __Znwj
operator new(unsigned int)

$ c++filt __Znaj
operator new[](unsigned int)

Hope this helps,
Matthieu

----- Original Message ----

From: Theresia Hansson <theresia.hansson@gmail.com>
To: llvmdev@cs.uiuc.edu
Sent: Fri 15 October 2010, 13:37:37
Subject: [LLVMdev] How do I find all memory allocations in an llvm ir code file?

As others have mentioned, C++ mangles names (i.e., it changes the name of a symbol into a string that encodes the name, scope, and type of the variable or function), so if you know the mangled name of your allocator, you can recognize it.
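
As a rough illustration of recognizing allocators by name, here is a sketch that scans the lines of a textual IR file for direct calls to a fixed list of allocator symbols. The helper name find_allocation_calls and the symbol list are assumptions made for this example; the list is certainly incomplete (no nothrow or aligned variants, for instance), and invoke instructions and indirect calls are not handled.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns every line of textual IR that contains a
// direct call to one of the allocator symbols listed below.
std::vector<std::string> find_allocation_calls(const std::vector<std::string>& ir_lines) {
    // Assumed list of allocator names; extend it for your own allocators.
    static const std::set<std::string> known_allocators = {
        "_Znwj", "_Znaj",              // operator new / new[] taking unsigned int
        "_Znwm", "_Znam",              // operator new / new[] taking unsigned long
        "malloc", "calloc", "realloc"  // C allocators
    };
    // Matches "call ... @name(" and captures the callee name.
    static const std::regex call_re(R"(\bcall\b[^@]*@([\w.$]+)\s*\()");

    std::vector<std::string> hits;
    for (const std::string& line : ir_lines) {
        std::smatch m;
        if (std::regex_search(line, m, call_re) &&
            known_allocators.count(m[1].str()) > 0) {
            hits.push_back(line);
        }
    }
    return hits;
}
```

Fed the IR from the original question line by line, this should report the two lines that call @_Znwj and @_Znaj. A real tool would walk the call instructions through the LLVM API rather than pattern-match text; the text scan is only meant to show the idea.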

Additionally, I believe that functions with return values marked with the noalias attribute are, essentially, memory allocators because the return value is guaranteed to not alias with anything not based off of the return value. See http://llvm.org/docs/LangRef.html#pointeraliasing for more details.
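
To make the noalias idea concrete, here is a small sketch that collects the names of functions whose declare or define line puts noalias on the return value, as in "declare noalias i8* @_Znwj(i32)". The helper name noalias_return_functions is invented for this example, and the result should be treated as a list of candidates only: a frontend is not obliged to emit the attribute, and the attribute by itself does not prove the function allocates.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the names of functions whose declaration or
// definition line carries noalias before the function name, i.e. on the
// return value rather than on a parameter.
std::vector<std::string> noalias_return_functions(const std::vector<std::string>& ir_lines) {
    static const std::regex decl_re(
        R"(^\s*(?:declare|define)\b[^@]*\bnoalias\b[^@]*@([\w.$]+)\s*\()");

    std::vector<std::string> names;
    for (const std::string& line : ir_lines) {
        std::smatch m;
        if (std::regex_search(line, m, decl_re)) {
            names.push_back(m[1].str());
        }
    }
    return names;
}
```

On the two declare lines from the original question this should return _Znwj and _Znaj, while "define void @_Z8FuncTestv() {" is skipped because its return value has no noalias attribute.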

As an aside, I've been thinking for a while that we should have a "memory allocator" analysis group that identifies different allocators for different source-level languages (i.e., one analysis would recognize malloc, free, realloc, and calloc while another would recognize new, new[], delete, and delete[]). There are even analyses you can do to determine if a function is a memory allocator. I have not yet had enough time to implement such an analysis group, but if others think it's a good idea, feel free to write it. :)

-- John T.

Ah ok, thank you for that. I guess I was simply confused by the fact
that I got two different (to me they seemed randomly named) new
functions. Had I tried some more allocations I would probably have
noticed this, my bad :). Again, thank you very much.