How to ask MustAlias queries from DSA results

Hello, everyone!

I have recently been writing a tool that automatically fixes memory leaks. It uses DSA (Data Structure Analysis) for alias analysis. When the tool detects a memory leak, it needs to find a pointer (in C) that ‘must-alias’ the leaked resource.

As I understand DSA, if two Value* point to the same DSNode, they ‘may-alias’; if they point to different DSNodes, they do not alias. However, is it possible to know whether two Value* ‘must-alias’?

I checked the AliasAnalysis interface that DSA provided before it was removed, and it never returned MustAlias results, so I guess it may not be possible to get ‘MustAlias’ results from DSA. Would it be possible to modify the DSA code so that every heap DSNode tracks all the malloc() calls it may come from?

Thanks for your help.

Best regards, Zhixuan Yang

If I understand correctly, if you find a memory leak, you want to find the corresponding call(s) to malloc() that allocated the memory object, correct? Can you more completely explain what you are trying to accomplish?

On whether DSA can answer must-alias queries: no. DSA does not track must-alias information.

On modifying DSA: interesting question. You could add a “Must-Alias” flag that is originally set on a DSNode. Whenever two DSNodes are merged due to a “may-alias” relationship, you could flip the “Must-Alias” flag off.

However, DSA is a unification-based analysis, so I would think that the accuracy of a must-alias feature would be pretty weak. Also, DSA loses precision as it performs more inter-procedural analysis (the local analysis will be the most precise but will have many Incomplete DSNodes; the Bottom-Up and Top-Down propagate information up and down the call graph but will cause further DSNode merging).

It may be that you will need a more accurate points-to analysis algorithm for your work.

Regards, John Criswell
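The flag scheme suggested above can be sketched with a toy union-find structure. This is only an illustration under stated assumptions: `Node`, `node_new`, `node_merge`, and `alias_query` are hypothetical names, not part of the real DSA code, and the sketch says nothing about whether an unmerged node is a sound basis for a must-alias answer in practice.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 64

/* Hypothetical stand-in for a DSNode; this is NOT the real DSA class. */
typedef struct {
    int  parent;      /* union-find parent index; a root points to itself */
    bool must_alias;  /* set at creation, cleared once the node is merged */
} Node;

static Node nodes[MAX_NODES];
static int  node_count = 0;

/* Create a fresh node with the must-alias flag initially set. */
int node_new(void) {
    assert(node_count < MAX_NODES);
    nodes[node_count].parent = node_count;
    nodes[node_count].must_alias = true;
    return node_count++;
}

/* Find the representative of n, shortening the path as we go. */
int node_find(int n) {
    while (nodes[n].parent != n) {
        nodes[n].parent = nodes[nodes[n].parent].parent;
        n = nodes[n].parent;
    }
    return n;
}

/* Unify two nodes because of a may-alias relationship and flip the flag off. */
void node_merge(int a, int b) {
    int ra = node_find(a);
    int rb = node_find(b);
    if (ra == rb) return;          /* already the same node: nothing changes */
    nodes[rb].parent = ra;
    nodes[ra].must_alias = false;  /* the merged node can only answer "may" */
}

typedef enum { NO_ALIAS, MAY_ALIAS, MUST_ALIAS } AliasResult;

/* a and b are the nodes that two pointer values point to. */
AliasResult alias_query(int a, int b) {
    int ra = node_find(a);
    int rb = node_find(b);
    if (ra != rb) return NO_ALIAS;
    return nodes[ra].must_alias ? MUST_ALIAS : MAY_ALIAS;
}
```

In this toy model, two values that point to the same never-merged node get `MUST_ALIAS`, and a single merge anywhere in the node's history downgrades every later query on it to `MAY_ALIAS`, which is why a unification-based analysis would rarely keep the flag set.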

Dear John,

If I understand correctly, if you find memory leak, you want to find the corresponding call(s) to malloc() that allocated the memory object, correct? Can you more completely explain what you are trying to accomplish?

Thanks for your reply. In my task, I use data-flow analysis to locate a program point where a malloc()ed object must be leaked. By “must be leaked,” I mean that the object (a) must have been allocated, (b) must not have been free()d, and (c) is never used afterwards. I want to fix the leak by finding a pointer that must point to that object, which is why I want to perform a must-alias query.

However, DSA is a unification-based analysis, so I would think that the accuracy of a must-alias feature would be pretty weak. Also, DSA loses precision as it performs more inter-procedural analysis (the local analysis will be the most precise but will have many Incomplete DSNodes; the Bottom-Up and Top-Down propagate information up and down the call graph but will cause further DSNode merging).

Thanks for the clarification. I agree: even if we implemented a MustAlias interface in DSA, it would be too weak.

It may be that you will need a more accurate points-to analysis algorithm for your work.

In fact, my task can be solved in a simpler (though less elegant) way. To obtain a pointer that must-alias the result of a malloc() call, I can introduce a new variable that stores the value returned by malloc() at the call.
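A minimal sketch of that idea in C, for the purely local case. The function names and the `p_base` variable are hypothetical; the second function only shows the shape of code such a transformation might produce, not the output of any existing tool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Original shape: the only pointer to the buffer is advanced in place, so at
 * the return no variable is guaranteed to hold the address malloc() returned. */
size_t count_spaces_leaky(const char *text) {
    char *p = malloc(strlen(text) + 1);
    if (p == NULL) return 0;
    strcpy(p, text);
    size_t n = 0;
    while (*p != '\0') {
        if (*p == ' ') n++;
        p++;                /* p no longer points to the start of the buffer */
    }
    return n;               /* the buffer is leaked here */
}

/* Transformed shape: a new variable records the malloc() result at the call,
 * so it must-alias the allocation at every later program point. */
size_t count_spaces_fixed(const char *text) {
    char *p = malloc(strlen(text) + 1);
    char *p_base = p;       /* hypothetical shadow variable added by the tool */
    if (p == NULL) return 0;
    strcpy(p, text);
    size_t n = 0;
    while (*p != '\0') {
        if (*p == ' ') n++;
        p++;
    }
    free(p_base);           /* the inserted fix frees through the saved pointer */
    return n;
}
```

Because `p_base` is assigned exactly once and never modified, no alias analysis is needed to know what it points to at the leak point.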

Thanks for your help.

Best regards, Zhixuan Yang

When you say “must be allocated,” you mean it must have been allocated via a call to a heap allocator (e.g., malloc(), calloc(), etc.), correct? Technically, global variables and stack variables are also allocated; they just don’t use heap memory. Also, are you performing intra-procedural or inter-procedural data-flow analysis?

The extra variable you describe is essentially a fat pointer: you’re extending the pointer that you’re checking so that it carries the base address of the memory object to which it points as well as the memory address to which it points. Since you’re not adding the base address to the pointer itself but passing it around with the pointer, you must transform the code so that the base address “follows” the pointer value wherever it goes (into memory, passed to functions as arguments, etc.). Fat pointers are relatively easy for local variables but are much more of a pain for pointers that are stored to or read from memory or passed to functions as arguments. I’m also of the opinion that every fat pointer approach suffers from some degree of compatibility problems with third-party library code (the infamous “external code” problem).

If you’re going to transform the program, I would recommend that you use SAFECode’s new BBAC feature to track the base address. BBAC has a run-time library which can take a pointer into a memory object and calculate, in constant time, the first address of that memory object. You could use this to find the base address of the memory object so that you can pass it to the free() function. As BBAC is a referent object approach, it doesn’t suffer from the compatibility problems that fat pointer approaches suffer from.

My Google Summer of Code student, Zhengyang Liu, worked on BBAC this summer and created an updated and robust implementation of it that you could modify for your project. If you’re interested, please email me so that I can put you in touch with him.

Regards, John Criswell
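To make the fat pointer point concrete, here is a small sketch of a base address that “follows” a pointer across a function boundary. `FatIntPtr` and all of the function names are hypothetical and chosen only for illustration; this is not SAFECode or BBAC code, and it does not address the stored-to-memory or external-code cases mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fat pointer: the working pointer plus the allocation base. */
typedef struct {
    int *base;  /* address returned by the allocator; never adjusted */
    int *cur;   /* the pointer value the original program computes with */
} FatIntPtr;

/* Allocate count zeroed ints; both fields start at the allocation base. */
FatIntPtr fat_alloc(size_t count) {
    int *mem = calloc(count, sizeof *mem);
    FatIntPtr fp = { mem, mem };
    return fp;
}

/* Pointer arithmetic only touches cur; base travels along unchanged. */
FatIntPtr fat_advance(FatIntPtr fp, size_t elems) {
    assert(fp.cur != NULL);
    fp.cur += elems;
    return fp;
}

/* A callee that receives the pair, moves the working pointer, stores through
 * it, and hands the pair back so the caller still knows the base. */
FatIntPtr skip_and_store(FatIntPtr fp, size_t elems, int value) {
    fp = fat_advance(fp, elems);
    *fp.cur = value;
    return fp;
}

/* Release the object through base, wherever cur currently points. */
void fat_free(FatIntPtr *fp) {
    free(fp->base);
    fp->base = NULL;
    fp->cur = NULL;
}
```

Every function that used to take or return an `int *` now has to take or return the pair, which is the transformation cost described above; a function in an unmodified third-party library would still expect a plain `int *`.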