Pool Allocation and DSA

I don't really have a specific question, but, as I've been looking through pool allocation and the DS graphs extensively, I wanted to verify that my understanding of the representations used is correct. Therefore, I'm summarizing my understanding below (which, if it's correct, may hopefully be helpful to others). I would appreciate if someone who understands pool allocation well would comment on whether I'm right in my understanding.

After pool allocation runs, each Function is associated with a DS graph. Each node in a Function's DS graph falls into one of the following categories:

1. The node corresponds to a global pool.
2. The node corresponds to a pool created by the Function.
3. The node corresponds to a formal parameter passed to the function (and is the result of conservatively folding all possible actual parameters that may be passed to the function).
4. The node corresponds to a pool parameter passed to the Function.
5. The node is not in categories 1-4, in which case it must be pointed to by some other pool which either is in categories 1-4 or is pointed to by some other node, for which the same condition must hold. That is to say, all nodes in the graph must be reachable by following the edges of the nodes in categories 1-4.

I'm guessing that the "getPool()" function in llvm-poolalloc/include/poolalloc/PoolAllocate.h will return NULL if the DSNode is in categories 3 or 5, will return a pointer to the Instruction that created the pool for category 2, will return a pointer to the global Value representing the pool for category 1, and will return a pointer to the pool's Argument for category 4 -- but those are really only guesses, and I'd very much appreciate confirmation that I guessed right :slight_smile:

Thanks in advance for your help,
--Patrick

Patrick Alexander Simmons wrote:

I don't really have a specific question, but, as I've been looking through pool allocation and the DS graphs extensively, I wanted to verify that my understanding of the representations used is correct. Therefore, I'm summarizing my understanding below (which, if it's correct, may hopefully be helpful to others). I would appreciate if someone who understands pool allocation well would comment on whether I'm right in my understanding.

After pool allocation runs, each Function is associated with a DS graph. Each node in a Function's DS graph falls into one of the following categories:

1. The node corresponds to a global pool.
2. The node corresponds to a pool created by the Function.
3. The node corresponds to a formal parameter passed to the function (and is the result of conservatively folding all possible actual parameters that may be passed to the function).
4. The node corresponds to a pool parameter passed to the Function.
5. The node is not in categories 1-4, in which case it must be pointed to by some other pool which either is in categories 1-4 or is pointed to by some other node, for which the same condition must hold. That is to say, all nodes in the graph must be reachable by following the edges of the nodes in categories 1-4.
  

I don't believe this is correct, or, at the very least, you are using confusing terminology.

First, DSA and Pool Allocation are two completely different things. DSA is a points-to analysis. It creates a graph (one per function + 1 global graph for global variables) where each node represents a set of memory objects and the edges represent pointers within one memory object that point to another memory object (i.e. if node A points to node B, that means a field in one of the objects in node A points to an object in node B). The DSGraph also contains a ScalarMap which maps LLVM Values to nodes in the graph (these nodes are called DSNodes). DSA is somewhat special because it attempts to disambiguate different linked data structures within programs (hence the name Data Structure Analysis, or DSA).

Automatic Pool Allocation (APA) is a transform that is built on top of DSA. It uses the points-to graphs computed by DSA to group allocations into various pools. The original idea was to place nodes of the same pointer-based data structure into the same pool to get better cache locality. The pools that Automatic Pool Allocation creates can either be global variables or local alloca'ed memory, depending upon how long the pool must be alive, what heuristics Automatic Pool Allocation is using, and whether the DSA results are context sensitive or insensitive. When a pool is created based on context sensitive results, it is stack allocated and passed around as an argument to functions.

However, Automatic Pool Allocation has another use beyond improving performance. It turns out that if you allocate memory based on the points-to analysis results, you can speed up run-time checks for memory safety. This leads to SAFECode: a transform built on top of Automatic Pool Allocation that adds additional information to each pool to implement fast run-time checks.

Transforms like SAFECode need to know the mapping between the pointers in a program and the pool in which those pointers were assigned (more accurately, the pool in which the memory objects pointed to by those pointers were allocated). The easiest way to do this was to have SAFECode query Automatic Pool Allocation. This is, therefore, the purpose of getPool(): it allows other LLVM passes to ask Automatic Pool Allocation what pool a particular LLVM Value lives in. The pool is returned as an LLVM Value *. Depending on whether the pool is global, stack allocated, or passed into the function determines whether it is an LLVM GlobalVariable *, LLVM AllocaInst *, or LLVM Argument *.

-- John T.

Thank you, and I apologize for conflating the two passes.

My main remaining concern is about what happens when there is a node in a function's DSGraph that points to a pool (or pools, if there are multiple callers) that is not in the DSGraph of that function. For example, on page 3 of the 2005 PLDI paper, the DSGraph given for createnode(), Data is pointing to an unlabeled node in the graph. I would like to know specifically what this node is and what would happen if getPool() were called on it. My best guess is that getPool() would return NULL.

Moreover, it seems like it should be possible to call a Function with a pointer without passing in the pointer's pool, as long as no allocations are done using the pointer. What will the DSNode for that pointer in the function's graph look like? What edges will be coming out of it?

Also, after thinking about it for a while, I don't think it's ever possible for a node in the global DSGraph to have pointers to a pool that is not also global, thus preventing the problem of a global node having pointers into some function's DSGraph. Am I right?

Thank you again,
--Patrick

John Criswell wrote:

Patrick Alexander Simmons wrote:

Thank you, and I apologize for conflating the two passes.

My main remaining concern is about what happens when there is a node in
a function's DSGraph that points to a pool (or pools, if there are
multiple callers) that is not in the DSGraph of that function. For
example, on page 3 of the 2005 PLDI paper, the DSGraph given for
createnode(), Data is pointing to an unlabeled node in the graph. I
would like to know specifically what this node is and what would happen
if getPool() were called on it. My best guess is that getPool() would
return NULL.
  

Ah. I think I see the issue now. There may be some instances in which
pool allocation may not need to pass a pointer's pool into a function
because there are no allocations or deallocations from that pool. In
that case, what does getPool() return?

I have no idea. We added the getPool() method to support SAFECode , and
SAFECode requires that the pool for a pointer always be available, so
the above situation cannot (or at least, should not) occur. I didn't
think of the above case when adding getPool() because it's not supposed
to happen for SAFECode, so what it does is anybody's guess.

If your use of pool allocation requires that you can always get the pool
handle for a pointer, you need to make sure that pool allocation is
configured so that it doesn't generate the above scenario. I think
there is an option to make all pools be global pools.

Moreover, it seems like it should be possible to call a Function with a
pointer without passing in the pointer's pool, as long as no allocations
are done using the pointer. What will the DSNode for that pointer in
the function's graph look like? What edges will be coming out of it?
  

In the DSGraph for the function, you should have a DSNode for the
pointer that is passed into the function as a parameter. Whether there
is an explicit pool for it is another matter, but there will be a DSNode
for the pointer.

Also, after thinking about it for a while, I don't think it's ever
possible for a node in the global DSGraph to have pointers to a pool
that is not also global, thus preventing the problem of a global node
having pointers into some function's DSGraph. Am I right?
  

I don't know. Global variables can point to stack and heap objects, so
DSA must have a way to represent this in the DSGraphs. However, if you
configure pool allocation to allow for context sensitivity (i.e., it can
allocate pool descriptors on the stack), then I don't know if it will
place stack/heap objects pointed to by global variables in a global pool
or in a stack-allocated pool.

-- John T.

Some time ago, I used AnalysisUsage's addRequired<PoolAllocatePassAllPools>()
to get this behavior. It didn't make all pools global.

Torvald

Torvald Riegel wrote:

  

If your use of pool allocation requires that you can always get the pool
handle for a pointer, you need to make sure that pool allocation is
configured so that it doesn't generate the above scenario. I think
there is an option to make all pools be global pools.
    
Some time ago, I used AnalysisUsage's addRequired<PoolAllocatePassAllPools>() to get this behavior. It didn't make all pools global.

Torvald
_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
  
Okay, thanks, Torvald. I assume this resulted in more pool descriptor arguments being passed; did you find this to hurt performance at all?

Once I have the local pool descriptor, it's occurred to me that what I basically need is the interprocedural use-def chain from a pool descriptor's Argument to the corresponding Values in the caller functions. Unfortunately, I looked in the class hierarchy and found that Argument is not a subclass of User. Still, perhaps someone has created an interprocedural def-use/use-def analysis pass for LLVM for some other purpose?

Thanks,
--Patrick

Torvald Riegel wrote:
>> If your use of pool allocation requires that you can always get the pool
>> handle for a pointer, you need to make sure that pool allocation is
>> configured so that it doesn't generate the above scenario. I think
>> there is an option to make all pools be global pools.
>
> Some time ago, I used AnalysisUsage's
> addRequired<PoolAllocatePassAllPools>() to get this behavior. It didn't
> make all pools global.
>
> Torvald
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Okay, thanks, Torvald. I assume this resulted in more pool descriptor
arguments being passed; did you find this to hurt performance at all?

I needed all of them, so I didn't check whether it hurt performance. It will
also depend a lot on your target applications. PoolAlloc has some statistics
for the maximum number of pools per function IIRC.

Once I have the local pool descriptor, it's occurred to me that what I
basically need is the interprocedural use-def chain from a pool
descriptor's Argument to the corresponding Values in the caller
functions. Unfortunately, I looked in the class hierarchy and found
that Argument is not a subclass of User. Still, perhaps someone has
created an interprocedural def-use/use-def analysis pass for LLVM for
some other purpose?

The call sites (call and invoke instrs) are the users of the functions and
supply the arguments. Several different pool instances might get passed to a
function. Top-Down and Bottom-Up DSA find these sets to some extend (in order
to unify the DS nodes and pools), but I think none of them guarantees to find a
complete unification. I guess looking at Bottom-Up DSA's code should be
helpful.

Torvald