RFC: IR metadata format for MemProf

teresajohnson · October 17, 2023, 8:58pm

Thanks for the details. Note that -supports-hot-cold-new is used during the LTO link - so please add -Wl,-mllvm,-supports-hot-cold-new to the thin link (it should not be needed by the initial or backend compiles). If it works then when you disassemble a.o.thinlto.bc you should also no longer see “none” for the allocs versions.

Let me know if that fixes it or if you see more issues.

Teresa

ssslifengxiang · October 18, 2023, 2:51am

Note that -supports-hot-cold-new is used during the LTO link - so please add -Wl,-mllvm,-supports-hot-cold-new to the thin link (it should not be needed by the initial or backend compiles). If it works then when you disassemble a.o.thinlto.bc you should also no longer see “none” for the allocs versions.

Thanks for reply. It works when adding -Wl,-mllvm,-enable-memprof-context-disambiguation -Wl,-mllvm,-supports-hot-cold-new to thin link. I also can see allocs: ((versions: (notcold, none, cold, notcold) in a.o.thinlto.bc

lifengxiang1025 · January 17, 2024, 2:40am

Hi @teresajohnson @snehasish
I and @Enna1 found now MemProf handle the callchain which may have circles in thin link. It means the final callchain alloc type may be incorrect in our opinion.
For example, there are two callchain:

A->B->A->B->new
A->A->B->A->B->new

These two callchain will be considered as one callchain in thinlink:

A->B->new

And the alloc type is the result of the OR of two callchains.
I think it’s may better if we handle callchain in llvm-profdata, truncating the callchain and then merging MIB info. The final alloc type is calculated by merged MIB info.

teresajohnson · January 18, 2024, 9:58pm

Hi @lifengxiang1025 , thanks for the note (and the patch, will respond on that shortly)!

Recursion is a little tricky, and I tried to have it be conservatively correct. In the example below, where there is some mutual recursion, we should be detecting this during the thin link and preventing cloning.

The handling is here: llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp at a9ca820529c69674e01d2e90cadc69e361ecf339 · llvm/llvm-project · GitHub
(and where we check the Recursive flag throughout this file).

There’s presumably some more sophisticated ways of handling these cases, but more difficult to get the necessary cloning correct. E.g. if we collapsed A->B->A->B->new to A->B->new we might miss renaming the call from B->A with the appropriate callee clone of A. So I chose to leave the recursion visible in the metadata and handle it conservatively. Definitely open to suggestions and/or patches!

lifengxiang1025 · January 19, 2024, 6:50am

Thanks for reply. I can’t understand why we might miss renaming the call from B->A with the appropriate callee clone of A. Could you please explain that in more detail?

teresajohnson · January 22, 2024, 9:16pm

It’s probably easier to discuss in terms of callsites, not functions, since that’s what is captured in the call contexts and what the cloning algorithm primarily operates on. So for example, if we consider the following code:

A() {
  B(); // callsite a1;
}
B() {
  if ()
    A(); // callsite b1;
  new(); // callsite b2;

The call chain A->B->A->B->new is actually the callsite context a1->b1->a1->b2.

If we collapse that to A->B->new then we are collapsing the context to a1->b2. If we clone for that we completely miss cloning callsite b1, and anything that goes through b1 will call the uncloned version of A(). I.e. we would get the following cloned code for the cold memory allocations:

A.memprof.1() {
  B.memprof.1(); // callsite a1';
}
B.memprof.1() {
  if ()
    A(); // callsite b1;
  new(COLD hint); // callsite b2';

If we didn’t collapse and we kept the context as-is, we have a cycle in the CCG (CallsiteContextGraph) that the cloning operates on. Right now the cloning code doesn’t handle this case, so we handle it conservatively and don’t clone. This certainly can be improved. I.e. if all contexts with different levels of recursion through these callsites all lead to cold memory allocations, ideally we would want the final cloned code like:

A.memprof.1() {
  B.memprof.1(); // callsite a1';
}
B.memprof.1() {
  if ()
    A.memprof.1(); // callsite b1';
  new(COLD hint); // callsite b2';

teresajohnson · January 22, 2024, 10:49pm

Actually, thinking through it with callsites reminds me of why I was handling direct recursion differently. Consider this case:

A() {
  if (...)
     A(); // callsite a1
  new(); // callsite a2
}

With the direct recursion flattening, contexts with a regex like a1->[a1->]*->a2 all become a1->a2. There is no cycle in the CCG graph. Assuming that all the profiled
recursive contexts that collapse to the same thing have the same memory coldness, we should be able to create the cloned code:

A.memprof.1() {
  if (...)
     A.memprof.1(); // callsite a1'
  new(COLD hint); // callsite a2'
}

lifengxiang1025 · January 23, 2024, 8:43am

teresajohnson:

It’s probably easier to discuss in terms of callsites, not functions, since that’s what is captured in the call contexts and what the cloning algorithm primarily operates on. So for example, if we consider the following code:
A() {
  B(); // callsite a1;
}
B() {
  if ()
    A(); // callsite b1;
  new(); // callsite b2;

Thanks!! I’m now aware that truncating callsite context is not a good idea.
Then I think we may filter circle callsite context to chain which is a directed chain from main to new.
For example:

//circle callsite context 
main1->A1->B1->C1->B1->D1->E1->F1->D2
//after processing
main1->A1->B1->D1->E1->F1->D2

For your example, assume now we have four callsite context:

1: a1->b1->a1->b1->a1->b1->a1->b2
2: a1->b1->a1->b2
3: a1->b1->a1->b1->a1->b2
4: a1->b2

And corresponds to four MIB info:MIB1,MIB2,MIB3,MIB4.
The directed chain from main to now in callsite context 1 2 3 is a1->b2 which is same as callsite context 4.
So we merge MIB1234 to MIB5 in llvm-profdata and the compiler will see only one callsite context and MIB info:

a1->b2 
MIB5

What is your opinion？

teresajohnson · January 24, 2024, 3:19am

Yep, I think we should probably take the approach of doing some flattening and MIB merging in llvm-profdata. As mentioned earlier it would result in somewhat less accurate cloning though if we flattened down to just a1->b2, as any call path through b1 will not reach the cloned and hinted allocation.

To address this I think we would actually only want to merge 1, 2, and 3, into:

b1->a1->b2
MIB123 (merged from MIB1, MIB2, and MIB3)

I’m not sure what to do about context 4, although in real cases presumably a1 is called from somewhere else, e.g. c1->a1->b2, so we would be able to distinguish it and handle it appropriately.

Topic		Replies	Views
RFC: A binary serialization format for MemProf LLVM Dev List Archives	24	562	October 10, 2021
RFC: Sanitizer-based Heap Profiler LLVM Dev List Archives	19	404	July 13, 2021
RFC: Sanitizer-based Heap Profiler LLVM Dev List Archives	14	400	July 9, 2020
[RFC] Profile guided section layout LLVM Dev List Archives	23	483	August 5, 2017
[RFC] Changing .llvm.call-graph-profile to use relocations LLVM Dev List Archives	10	220	June 23, 2021

RFC: IR metadata format for MemProf

Related topics