Proposal: release MDNodes for source modules (LTO+debug info)

Hi All,

In LTO, we load in the source modules and link the source modules into a destination module.
Lots of MDNodes are used only by the source modules: for example, Xalan used 649MB for MDNodes after loading and linking, but the actual destination module only has 393MB of MDNodes. The difference, 256MB (40% of 649MB), is not used.

MDNodes belong to the Context, so deleting modules will not release the MDNodes.

One possible solution is:

In LLVMContext, add a "removeUnusedMDNodes" function.
It goes through OwnedModules and checks whether an MDNode is used by any of the modules; if not, it removes it.
One implementation is to mark each visited MDNode as used when traversing a module. After traversing all modules, we can delete the MDNodes in MDNodeSet that are not marked.

In LTOCodeGenerator, add a vector of the source modules that are added (these source modules will be linked with the DestroySource mode).
In LTOCodeGenerator::compile_to_file, delete all source modules that are linked in, then call LLVMContext::removeUnusedMDNodes.
--> I can't find a better place to call the function. By the time we call compile_to_file, we should have finished linking in all source modules.
Another possibility is to add an LTO API so the linker can delete the source modules and call the API to release the MDNodes.

Other options are:
1> Use a different LLVMContext for the destination module. This didn't work out, since the Linker was not designed to work with different LLVMContexts for the source vs the destination.
2> Have removeUnusedMDNodes check whether an MDNode is used in a different way (i.e., use_empty() && !hasValueHandler()), but that does not remove MDNodes that form cycles.

Comments and suggestions are welcome.

Thanks,
Manman

3) Make the MDNode be owned by the module that uses it?

MDNodes are shared among modules so that multiple modules can use them; if we specify an owner for an MDNode, that will prevent sharing.

Manman


> From your stats (40% stuck in the old module) it doesn't sound like this is
> buying us anything...

If the old module is deleted, then these MDNodes can be reclaimed.

I think this proposal amounts to a “garbage collector” that clears out now-dead IR objects that are uniqued in the LLVM Context. While MDNodes are your focus, the same thing would apply equally well to ConstantInt and other things that may become unreachable.

Details matter on this (for it to be efficient), but I think that it would be very useful for LLVMContext to have a method that goes through and releases IR objects that aren’t used. This could be used by the LTO driver, and if it actually reduces memory by 40%, that would be huge.

-Chris


> From your stats (40% stuck in the old module) it doesn't sound like this
> is buying us anything...

Hi Chandler,

I don't quite get why you think sharing is not buying us anything...
It reduces the memory footprint of the source modules (there is sharing
among the source modules) and the number of MDNodes created for the
destination module (we do not need to re-create the MDNodes that can be
shared).

The amount of sharing may not be that large, but it still exists.

I ran some experiments earlier on building clang with "-flto -g": if we
disallow sharing between the source modules and the destination module, the
memory footprint for MDNodes increases by 15%.
If we disallow sharing among the source modules, the memory footprint for
MDNodes will be even larger.

Thanks,
Manman


> If the old module is deleted, then these MDNodes can be reclaimed.
>
> I think this proposal amounts to a "garbage collector" that clears out
> now-dead IR objects that are uniqued in the LLVM Context. While MDNodes
> are your focus, the same thing would apply equally well to ConstantInt and
> other things that may become unreachable.

I think the big difference is that there appears to be very little
(relatively speaking) overlap between modules for MDNodes. This is
different from types and, I suspect, many other things.

My question was: if we're not seeing a higher degree of sharing between
modules by using context-uniqued MDNodes, why not have them be owned by the
module rather than the context?

> Details matter on this (for it to be efficient), but I think that it would
> be very useful for LLVMContext to have a method that goes through and
> releases IR objects that aren't used. This could be used by the LTO
> driver, and if it actually reduces memory by 40%, that would be huge.

My only worry is doing a relatively expensive walk over all types and
constants when most are actually shared and don't get deleted. It sounds
like the metadata nodes are weird in that they are mostly disjoint between
modules.

So, in my naive view, we do something like the following:

0) load a source module
1) load another source module
2) merge the second module into the first
3) delete the second module
4) while there are more source modules, goto 1

This would mean that, without sharing, each individual source module would
use 15% more memory, but based on your OP numbers the final linked memory
usage should still be 40% smaller. That seems like an easy win with very
low complexity?

Perhaps I am just being naive about how the LTO step works, or there are
other complications. I just wanted to make sure we considered the easy path
of the module owning the metadata before introducing something to walk all
metadata and delete unreachable bits.

Wouldn't that force the IR linker to *copy* the MDNodes from the source module into the dest module when linking?

-Chris

I’ll describe how the darwin linker uses the LTO interface. It may be amenable to earlier module deletion.

  1. The darwin linker mmap()s each input file. If it is a bitcode file, it calls
    lto_module_create_from_memory()
    then lto_module_get_num_symbols() and lto_module_get_symbol_*() to discover what the module provides and needs.

  2. After all object files are loaded (which means no undefined symbols are left), the linker then calls:
    lto_codegen_create() and then in a for-loop calls lto_codegen_add_module() on each module previously loaded.

  3. After lto_codegen_compile() has returned, the linker does clean up and deletes each module with lto_module_dispose().

It sounds like the linker could call lto_module_dispose() right after lto_codegen_add_module() to help reduce the memory footprint. That would be a simple linker change. A slightly larger linker change would be to call lto_codegen_add_module() immediately after lto_module_create_from_memory(), then lto_module_dispose(). That is, never have any unmerged modules lying around.

I have no idea if these sorts of changes work for the gold plugin.

-Nick


The gold plugin calls lto_codegen_add_module/lto_module_dispose early.
So it looks like Chandler's idea would be a win for gold but a loss
for ld64 right now.

Cheers,
Rafael


> The gold plugin calls lto_codegen_add_module/lto_module_dispose early.
> So it looks like Chandler's idea would be a win for gold but a loss
> for ld64 right now.

Letting the module own MDNodes may not be a win for gold either, since it is
going to create multiple copies of MDNodes that could be shared if the
Context owned them.

For example, with debug info type uniquing, the type nodes can be shared
across modules, but with module-owned MDNodes, each module will create its
own copy of the type nodes. The advantage is that the MDNodes can be
deleted easily by deleting the module. It is not clearly a win to me.
Manman


But gold has at most 2 objects loaded at any time.

Cheers,
Rafael

Agreed. Dave and I were chatting about this some, and from my
perspective the only disadvantage to this is the time to copy metadata
from one place to the other. Peak memory usage will still go down,
since you could then free up each module as you add it. The "uniquing"
works basically the same way as adding the same node 400 times to the
folding set - it just returns the same one each time.

I guess I'm confused at where this 40% overhead of MDNodes is coming
from though? Do we know what they are?

-eric


> But gold has at most 2 objects loaded at any time.

The problem is not how many objects co-exist; the problem is how many
copies of the type nodes we create.

We will create a new copy of the type nodes each time we load in a source
module, even though the type nodes already exist in the destination module.
So if we have 700 source modules, assuming a type is shared across all 700
modules, the type nodes will be created 701 times.

Cheers,
Manman


> Agreed. Dave and I were chatting about this some and from my
> perspective the only disadvantage to this is the time to copy metadata
> from one place to the other. Peak memory usage will still go down
> since you could then free up each module as you added. The "uniquing"
> works basically the same way as adding the same node 400 times to the
> folding set - it just returns the same one each time.

Let's clear up a few things first: I believe Chandler was proposing moving
the folding set from LLVMContext to Module.
The "uniquing" would then happen only within a module. It is not "just
returning the same one each time"; I would rephrase it as creating the same
thing once per module.

With that proposal, we will have:
1> more MDNodes created. This can be a big issue because it means we are
going to recreate the type nodes for each module.
2> easier freeing of the MDNodes: it happens when we delete the source
modules.

My proposal earlier was to implement an interface to "garbage-collect" the
MDNodes (and maybe other things in the LLVMContext, as pointed out by Chris).

> I guess I'm confused at where this 40% overhead of MDNodes is coming
> from though? Do we know what they are?

The overhead is the MDNodes that are used only by the source modules. I
gave a few examples earlier in another thread.

There are a few cases where we generate MDNodes when linking modules:
1> when an MDNode points to a value, such as a Function*, that is different
in the destination module from the one in the source module.
2> when we have a cycle in the MDNode graph, all nodes in the cycle will be
re-created for the destination module.

When the linked module generates different MDNodes, the MDNodes in the
source module become garbage once the source module is deleted.

Cheers,
Manman

Are you sure about that? I haven't looked into it, but while building Chromium with LTO, I get:

../../third_party/gold/gold64: fatal error: out of file descriptors and couldn't close any
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
[secdev:~/chromium/src/out/Release] (master) s$ ulimit -Sn
10000

Looking in /proc, it has 10013 open file descriptors. With only 2 objects loaded at a time, I'd expect many fewer file descriptors to be open. Maybe it only has 2 objects in memory at once but keeps all the file descriptors open?

That is odd. I will debug it in a sec, but we have this in the claim_file_hook:

  if (code_gen) {
    if (lto_codegen_add_module(code_gen, M)) {
      (*message)(LDPL_ERROR, "Error linking module: %s",
                 lto_get_error_message());
      return LDPS_ERR;
    }
  }

  lto_module_dispose(M);

In fact, with current gold we call get_view, so the plugin uses the
same fd as gold. It might actually be a bug with gold trying to cache
too many open files.

How are you trying to build it?

Cheers,
Rafael

The standard Chromium build system was modified to add -flto -Os to cflags (which I'm assuming in this case also gets passed to clang++) and -flto to ldflags in chromium/src/build/common.gypi, and I think some of the build files for libraries that are built along with Chromium were modified to not use -flto because they didn't work. I'm not the one who did that work, unfortunately, so I can't say for sure exactly what was modified.

Taking a really quick look at the gold code, it looks like it tries to keep
8176 files open. I would suggest putting a breakpoint in
Descriptors::close_some_descriptor and checking why it is failing to
close the files.