Making LLVM safer in out-of-memory situations

Hello,

Philipp Becker and I, Vaidas Gasiunas, are developers at SAP and part of a team developing a C-like domain-specific language for the SAP HANA in-memory database. We use LLVM as a backend to translate our language to native code, primarily on x86-64 platforms. Our programs are created dynamically, then compiled and optimized inside a running database. As a result, we have special requirements with respect to response time and safety. In particular, we have to avoid long compile times and must handle error situations like out-of-memory without crashing or leaking memory in the compiler. Compiler performance is especially important since we must compile generated functions, which tend to be rather long, in the range of thousands of LOC per function.

To address these requirements we have developed a set of patches improving performance and malfunction safety of certain compiler passes and would be interested in contributing them at some point. Before proposing concrete changes, we would like to know what the general interest is with respect to making LLVM safer in out-of-memory situations.

Best Regards,
Vaidas Gasiunas

I'm in favor! I can't imagine we shouldn't be able to craft fixes that don't otherwise adversely affect LLVM.

This sounds pretty awesome to me!

I’m looking forward to seeing what y’all came up with.

-Filip

From: "Rick Mann" <rmann@latencyzero.com>
To: "Vaidas Gasiunas" <vaidas.gasiunas@sap.com>
Cc: "LLVM Dev" <llvmdev@cs.uiuc.edu>
Sent: Thursday, December 12, 2013 1:24:40 PM
Subject: Re: [LLVMdev] Making LLVM safer in out-of-memory situations

> To address these requirements we have developed a set of patches
> improving performance and malfunction safety of certain compiler
> passes and would be interested in contributing them at some point.
> Before proposing concrete changes, we would like to know what the
> general interest is with respect to making LLVM safer in
> out-of-memory situations.

I'm in favor! I can't imagine we shouldn't be able to craft fixes
that don't otherwise adversely affect LLVM.

I also agree; handling low-memory situations gracefully is important in a number of different environments.

-Hal

I'm hugely in favor of the general direction. Happy to help by reviewing changes and the like.

This type of work was on my long-term todo list; I'm thrilled to see someone else doing it now. :)

One question: How are you handling OOM? Error return? Custom region allocator?

Philip

Hi Philip,

Thanks for the positive response from all of you!

One question: How are you handling OOM? Error return? Custom region allocator?

When running into an out-of-memory situation we currently only do an error return, i.e. the compilation fails, but does so without crashing the process in which the compilation/JITing occurs. It is OK for us if LLVM returns with a catchable exception and unwinds all allocated memory correctly.
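
To make that concrete, the boundary looks roughly like this (a minimal sketch with hypothetical names, not our actual code):

```cpp
#include <new>

// Hypothetical stand-ins for the database's JIT entry point.
struct FunctionSource { /* generated program text */ };
struct JitContext {
  void emitIRAndJit(const FunctionSource &) { /* calls into LLVM */ }
};

// Treat bad_alloc as a failed compilation, not a fatal error; this
// relies on LLVM unwinding correctly and without leaks.
bool compileFunction(JitContext &Ctx, const FunctionSource &Src) {
  try {
    Ctx.emitIRAndJit(Src);   // may throw std::bad_alloc deep inside
    return true;
  } catch (const std::bad_alloc &) {
    return false;            // the database process keeps running
  }
}
```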

To increase stability, we have already moved the main part of the compilation to a separate process that may crash in case of an error without doing much harm, i.e. without crashing the database. Therefore, we are currently concentrating on specific components that still remain in the database process, such as the CodeLoader and VMCore, which we use for emitting IR code. Of course, we are also interested in increasing the general stability of LLVM as a whole with respect to error situations.

Best regards,
Philipp

From: "Philipp Becker" <philipp.becker@sap.com>
To: "Philip Reames" <listmail@philipreames.com>, "LLVM Dev" <llvmdev@cs.uiuc.edu>
Sent: Friday, December 13, 2013 6:55:59 AM
Subject: Re: [LLVMdev] Making LLVM safer in out-of-memory situations

Hi Philip,

Thanks for the positive response from all of you!

> One question: How are you handling OOM? Error return? Custom
> region allocator?

When running into an out-of-memory situation we currently only do an error return, i.e. the compilation fails, but does so without crashing the process in which the compilation/JITing occurs. It is OK for us if LLVM returns with a catchable exception and unwinds all allocated memory correctly.

Does this mean that you're using C++ exception handling to manage the cleanup?

-Hal

Hi Hal,

Does this mean that you're using C++ exception handling to manage the cleanup?

No, not really. From the place where we call into LLVM we catch all exceptions that may occur during compilation, but normally we do not add any additional catch clauses to the LLVM source itself. We mainly rely on correct stack unwinding by destructors in LLVM when an exception is thrown. Where that was not sufficient, we had to add some additional auto pointers, and in some cases implement additional unwind logic. We did have to add exception handling to some destructors that allocate memory, but such fixes are rather workarounds; the correct solution would be to avoid memory allocation in destructors in the first place.
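
As a minimal sketch of the destructor problem (hypothetical names, not actual LLVM classes): if a destructor allocates while the stack is already unwinding from a bad_alloc, a second bad_alloc escaping it calls std::terminate, so the workaround is to swallow the exception:

```cpp
#include <new>
#include <vector>

struct Pass {
  std::vector<int> Log;
  ~Pass() {
    try {
      Log.push_back(0);   // stand-in for cleanup that allocates
    } catch (const std::bad_alloc &) {
      // losing the log entry beats terminating the process
    }
  }
};
```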

Best regards,
Philipp

To increase stability, we have already moved the main part of the compilation to a separate process that may crash in case of an error without doing much harm, i.e. without crashing the database.

Were there any interesting challenges that arose during this process? This seems to be an approach many folks are either taking or considering. If there are things we could do to make this easier, it might be worth considering.

Therefore, we are currently concentrating on specific components that still remain in the database process, such as the CodeLoader and VMCore, which we use for emitting IR code. Of course, we are also interested in increasing the general stability of LLVM as a whole with respect to error situations.

Understood. If you have patches that need review, I'd be happy to help.

Philip

If I'm reading you correctly, you are relying on exception propagation and handler (destructors for local objects) execution. You have chosen not to add extra exception logic to LLVM itself, but are relying on the correctness of exception propagation within the code. (The last two sentences are intended to be a restatement of what your message said. If I misunderstood, please correct me.)

Does this mean that you're compiling your build of LLVM with exceptions enabled? By default, I believe LLVM is built without RTTI or exception support.

For the particular cases you mentioned with auto pointers and allocation in destructors, are these issues also present along existing error paths? Or for that matter simply examples of bad coding practice? If so, pushing back selected changes would be welcomed. I'd be happy to help review.

Philip

To increase stability, we have already moved the main part of the compilation to a separate process that may crash in case of an error without doing much harm, i.e. without crashing the database.

Were there any interesting challenges that arose during this process?
This seems to be an approach many folks are either taking or
considering. If there are things we could do to make this easier, it
might be worth considering.

After porting our project to LLVM 3.1, we realized that we could use the MCJIT architecture to move compilation into a separate process, because it enables loading ELF objects generated in another process. It worked as expected and was a really important improvement for our usage scenario.

What was not so nice is that we had to patch RuntimeDyld to adapt it to our requirements: we extended it with one method that computes the total size of memory required for loading all sections needed for execution, so that we can allocate a single memory block for all sections, and with another method that retrieves the address ranges of all loaded functions, so that we have a mapping from addresses to function names.

Regards,
Vaidas

Hi Philip,

If I'm reading you correctly, you are relying on exception propagation
and handler (destructors for local objects) execution. You have chosen
not to add extra exception logic to LLVM itself, but are relying on the
correctness of exception propagation within the code. (The last two
sentences are intended to be a restatement of what your message said.
If I misunderstood, please correct me.)

It was probably not completely correct to say that we did not extend the exception propagation in LLVM. In most cases where malloc or other C allocation functions are called, we had to add a check for NULL and throw std::bad_alloc. But these are straightforward fixes that do not require much effort.
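
Sketched as a hypothetical helper (the actual patches change the individual call sites), the pattern is simply:

```cpp
#include <cstdlib>
#include <new>

// Never let a failed malloc escape as a null pointer: propagate the OOM
// as bad_alloc instead of crashing on a dereference later.
static void *checkedMalloc(std::size_t Bytes) {
  void *P = std::malloc(Bytes);
  if (!P)
    throw std::bad_alloc();
  return P;
}
```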

Does this mean that you're compiling your build of LLVM with exceptions
enabled? By default, I believe LLVM is built without RTTI or exception
support.

OK, I see. This explains why the destructors in LLVM are not always prepared to be executed in exception situations. Yes, we build LLVM with exception support. In principle, we build it with the same options as the rest of our project. Actually, I could hardly imagine that we could handle OOM situations without error handling.

For the particular cases you mentioned with auto pointers and allocation
in destructors, are these issues also present along existing error
paths? Or for that matter simply examples of bad coding practice? If
so, pushing back selected changes would be welcomed. I'd be happy to
help review.

Yes, there are some examples of bad coding practice. The root problem, however, is that the destructors in LLVM do a lot of complicated stuff. Instead of just deleting objects following a strictly hierarchical ownership structure, the objects are unregistered from various relationships, which can trigger unwinding in quite different locations. Such non-trivial code sometimes requires dynamic allocation of new collections, which is problematic in OOM situations. Actually, we did not manage to completely fix the unwinding of the compiler state. That was one of the reasons to move all compiler passes to a separate process.

Here is a rough overview of our current set of patches to LLVM 3.3. In total, we have 31 patches to LLVM related to OOM handling. They fix only the components that we could not outsource to the separate process: the core IR classes that we use for IR generation, IR bitcode serialization, and the dynamic code loader.

* In 8 of the patches we fix the malloc calls to throw bad_alloc.
* 10 patches fix destructors. Some of the fixes disable or rewrite code that triggers dynamic allocations; others disable asserts that check for invariants that do not hold in exception situations.
* 5 patches deal with exceptions in constructors. One kind of problem results from the fact that if an exception is thrown in the constructor of an object, the destructor is not called, which causes a leak. There is also the specific problem that IR objects register themselves with their parents already in their constructors; if a constructor fails afterwards, the parent contains a dangling pointer to its child.
* 5 patches fix temporary ownership of objects, mostly situations where objects are created but not yet added to the owning collections.

Note that 31 patches does not mean 31 fixes, because some of them contain all fixes for a particular file or class.
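
To illustrate the last category with a hypothetical example (these are not actual LLVM classes, and std::unique_ptr stands in for whatever smart pointer is appropriate): if appending to the owning collection throws under OOM, the raw-pointer version leaks the new object, while the smart-pointer version releases ownership only on success.

```cpp
#include <memory>
#include <vector>

struct Node { int Id; };

void addLeaky(std::vector<Node *> &Owner) {
  Node *N = new Node();
  Owner.push_back(N);        // may throw bad_alloc -> N leaks
}

void addSafe(std::vector<Node *> &Owner) {
  std::unique_ptr<Node> N(new Node());
  Owner.push_back(N.get());  // may still throw, but N gets destroyed
  N.release();               // give up ownership only after success
}
```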

Regards,
Vaidas

Adding Andrew Kaylor to the conversation.

Hi Vaidas,

I would think you could use a simple allocation scheme on the host side and then allocate a single block (or perhaps one code block and one data block) in the target process to receive everything, since everything is loaded on the host before you need to copy it to the target process. But perhaps I'm missing something regarding your particular scenario.

In any event, there are good reasons why a memory manager (particularly for local use) might want to know the total size before allocation begins.

I'm not sure I understand the problem you're solving with regard to function addresses. I do know that there's a shortcoming where the host address isn't directly available after section addresses have been remapped for remote use. (LLDB will run into that problem when it stops using deprecated functions.) So I expect a change of some sort will be necessary there.

Are you considering submitting your patches for incorporation into LLVM trunk?

-Andy

Hi Philip,

If I'm reading you correctly, you are relying on exception propagation
and handler (destructors for local objects) execution. You have chosen
not to add extra exception logic to LLVM itself, but are relying on the
correctness of exception propagation within the code. (The last two
sentences are intended to be a restatement of what your message said.
If I misunderstood, please correct me.)

It was probably not completely correct to say that we did not extend the exception propagation in LLVM. In most cases where malloc or other C allocation functions are called, we had to add a check for NULL and throw std::bad_alloc. But these are straightforward fixes that do not require much effort.

Does this mean that you're compiling your build of LLVM with exceptions
enabled? By default, I believe LLVM is built without RTTI or exception
support.

OK, I see. This explains why the destructors in LLVM are not always prepared to be executed in exception situations. Yes, we build LLVM with exception support. In principle, we build it with the same options as the rest of our project.

This is useful to know. Just fair warning: you're probably running a fairly odd configuration compared to the rest of the LLVM community. That might expose "interesting" bugs. (Note: I have no evidence that the exceptions-enabled config is actually rare, just my perception from watching list traffic.)

Actually, I could hardly imagine that
we could handle OOM situations without error handling.

Agreed, but error handling does not imply exceptions. It might be the easiest way, but it's not the only one.
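
For illustration, a hypothetical sketch of exception-free OOM handling, where allocation failures surface as checked return values that each caller propagates:

```cpp
#include <new>

struct Foo { int Value; };

Foo *createFoo() {
  return new (std::nothrow) Foo();  // null signals OOM, nothing thrown
}

bool useFoo() {
  Foo *F = createFoo();
  if (!F)
    return false;   // explicit propagation instead of stack unwinding
  F->Value = 42;
  delete F;
  return true;
}
```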

For the particular cases you mentioned with auto pointers and allocation
in destructors, are these issues also present along existing error
paths? Or for that matter simply examples of bad coding practice? If
so, pushing back selected changes would be welcomed. I'd be happy to
help review.

Yes, there are some examples of bad coding practice. The root problem, however, is that the destructors in LLVM do a lot of complicated stuff. Instead of just deleting objects following a strictly hierarchical ownership structure, the objects are unregistered from various relationships, which can trigger unwinding in quite different locations. Such non-trivial code sometimes requires dynamic allocation of new collections, which is problematic in OOM situations.

This is probably not going to change.

One simple hack to get around this would be to have a reserved allocation set which is released on OOM. This would enable a small number of allocations during unwind without double-OOM. Getting the allocation amount right is a bit challenging, but such schemes can be made to work.
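
A minimal sketch of that scheme, assuming a single-threaded compile and using the standard new-handler hook (operator new retries the allocation whenever the handler returns; note this does not cover plain malloc calls):

```cpp
#include <cstdlib>
#include <new>

// The reserve is created up front; its size is necessarily a guess.
static void *EmergencyReserve = std::malloc(1 << 20);

static void releaseReserveOnOOM() {
  if (EmergencyReserve) {
    std::free(EmergencyReserve);  // hand the block back to the heap
    EmergencyReserve = 0;
    return;                       // operator new retries the allocation
  }
  std::set_new_handler(0);        // reserve already spent: fail for real
  throw std::bad_alloc();
}

int main() {
  std::set_new_handler(releaseReserveOnOOM);
  // ... run the compilation; destructors that allocate during unwind
  // can now draw on the released reserve instead of hitting a second
  // OOM ...
}
```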

Actually, we did not manage to completely
fix the unwinding of the compiler state. That was one of the reasons to
move all compiler passes to a separate process.

Here is a rough overview of our current set of patches to LLVM 3.3. In total, we have 31 patches to LLVM related to OOM handling. They fix only the components that we could not outsource to the separate process: the core IR classes that we use for IR generation, IR bitcode serialization, and the dynamic code loader.

I'm about to break these down by how likely I believe these changes are to be accepted back if you wanted to push them. Keep in mind that this is only my opinion and that I do not speak for the community as a whole.

* In 8 of the patches we fix the malloc calls to throw bad_alloc.

This is probably not going to be accepted as is.

* 10 patches fix destructors. Some of the fixes disable or rewrite code that triggers dynamic allocations; others disable asserts that check for invariants that do not hold in exception situations.

These might be accepted on a case by case basis. Depends on the actual change in question.

* 5 patches deal with exceptions in constructors. One kind of problem results from the fact that if an exception is thrown in the constructor of an object, the destructor is not called, which causes a leak.

Framed as exception propagation, these probably wouldn't be accepted. However, it sounds like a general error return pattern could address the same issue. That might be accepted.

There is also the specific problem that IR objects register themselves with their parents already in their constructors; if a constructor fails afterwards, the parent contains a dangling pointer to its child.

Same as previous.

* 5 patches fix temporary ownership of objects, mostly situations where objects are created but not yet added to the owning collections.

These would probably be accepted.

Note that 31 patches does not mean 31 fixes, because some of them contain all fixes for a particular file or class.

How willing are you to share your code? If you're willing to put your changes up on github or something, we could go through them and attempt to migrate some of them back for inclusion in base llvm.

Philip

Hi Andy,

I would think you could use a simple allocation scheme on the host side and then allocate a single block (or perhaps one code block and one data block) in the target process to receive everything, since everything is loaded on the host before you need to copy it to the target process. But perhaps I'm missing something regarding your particular scenario.

In any event, there are good reasons why a memory manager (particularly for local use) might want to know the total size before allocation begins.

We subclass the RTDyldMemoryManager interface to implement allocateDataSection and allocateCodeSection. As you know, when RuntimeDyld loads a module, it calls one of these methods for each section. We want all sections of a module to be allocated in one block. One reason is efficiency; another is that 32-bit relocations fail when the blocks are too far from each other. So we compute the total size of all sections, allocate one block per module, and hand out parts of that block when allocateDataSection and allocateCodeSection are called by RuntimeDyld.
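
A rough sketch of such a memory manager (not our actual patch; the allocate* signatures follow LLVM 3.3, and the interface's remaining virtual methods as well as all error handling are omitted):

```cpp
// The block itself is assumed to be pre-allocated with the total size
// computed up front, e.g. by the RuntimeDyld extension described above.
#include "llvm/ExecutionEngine/RTDyldMemoryManager.h" // location varies by version
#include <cassert>
#include <cstdint>

class SingleBlockMemoryManager : public llvm::RTDyldMemoryManager {
  uint8_t *Block;       // one block per module: code and data together,
  uintptr_t Offset;     // so 32-bit relocations stay in range
  uintptr_t TotalSize;

  uint8_t *carve(uintptr_t Size, unsigned Alignment) {
    Offset = (Offset + Alignment - 1) & ~uintptr_t(Alignment - 1);
    assert(Offset + Size <= TotalSize && "precomputed size was too small");
    uint8_t *P = Block + Offset;
    Offset += Size;
    return P;
  }

public:
  SingleBlockMemoryManager(uint8_t *Buf, uintptr_t Size)
      : Block(Buf), Offset(0), TotalSize(Size) {}

  virtual uint8_t *allocateCodeSection(uintptr_t Size, unsigned Alignment,
                                       unsigned SectionID) {
    return carve(Size, Alignment);
  }

  virtual uint8_t *allocateDataSection(uintptr_t Size, unsigned Alignment,
                                       unsigned SectionID, bool IsReadOnly) {
    return carve(Size, Alignment);
  }
};
```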

I'm not sure I understand the problem you're solving with regard to function addresses. I do know that there's a shortcoming where the host address isn't directly available after section addresses have been remapped for remote use. (LLDB will run into that problem when it stops using deprecated functions.) So I expect a change of some sort will be necessary there.

We need this info when unwinding the call stack in case of a crash. We have to map the program counter to a function name, so we need simple symbol information that maps address ranges to function names. When we implemented this functionality, there was no method in RuntimeDyld to iterate over all loaded functions. In LLVM 3.3, though, I see that loadModule returns an ObjectImage, so maybe now I can access the symbols directly from the ObjectImage without patching RuntimeDyld. I will try that.

Are you considering submitting your patches for incorporation into LLVM trunk?

Yes, we would be glad to contribute our patches. I have to retest everything on trunk, because until now we have used only the official LLVM 3.3 release.

Regards,
Vaidas

One simple hack to get around this would be to have a reserved
allocation set which is released on OOM. This would enable a small
number of allocations during unwind without double-OOM. Getting the
allocation amount right is a bit challenging, but such schemes can be
made to work.

The problem is not only a double OOM, but also the various invariants assumed by the destructors. If we jump out somewhere in the middle of a compilation pass, objects may be in states violating typical invariants. For example, if you allocate an array of objects and then call their in-place constructors, and one of the constructors throws, half of the array has been constructed while the other half is uninitialized. If you don't do anything special to handle such situations, the cleanup code does not know that only half of the objects are initialized and calls destructors either for all of them or for none of them.
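
For example (a minimal sketch, not LLVM code), an exception-safe construction loop has to track how many elements succeeded and destroy exactly those:

```cpp
#include <cstddef>
#include <new>

// Construct N objects of type T in raw storage, remembering how many
// constructors succeeded so that only those objects are destroyed if
// one of the constructors throws.
template <typename T>
T *constructArray(void *Raw, std::size_t N) {
  T *Arr = static_cast<T *>(Raw);
  std::size_t Built = 0;
  try {
    for (; Built < N; ++Built)
      new (Arr + Built) T();   // may throw bad_alloc under OOM
  } catch (...) {
    while (Built > 0)
      Arr[--Built].~T();       // destroy only the constructed prefix
    throw;
  }
  return Arr;
}
```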

The best solution would be to isolate the memory related to compiling a particular module by using a per-module allocator. Then, in case of an error, we could release the entire memory of the allocator at once without calling destructors. But that, of course, would be a big change.
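
Conceptually this would be a bump-pointer arena, similar in spirit to llvm::BumpPtrAllocator (a sketch; only safe for objects that do not need their destructors to run):

```cpp
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

class ModuleArena {
  std::vector<void *> Slabs;
  char *Cur;
  char *End;

public:
  ModuleArena() : Cur(0), End(0) {}

  // Error or success, cleanup is one shot: free the slabs, run no
  // per-object destructors.
  ~ModuleArena() {
    for (std::size_t I = 0; I < Slabs.size(); ++I)
      std::free(Slabs[I]);
  }

  void *allocate(std::size_t Bytes, std::size_t Align) {
    uintptr_t P =
        (reinterpret_cast<uintptr_t>(Cur) + Align - 1) & ~uintptr_t(Align - 1);
    if (!Cur || P + Bytes > reinterpret_cast<uintptr_t>(End)) {
      // Start a new slab; malloc's alignment suffices at the slab start.
      std::size_t SlabSize = Bytes > 65536 ? Bytes : 65536;
      char *Slab = static_cast<char *>(std::malloc(SlabSize));
      if (!Slab)
        throw std::bad_alloc();
      Slabs.push_back(Slab);
      Cur = Slab;
      End = Slab + SlabSize;
      P = reinterpret_cast<uintptr_t>(Cur);
    }
    Cur = reinterpret_cast<char *>(P + Bytes);
    return reinterpret_cast<void *>(P);
  }
};
```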

Actually, I could hardly imagine that
we could handle OOM situations without error handling.

Agreed, but error handling does not imply exceptions. It might be the
easiest way, but it's not the only one.

True, and exceptions might indeed be the easiest way: you don't need to add an explicit check after each allocation and each function call, except for calls to functions declared throw(), which cannot propagate exceptions. With error codes you would also have to avoid failures in constructors, because a constructor cannot return an error code.

I'm about to break these down by how likely I believe these changes are
to be accepted back if you wanted to push them. Keep in mind that this
is only my opinion and that I do not speak for the community as a whole.

Of course, your feedback is very helpful.

How willing are you to share your code? If you're willing to put your
changes up on github or something, we could go through them and attempt
to migrate some of them back for inclusion in base llvm.

I think we can share the changes that are not specific to our project. It would probably be best if we migrate them to LLVM trunk first.

Regards,
Vaidas

Hi Vaidas,

Thanks for the feedback.

Regarding the single allocation, I'm not opposed to having an option to pre-calculate the size and provide it in advance to the memory manager. We actually had an implementation that worked that way once, but it never got checked in because we decided it was too complicated to use as the base memory manager. The pre-calculation will need to be optional so that the extra work isn't imposed on clients that don't need it.

With your explanation below I understand the issue with function address mapping. You've probably seen that RuntimeDyld already keeps a map of symbol names to addresses. My concern with having RuntimeDyld keep the additional information to look up symbols by address is that MCJIT already consumes more memory than we'd like. If you can make your implementation work with the ObjectImage, that would probably be best.

-Andy

Hi Andy,

I ported our patch for the precalculation of size to LLVM trunk and wrote a small unit test for it. The basic idea is that RuntimeDyld, after creating the ObjectImage but before loading it, notifies the memory manager about the total space required to allocate all sections.

Note that we coded it for our requirements, so it may still need some adjustments to make it general enough for all possible uses. For example, I am not sure what the best way would be to make the size calculation optional; I could add another virtual method to the memory manager that asks whether it needs the information about the total size, as sketched below.
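
For illustration, one possible shape of such an opt-in hook (the method names here are invented, not the committed interface):

```cpp
#include <cstdint>

// Hypothetical sketch: RuntimeDyld queries needsTotalAllocationSize()
// first and only does the extra pass over the sections when the memory
// manager actually wants the number.
class MemoryManagerWithPrecalc {
public:
  virtual ~MemoryManagerWithPrecalc() {}

  // Opt-in switch; managers that don't care keep the single-pass path.
  virtual bool needsTotalAllocationSize() const { return false; }

  // Called after the ObjectImage is created, before any section is
  // loaded, so the manager can allocate one block for everything.
  virtual void notifyTotalAllocationSize(uintptr_t CodeSize,
                                         uintptr_t DataSize) {}
};
```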

BTW do you think we should move further discussion to llvm-commits?

Regards,
Vaidas

computeTotalAllocSize.patch (7.56 KB)

Did you look at this from another angle? E.g., I have source code X, 1 GB of RAM, and no swap file. Will LLVM compile it or run out of memory in the process?

Running LLVM and being able to retrieve stats like "max RAM used during compile" might be a workable way to mitigate the likelihood of OOM.

The reason I am interested in this is that cloud computing sometimes gives you cheap virtual machines with quite low resource limits, particularly for RAM. It would be nice to know, before trying to compile some source code, that the compile will not fail due to OOM.

Kind Regards

James