[RFC] Making .eh_frame more linker-friendly

Hi,

Many linkers, including lld, can eliminate unused sections from the output to make it smaller. This is essentially a mark-sweep gc in which sections are vertices and relocations are edges. lld and GNU gold have yet another feature, ICF (identical code folding), which merges functions with identical contents to save more space.
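As a rough illustration of the mark-sweep idea described above, here is a minimal sketch. The section names, the `relocations` mapping, and the `live_sections` helper are all hypothetical and not taken from lld or gold; a real linker works on parsed object files rather than dictionaries.

```python
from collections import deque

def live_sections(roots, relocations):
    """Return the set of sections reachable from the roots.

    relocations maps a section name to the names of the sections
    its relocations point at (the edges of the graph).
    """
    live = set(roots)
    worklist = deque(roots)
    while worklist:
        section = worklist.popleft()
        for target in relocations.get(section, ()):
            if target not in live:
                live.add(target)
                worklist.append(target)
    return live

# Hypothetical input: .text.main refers to .text.foo; nothing refers to .text.bar.
relocations = {
    ".text.main": [".text.foo"],
    ".text.foo": [".rodata.str"],
    ".text.bar": [".rodata.str"],
}
live = live_sections([".text.main"], relocations)
dead = set(relocations) - live  # sections that would be swept
```

Here `.text.bar` ends up in `dead` because no live section has a relocation pointing at it.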

When we remove or merge a function, we want to eliminate its exception handling information as well. But that isn’t easy to do because of the format of .eh_frame. Here are the reasons:

  1. Linkers have to parse and split .eh_frame, eliminate the exception handling information for dead functions, and then reconstruct an .eh_frame section. It is tedious, and it doesn’t feel like a task that linkers should have to do (linkers usually handle sections as opaque blobs and are agnostic of section contents). It is also contrary to other data, where the section is the atomic unit of inclusion/elimination.

  2. From the viewpoint of gc, .eh_frame has reverse edges to sections. Usually, if section A depends on section B, there’s a relocation in A pointing to B. For .eh_frame it is the opposite: if section A has exception handling information in .eh_frame section B, B has a relocation against A. This makes implementing a gc tricky, and combined with (1), it is even trickier.

  3. Comparing .eh_frame contents for equivalence is hard. In order to merge functions by contents, we need to verify that their exception handling information is also the same, but doing it isn’t easy given the current .eh_frame format.

So, I don’t think .eh_frame needed to be designed that way. Maybe we can improve it. Here is my rough idea:

  1. We can emit an .eh_frame section for each .text section. So, if you pass -ffunction-sections, the resulting object file would have multiple .eh_frame sections. This makes .eh_frame a unit of garbage collection and eliminates the need to parse .eh_frame contents. It also makes it very easy to compare .eh_frame sections for function merging.

  2. Make each .eh_frame section have a link to its .text section. We could set the section index of a .text section in its corresponding .eh_frame’s sh_link field. This would make gc much easier. (If text section A is pointed to by an .eh_frame section B via sh_link, then B is alive whenever A is alive. The edge is still reversed, but this is much more manageable.)
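The two points above can be sketched as a small extension of the usual marking loop. This is only an illustration under assumptions: the per-function names such as `.eh_frame.foo` are made up for readability (in a real object file every such section would simply be named .eh_frame), and `collect` and its `sh_link` mapping are hypothetical, not an existing linker API.

```python
def collect(roots, relocations, sh_link):
    """Mark live sections, treating sh_link as a reverse edge.

    relocations: section -> sections its relocations point at.
    sh_link: .eh_frame section -> the text section it describes.
    """
    # Invert sh_link once so that marking a text section can
    # immediately mark the .eh_frame sections that describe it.
    dependents = {}
    for eh, text in sh_link.items():
        dependents.setdefault(text, []).append(eh)

    live = set()
    stack = list(roots)
    while stack:
        sec = stack.pop()
        if sec in live:
            continue
        live.add(sec)
        stack.extend(relocations.get(sec, ()))  # forward edges
        stack.extend(dependents.get(sec, ()))   # reverse edge via sh_link
    return live

relocations = {
    ".text.main": [".text.foo"],
    ".eh_frame.main": [".text.main"],
    ".eh_frame.foo": [".text.foo", ".gcc_except_table.foo"],
    ".eh_frame.bar": [".text.bar"],
}
sh_link = {
    ".eh_frame.main": ".text.main",
    ".eh_frame.foo": ".text.foo",
    ".eh_frame.bar": ".text.bar",
}
live = collect([".text.main"], relocations, sh_link)
```

Note that no .eh_frame contents are inspected: `.eh_frame.bar` is dropped purely because `.text.bar` is never marked.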

I think doing the above doesn’t break compatibility with existing linkers, and new linkers can take advantage of the more linker-friendly format. I can’t think of any obvious disadvantage, except that we would have more sections, but I may be wrong as I’m no expert on .eh_frame.

What do you guys think?

Have you seen the discussion of SHF_LINK_ORDER on the generic-abi@ mailing list? I think it implements exactly what you describe. My understanding is that ARM EHABI leverages this for the same purpose.

https://groups.google.com/forum/#!topic/generic-abi/_CbBM6T6WeM

No I haven’t. Thank you for the pointer.

Looks like the problem of the inverted edges was discussed there. But I guess my bigger question is this: why do we still create one big .eh_frame even if -ffunction-sections is given?

When the option is given, Clang creates .text, .rela.text and .gcc_except_table sections for each function, but it still creates a monolithic .eh_frame that covers all function sections, which seems odd to me.

I agree, we should fix it. :)

The .eh_frame section (which is basically a DWARF .debug_frame section) was not designed with deduplication/gc in mind. I haven’t studied it closely, but it looks like the bulk of it is frame descriptions which are divided up basically per-function, with some common overhead factored out. If you want to put each per-function part into its own ELF section, there’s overhead for that which you are more aware of than I am, and then either you need to replicate the common part into each per-function section or accept a relocation from each per-function section into the separate common section.

Looking at my latest clang build in Ubuntu, the executable has 96320 frame descriptions of which all but one use the same common part; in this case, that common part is 24 bytes. The size is not fixed, but is guaranteed to be a multiple of the target address size, and it probably can’t be any smaller than 24 on a normal machine. This might help give you some estimates about the size effect of different choices.
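For a back-of-the-envelope estimate from the numbers above, the following sketch computes how many bytes the common part would occupy if it were replicated once per frame description and nothing merged the copies afterwards. That "no merging" condition is an assumption made only for the estimate.

```python
fde_count = 96320   # frame descriptions in the executable, as reported above
common_size = 24    # bytes in the shared common part, as reported above

# All but one frame description use the same common part.
sharing_fdes = fde_count - 1

# Cost if every one of those carried its own copy of the common part.
replicated_bytes = sharing_fdes * common_size

# Extra bytes compared with keeping a single shared copy.
extra_bytes = replicated_bytes - common_size

print(replicated_bytes, extra_bytes)
```

This works out to roughly 2.3 MB of replicated common parts in the unmerged case.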

HTH,

–paulr

Yes, an .eh_frame section consists of one or more CIE records followed by one
or more FDE records. Common information in FDEs is factored out to a CIE
to save space. So, if you split an .eh_frame into multiple smaller
.eh_frame sections, you end up having more CIEs.

But the good news is that even existing linkers deduplicate CIEs by
contents (that's why you saw only one CIE record in your executable, even
though each input object file has at least one CIE record.) So the linked
executables/DSOs would be the same size.
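A minimal sketch of merging CIEs by contents, under a simplified model: each input is a pair of raw CIE bytes and a list of FDE payloads, and two CIEs are treated as equal when their bytes are equal. The `merge_cies` helper is hypothetical, and a real linker would also have to take relocations in the CIE (for example, to a personality routine) into account.

```python
def merge_cies(inputs):
    """Merge identical CIEs across inputs.

    inputs: list of (cie_bytes, [fde_payload, ...]) pairs.
    Returns (unique_cies, fdes), where each FDE is paired with the
    index of the CIE it now refers to.
    """
    index_of = {}
    unique_cies = []
    fdes = []
    for cie, section_fdes in inputs:
        idx = index_of.get(cie)
        if idx is None:
            idx = len(unique_cies)
            index_of[cie] = idx
            unique_cies.append(cie)
        fdes.extend((idx, fde) for fde in section_fdes)
    return unique_cies, fdes

common = b"\x01zR\x00\x01\x78\x10\x01"  # placeholder bytes, not a real CIE
inputs = [
    (common, [b"fde-main"]),
    (common, [b"fde-foo"]),
    (b"other-cie", [b"fde-bar"]),
]
cies, fdes = merge_cies(inputs)
```

With three inputs, two of which carry the same CIE bytes, only two CIEs survive in the output.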

Hi,

There will be problems with eh_frame_hdr. Eh_frame_hdr is needed to use binary search instead of linear search. Having an eh_frame per function will result in either no eh_frame_hdr or multiple eh_frame_hdrs, which will degrade the search from binary to linear.
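To illustrate why a single sorted table matters, here is a sketch of a lookup over a table of (initial location, FDE offset) pairs sorted by initial location. The `build_table` and `find_fde` helpers and the sample addresses are hypothetical; a real unwinder would additionally read the FDE to check that the PC falls inside its address range.

```python
from bisect import bisect_right

def build_table(fdes):
    """Sort (initial_location, fde_offset) pairs and split them into columns."""
    table = sorted(fdes)
    starts = [start for start, _ in table]
    offsets = [offset for _, offset in table]
    return starts, offsets

def find_fde(starts, offsets, pc):
    """Return the offset of the last FDE whose initial location is <= pc."""
    i = bisect_right(starts, pc) - 1
    return offsets[i] if i >= 0 else None

starts, offsets = build_table([(0x2000, 0x58), (0x1000, 0x18), (0x1800, 0x38)])
hit = find_fde(starts, offsets, 0x1804)
miss = find_fde(starts, offsets, 0x0FFF)
```

Without such a table, the unwinder would have to walk the FDEs one by one.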

Since we create eh_frame_hdr in most cases, filtering out garbage eh_frame sections is not a problem. If there is information about unused symbols, the implementation is very simple. BTW, there is no need to fully decode eh_frame records to remove garbage.

Paul is right that there will be code size overhead. Eh_frame is usually created per compilation module, with common information in a CIE. Multiple eh_frames will cause a lot of redundant CIEs. There might be cases where the total size of the redundant CIEs is greater than the total size of the removed garbage.

Thanks,

Evgeny Astigeevich

The Arm Compiler Optimization team

Hi,

There will be problems with eh_frame_hdr. Eh_frame_hdr is needed to use
binary search instead of linear search. Having an eh_frame per function
will result in either no eh_frame_hdr or multiple eh_frame_hdrs, which
will degrade the search from binary to linear.

Linkers would combine .eh_frame sections into one .eh_frame, so that's not
an issue, no?

Since we create eh_frame_hdr in most cases, filtering out garbage eh_frame
sections is not a problem. If there is information about unused symbols,
the implementation is very simple. BTW, there is no need to fully decode
eh_frame records to remove garbage.

Paul is right that there will be code size overhead. Eh_frame is usually
created per compilation module, with common information in a CIE. Multiple
eh_frames will cause a lot of redundant CIEs. There might be cases where the
total size of the redundant CIEs is greater than the total size of the
removed garbage.

As I wrote in the previous message, I don't think there's a size issue in
link results because even existing linkers merge CIEs by contents.

Note that at least on MIPS you pretty much have to do that anyway to
convert absolute addresses into PC-relative references due to the f**ked
up intra-section constraints.

Joerg

The section is created by the linker, it doesn't matter from an input
perspective.

Joerg

Hi Rui,

It is my fault. I misread your RFC. Now I see it is about doing this in the compiler.

Yes, a linker does all needed magic. It combines all eh_frames, removes garbage and creates eh_frame_hdr.

And yes, your proposal will simplify garbage collection. The main advantage is that you do not need to parse eh_frames.

Thanks,

Evgeny

It sounds like the linker has to be aware of the .eh_frame section details to be able to generate .eh_frame_hdr and eliminate duplicate CIEs, right?

So, is there any difference whether it knows that in one place or two?

Best Regards,
Igor Kudrin
C++ Developer, Access Softek, Inc.

Hi Igor,

It sounds like the linker has to be aware of the .eh_frame section details to be able to generate .eh_frame_hdr and eliminate duplicate CIEs, right?

Yes, a linker needs some details but not all of them. It needs to know sizes of records and initial locations (PC Begin) to find out which functions FDEs belong to.
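As a sketch of how records can be split using only their sizes, the following walks a little-endian .eh_frame blob by length field and classifies each record by its 4-byte ID field (zero for a CIE, non-zero for an FDE). The `split_records` helper and the sample bytes are made up for illustration; finding which function an FDE belongs to would additionally require looking at the relocation on its PC Begin field, which is not shown.

```python
import struct

def split_records(data):
    """Return (kind, offset, size) for each record in an .eh_frame blob."""
    records = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, pos)
        if length == 0:            # zero-length terminator
            break
        header = 4
        if length == 0xFFFFFFFF:   # extended 64-bit length follows
            (length,) = struct.unpack_from("<Q", data, pos + 4)
            header = 12
        # The length counts the bytes after the length field itself.
        (cie_id,) = struct.unpack_from("<I", data, pos + header)
        kind = "CIE" if cie_id == 0 else "FDE"
        size = header + length
        records.append((kind, pos, size))
        pos += size
    return records

# Hand-built sample: one CIE (12 bytes), one FDE (16 bytes), terminator.
blob = (
    struct.pack("<II", 8, 0) + b"\x00" * 4      # CIE: length 8, ID 0
    + struct.pack("<II", 12, 16) + b"\x00" * 8  # FDE: length 12, CIE pointer 16
    + struct.pack("<I", 0)                      # terminator
)
records = split_records(blob)
```

Nothing beyond the length and ID fields is decoded here, which matches the point that full parsing of the record bodies is not needed for this step.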

So, is there any difference whether it knows that in one place or two?

What do you mean by “one place or two”? If .eh_frame_hdr is not created, a linker does not need to parse .eh_frame sections. It simply merges them into one section. The format of .eh_frame allows this to be done without parsing the .eh_frame sections.

Thanks,

Evgeny Astigeevich

Hi Evgeny,

Yes, a linker needs some details but not all of them. It needs to know sizes of records and initial locations (PC Begin) to find out which functions FDEs belong to.

So, it still needs some details. Not all of them, but anyway, handling of .eh_frame sections is still a special case, even if we split all the content at compile time.

What do you mean “one place or two”?

If I understand it right, the RFC is about helping a linker to eliminate unneeded .eh_frame items when performing GC. But if we still need to deal with CIEs and generate .eh_frame_hdr in a special way, does it make sense to make this change to simplify only a small part of a linker?

But if we still need to deal with CIEs and generate .eh_frame_hdr in a special way, does it make sense to make this change to simplify only a small part of a linker?

For huge C++ projects this could improve link time if GC is a bottleneck. It will also improve eh_frame_hdr build time because you don’t spend time parsing garbage. However, a linker will have to have two versions of GC: one that parses eh_frames and another that does not, because there can be input object files where .eh_frame is not split.

-Evgeny

On the other hand, in this case, you have to deal with lots of small sections. I can’t say for sure which variant is quicker. Anyway, lld currently doesn’t do deep parsing of .eh_frame sections, nor does it need to for GC. It just splits them into FDEs and CIEs and then analyzes the corresponding relocations. Thus, the amount of required work is more or less the same whether we have separate sections or have to split the monolithic one.

Keeping .eh_frame separated should still simplify the linker because, until the last step of building .eh_frame and .eh_frame_hdr, we don’t really need to parse .eh_frame sections. So, if we have separate .eh_frame sections with -ffunction-sections, all we have to do is (1) garbage-collect sections and (2) construct the .eh_frame and .eh_frame_hdr sections from the live .eh_frame sections. At step 1, .eh_frame can be handled as a blob of data. That is much simpler than (1) parsing all .eh_frame sections beforehand, (2) garbage-collecting .eh_frame shards, and (3) reconstructing .eh_frame and .eh_frame_hdr from the .eh_frame shards.

In addition, keeping .eh_frame separated should greatly simplify the logic of identical code folding, which has to compare not only function contents but also exception handling information.
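A simplified sketch of that comparison, assuming each function comes with its own .eh_frame bytes: functions are folded only when both the code bytes and the exception handling bytes match. The `fold_identical` helper and the sample data are hypothetical, and a real ICF pass would also have to compare relocations, which this sketch ignores.

```python
from collections import defaultdict

def fold_identical(functions):
    """Map each function name to the representative it is folded into.

    functions: name -> (text_bytes, eh_frame_bytes).
    """
    groups = defaultdict(list)
    for name in sorted(functions):
        # The key covers both the code and its exception handling data.
        groups[functions[name]].append(name)
    return {name: names[0] for names in groups.values() for name in names}

functions = {
    "f": (b"\x31\xc0\xc3", b"eh-A"),
    "g": (b"\x31\xc0\xc3", b"eh-A"),
    "h": (b"\x31\xc0\xc3", b"eh-B"),  # same code, different unwind data
}
folded = fold_identical(functions)
```

Here `g` is folded into `f`, while `h` stays separate because its exception handling bytes differ.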