(no subject)

Hi all,

While working on alias support for the LLVM-ML project, I ran into a feature implemented back in 2010: default-null weak externals in COFF, a GNU extension.
https://reviews.llvm.org/rG17990d56907b

I’d like to disable this feature when targeting MSVC compatibility. Does anyone have more context on this, and why it’d be a terrible idea?

For context: This seems to be designed to let LLVM implement a GNU extension in COFF libraries. However, it leads to very different behavior than we see for cl.exe (and ml.exe) on Windows; for already-defined aliasees, it injects an alternate placeholder “.weak.<alias>.default.<uniquifier>” symbol which resolves back to the current location. I admit, I’m not quite sure how this helps. If anyone can explain the purpose, I’d really appreciate it!

In Windows PE/COFF files, aliases typically just resolve to their target symbol. For an example, see https://reviews.llvm.org/D87403#inline-811289.

Thanks,

  • Eric

Sadly, I don’t recall why I made that change; I probably should have explained it in the commit message. I do know that at the time I was focused on link.exe support, not MinGW. I just looked through my commit history from that time, and it doesn’t really explain much.

I believe that as long as the “compare weak symbol against null” check does the right thing, any changes you make here are fine.

  • Michael Spencer

Thanks, Michael! However… comparing weak against null isn’t actually supported by MSVC, to the best of my knowledge. If we want to maintain that in the PE/COFF world, I think we might have to accept some weird behavior. I was wondering how much people will scream if I disable this feature in MSVC-compatible targets.

Best,

  • Eric

So, for the GNU extension, from the user's point of view, there are two potential use cases.

First, a translation unit can reference a function declared with __attribute__((weak)) that has no implementation in that translation unit. The reference then evaluates either to NULL or, if a non-weak definition exists in another object file at link time, to that implementation.

Secondly, multiple translation units may have function definitions that are marked with the weak attribute. Any number of object files can contain such a weak definition, and at most one can contain a non-weak definition. If there's no non-weak definition, one of the weak definitions gets picked; if there is one, the non-weak definition is used.
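
To make the two use cases concrete, here's a minimal C sketch, assuming a GNU-compatible compiler (GCC or Clang; on Windows that would mean a MinGW-style target). The names optional_hook, default_impl and call_hook_or_fallback are made up for illustration and don't come from any real project.

```c
#include <stddef.h>

/* Use case 1: a weak declaration with no definition in this translation
 * unit. If no other object file defines optional_hook (a hypothetical
 * name), its address evaluates to NULL after linking. */
extern int optional_hook(void) __attribute__((weak));

/* Use case 2: a weak definition. A non-weak default_impl in another
 * object file would be used instead of this one at link time. */
__attribute__((weak)) int default_impl(void)
{
    return 1;
}

/* Call the optional hook if something provided it, otherwise fall back. */
int call_hook_or_fallback(void)
{
    if (optional_hook != NULL)
        return optional_hook();
    return default_impl();
}
```

Linked on its own, with nothing else defining optional_hook or default_impl, call_hook_or_fallback() takes the fallback path and returns 1. The "compare against NULL" in the first use case is what the default-null weak external has to make work in COFF.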

As all this is consumed via GNU style attributes (in MinGW environments), it shouldn't really matter in an MSVC context.

I recently worked on this to get the final details on this hooked up for COFF, so I'd be happy to have a look at any work touching this feature.

In Windows PE/COFF files, aliases typically just resolve to their target
symbol. For an example, see D87403 ([ms] [llvm-ml] Add support for "alias" directive).

For cases where the target already is a symbol with a unique name of its own, adding the alias directly to the target symbol sounds sensible. But when it isn't set up as an alias, and the implementation itself is marked weak, the uniquifying symbol name is needed to allow multiple objects to provide the same symbol.

Consider these two examples in GAS assembly form:

         .globl uniquename
uniquename:
         ret

         .globl func
func:
         ret

         .weak aliasname
aliasname = func

This produces the following symbols, shown with llvm-objdump -t:

[ 6](sec 1)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 uniquename
[ 7](sec 1)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000001 func
[ 8](sec 0)(fl 0x00)(ty 0)(scl 69) (nx 1) 0x00000000 aliasname
AUX indx 10 srch 3 [pointing at .weak.aliasname.default.uniquename]
[10](sec 1)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000001 .weak.aliasname.default.uniquename

So here .weak.aliasname.default.uniquename is identical to func, and as func itself is non-weak, aliasname could just as well have pointed directly at func instead.

But for this case, the extra dance is necessary:

         .globl uniquename
uniquename:
         ret

         .weak func
         .globl func
func:
         ret

Producing:
[ 6](sec 1)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 uniquename
[ 7](sec 0)(fl 0x00)(ty 0)(scl 69) (nx 1) 0x00000000 func
AUX indx 9 srch 3
[ 9](sec 1)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000001 .weak.func.default.uniquename

Initially, the non-weak symbols were just named ".weak.func.default", but this caused clashes if multiple object files defined the same one. I first tried fixing this in D71711 ([COFF] Make the autogenerated .weak.<name>.default symbols static) by making the non-weak symbols that the weak ones point at static, but MSVC tools error out if a weak symbol points at a non-external symbol (as "weak" in COFF actually is "weak external"). I therefore reverted that attempt, and later made D75989 ([COFF] Assign unique names to autogenerated .weak.<name>.default symbols), which tries to give these symbols unique names to avoid clashes.
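
As a rough sketch of the naming pattern visible in the symbol dumps above (this is not the code from D75989; the helper name make_weak_default_name is made up for illustration, and how the uniquifier gets chosen is deliberately left out):

```c
#include <stddef.h>
#include <stdio.h>

/* Build ".weak.<weak_name>.default.<uniquifier>" into buf.
 * Hypothetical helper: it only illustrates the name format, not how the
 * object writer actually picks the uniquifier.
 * Returns what snprintf returns: the length the full name needs, which is
 * >= size if the name was truncated. */
int make_weak_default_name(char *buf, size_t size,
                           const char *weak_name, const char *uniquifier)
{
    return snprintf(buf, size, ".weak.%s.default.%s", weak_name, uniquifier);
}
```

For the second example above, passing "func" and "uniquename" gives ".weak.func.default.uniquename". Since two object files would normally contribute different uniquifiers, their fallback symbols no longer collide.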

// Martin

Thanks, Martin!

My biggest question is around the behavior for alias-to-alias linkage. Using Microsoft tools (ml64.exe), if you define an external symbol t2, alias t4 to t2, and alias t7 to t4, you get exactly what you asked for:
[ 8](sec 1)(fl 0x00)(ty 20)(scl 2) (nx 0) 0x00000001 t2
[ 9](sec 0)(fl 0x00)(ty 0)(scl 69) (nx 1) 0x00000001 t4
AUX indx 8 srch 3
[11](sec 0)(fl 0x00)(ty 0)(scl 69) (nx 1) 0x00000001 t7
AUX indx 9 srch 3

Using LLVM, we instead get a second weak default-null reference pointing directly to t2, rather than to t4:
[ 3](sec 1)(fl 0x00)(ty 20)(scl 2) (nx 0) 0x00000001 t2

[ 7](sec 0)(fl 0x00)(ty 0)(scl 69) (nx 1) 0x00000000 t4
AUX indx 9 srch 3
[ 9](sec 1)(fl 0x00)(ty 0)(scl 69) (nx 0) 0x00000001 .weak.t4.default.t1

[17](sec 0)(fl 0x00)(ty 0)(scl 69) (nx 1) 0x00000000 t7
AUX indx 19 srch 3
[19](sec 1)(fl 0x00)(ty 0)(scl 69) (nx 0) 0x00000001 .weak.t7.default.t1

Because the “.weak” intermediates we create duplicate the current resolution of the aliasee, I think this can result in a different resolution for t7 than would happen in the Microsoft tools case? (Say, in a context where t4 has a strong definition.)
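
Here's a toy model of that concern in C. It is only a sketch under a simplifying assumption: a strong definition of a name always wins, and otherwise the weak external's fallback symbol is followed. It doesn't claim to reproduce what link.exe or lld actually do. The second object file supplying a strong t4 at 0x100, and all the struct and function names, are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

enum sym_kind { SYM_DEFINED, SYM_WEAK_EXTERNAL };

struct sym {
    const char *name;
    enum sym_kind kind;
    long value;            /* address, used for SYM_DEFINED */
    const char *fallback;  /* default target, used for SYM_WEAK_EXTERNAL */
};

/* Return the first symbol with this name and kind, or NULL. */
static const struct sym *find(const struct sym *tab, size_t n,
                              const char *name, enum sym_kind kind)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].kind == kind && strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Toy rule: a strong definition wins; otherwise follow the fallback.
 * Returns -1 if the name cannot be resolved. */
long resolve(const struct sym *tab, size_t n, const char *name)
{
    for (int depth = 0; depth < 16; depth++) {
        const struct sym *s = find(tab, n, name, SYM_DEFINED);
        if (s != NULL)
            return s->value;
        s = find(tab, n, name, SYM_WEAK_EXTERNAL);
        if (s == NULL)
            return -1;
        name = s->fallback;
    }
    return -1;
}

/* Hypothetical strong definition of t4 from a second object file. */
#define OTHER_T4 { "t4", SYM_DEFINED, 0x100, NULL }

/* ml64-style output: t7 falls back to t4, which falls back to t2. */
static const struct sym ms_style[] = {
    { "t2", SYM_DEFINED, 0x1, NULL },
    { "t4", SYM_WEAK_EXTERNAL, 0, "t2" },
    { "t7", SYM_WEAK_EXTERNAL, 0, "t4" },
    OTHER_T4,
};

/* LLVM-style output: each alias falls back to its own ".weak" symbol,
 * which records the address t2 had when the object was written. */
static const struct sym llvm_style[] = {
    { "t2", SYM_DEFINED, 0x1, NULL },
    { "t4", SYM_WEAK_EXTERNAL, 0, ".weak.t4.default.t1" },
    { ".weak.t4.default.t1", SYM_DEFINED, 0x1, NULL },
    { "t7", SYM_WEAK_EXTERNAL, 0, ".weak.t7.default.t1" },
    { ".weak.t7.default.t1", SYM_DEFINED, 0x1, NULL },
    OTHER_T4,
};

long resolve_t7_ms_style(void)
{
    return resolve(ms_style, sizeof ms_style / sizeof ms_style[0], "t7");
}

long resolve_t7_llvm_style(void)
{
    return resolve(llvm_style, sizeof llvm_style / sizeof llvm_style[0], "t7");
}
```

Under this model, resolve_t7_ms_style() follows t7 to t4 and lands on the strong t4 at 0x100, while resolve_t7_llvm_style() stops at .weak.t7.default.t1 and returns 0x1, the old location of t2. Whether the real linkers behave this way is exactly the open question.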

Maybe we should eliminate the “.weak” intermediates if the reference’s target is already an external symbol? They seem unnecessary for that case.

Thanks,

  • Eric

Hi Eric,