We’ve been considering changing the naming scheme for promoted local functions in ThinLTO. Currently we just prepend the file name, but that isn’t really sound for a number of reasons (e.g. you can have the same file name in different directories). The alternative we’ve been thinking about is to use the hash of all external names in the module, as that is guaranteed to be unique within a linkage unit (otherwise the linker would complain).
We currently (intentionally, I believe) use the same naming scheme for promoting local functions as we do for PGO, so we might need to change both. Do you see any back compat concerns with changing the naming scheme? I guess there are various things we can do to try to ensure back compat, but I wanted to get an idea of what the requirements are.
We've been considering changing the naming scheme for promoted local
functions in ThinLTO. Currently we just prepend the file name, but that
isn't really sound for a number of reasons (e.g. you can have the same file
name in different directories). The alternative we've been thinking about
is to use the hash of all external names in the module, as that is
guaranteed to be unique within a linkage unit (otherwise the linker would
complain).
We currently (intentionally, I believe) use the same naming scheme for
promoting local functions as we do for PGO,
No, we don't use this naming scheme for ThinLTO promotion. It is only used
for computation of the MD5 hash used in the function index (so that when we
want to import a function referenced by indirect call profile info which
uses this MD5 name we can find its summary).
The promotion name is based off a unique module identifier assigned at
Thin-link time (when the combined index is generated and all bitcode files
are seen).
Thanks,
Teresa
so we might need to change both. Do you see any back compat concerns with
I see. I suppose that if we did form promotion names using external name
hashes, we could soundly compile parts of the object file to native code at
compile time, if we could somehow determine ahead of time that such
compilation would be safe. I'm working on a proposal along those lines that
I hope to share soon.
Teresa Johnson via llvm-dev <llvm-dev@lists.llvm.org> writes:
Hi David,
We've been considering changing the naming scheme for promoted local
functions in ThinLTO. Currently we just prepend the file name, but that
isn't really sound for a number of reasons (e.g. you can have the same file
name in different directories). The alternative we've been thinking about
is to use the hash of all external names in the module, as that is
guaranteed to be unique within a linkage unit (otherwise the linker would
complain).
We currently (intentionally, I believe) use the same naming scheme for
promoting local functions as we do for PGO,
No, we don't use this naming scheme for ThinLTO promotion. It is only used
for computation of the MD5 hash used in the function index (so that when we
want to import a function referenced by indirect call profile info which
uses this MD5 name we can find its summary).
Oh good, I was worried for a second. The PGO renaming approach isn't
very robust at all and will hopefully be replaced with something less
fragile eventually.
Teresa Johnson via llvm-dev <llvm-dev@lists.llvm.org> writes:
>
>> Hi David,
>>
>> We've been considering changing the naming scheme for promoted local
>> functions in ThinLTO. Currently we just prepend the file name, but that
>> isn't really sound for a number of reasons (e.g. you can have the same
file
>> name in different directories). The alternative we've been thinking
about
>> is to use the hash of all external names in the module, as that is
>> guaranteed to be unique within a linkage unit (otherwise the linker
would
>> complain).
>>
>> We currently (intentionally, I believe) use the same naming scheme for
>> promoting local functions as we do for PGO,
>>
>
> No, we don't use this naming scheme for ThinLTO promotion. It is only
used
> for computation of the MD5 hash used in the function index (so that when
we
> want to import a function referenced by indirect call profile info which
> uses this MD5 name we can find its summary).
Oh good, I was worried for a second. The PGO renaming approach isn't
very robust at all and will hopefully be replaced with something less
fragile eventually.
Right -- the only reason we still keep the current naming scheme in PGO is
due to backward compatibility concerns. We have also recently introduced a
mechanism in PGO that allows us to make the change without breaking the
compatibility.
Ideally we should have a shared API to create unique global identifier for
static functions.
We've been considering changing the naming scheme for promoted local
functions in ThinLTO. Currently we just prepend the file name, but that
isn't really sound for a number of reasons (e.g. you can have the same file
name in different directories). The alternative we've been thinking about
is to use the hash of all external names in the module, as that is
guaranteed to be unique within a linkage unit (otherwise the linker would
complain).
We currently (intentionally, I believe) use the same naming scheme for
promoting local functions as we do for PGO,
No, we don't use this naming scheme for ThinLTO promotion. It is only
used for computation of the MD5 hash used in the function index (so that
when we want to import a function referenced by indirect call profile info
which uses this MD5 name we can find its summary).
The promotion name is based off a unique module identifier assigned at
Thin-link time (when the combined index is generated and all bitcode files
are seen).
I see. I suppose that if we did form promotion names using external name
hashes, we could soundly compile parts of the object file to native code at
compile time, if we could somehow determine ahead of time that such
compilation would be safe. I'm working on a proposal along those lines that
I hope to share soon.
We've been considering changing the naming scheme for promoted local
functions in ThinLTO. Currently we just prepend the file name, but that
isn't really sound for a number of reasons (e.g. you can have the same file
name in different directories). The alternative we've been thinking about
is to use the hash of all external names in the module, as that is
guaranteed to be unique within a linkage unit (otherwise the linker would
complain).
We currently (intentionally, I believe) use the same naming scheme for
promoting local functions as we do for PGO,
No, we don't use this naming scheme for ThinLTO promotion. It is only
used for computation of the MD5 hash used in the function index (so that
when we want to import a function referenced by indirect call profile info
which uses this MD5 name we can find its summary).
The promotion name is based off a unique module identifier assigned at
Thin-link time (when the combined index is generated and all bitcode files
are seen).
I see. I suppose that if we did form promotion names using external name
hashes, we could soundly compile parts of the object file to native code at
compile time, if we could somehow determine ahead of time that such
compilation would be safe. I'm working on a proposal along those lines that
I hope to share soon.
I mentioned this verbally to Peter, but for the sake of discussion, using
this proposed renaming scheme also has the advantage that the renaming does
not depend on the link order. The current ThinLTO renaming depends on the
order the bitcode files are seen on the link line. That makes it harder to
share the same resulting native object file between links into diffferent
binaries, since the ThinLTO-link-assigned module ID will depend on the link
ordering. Of course, you would also need to determine that the import set
is the same to reuse an existing native object file from another build, but
this helps reduce the barrier.
Note that this is orthogonal to whether the new renaming scheme is done in
the ThinLTO backends (current) or moved to the compile step (part of what
Peter is proposing for early native code generation).
Doing this new renaming in the ThinLTO backend (rather than moving it up to
the compile step) just requires recording this hash value in the combined
index's module symbol table (where the module path and id are currently
held).
It is not clear to me why you need the hash value in the combined index? Can’t you compute it from the summary? (by changing the definition from the "hash of public function names” to “hash of public function GUID”).
I was working on changing the naming to use the “module hash” itself as a suffix, but this hash (from global function name) seems as good for this purpose.
We've been considering changing the naming scheme for promoted local
functions in ThinLTO. Currently we just prepend the file name, but that
isn't really sound for a number of reasons (e.g. you can have the same file
name in different directories). The alternative we've been thinking about
is to use the hash of all external names in the module, as that is
guaranteed to be unique within a linkage unit (otherwise the linker would
complain).
We currently (intentionally, I believe) use the same naming scheme for
promoting local functions as we do for PGO,
No, we don't use this naming scheme for ThinLTO promotion. It is only
used for computation of the MD5 hash used in the function index (so that
when we want to import a function referenced by indirect call profile info
which uses this MD5 name we can find its summary).
The promotion name is based off a unique module identifier assigned at
Thin-link time (when the combined index is generated and all bitcode files
are seen).
I see. I suppose that if we did form promotion names using external name
hashes, we could soundly compile parts of the object file to native code at
compile time, if we could somehow determine ahead of time that such
compilation would be safe. I'm working on a proposal along those lines that
I hope to share soon.
I mentioned this verbally to Peter, but for the sake of discussion, using
this proposed renaming scheme also has the advantage that the renaming does
not depend on the link order. The current ThinLTO renaming depends on the
order the bitcode files are seen on the link line. That makes it harder to
share the same resulting native object file between links into diffferent
binaries, since the ThinLTO-link-assigned module ID will depend on the link
ordering. Of course, you would also need to determine that the import set
is the same to reuse an existing native object file from another build, but
this helps reduce the barrier.
Note that this is orthogonal to whether the new renaming scheme is done in
the ThinLTO backends (current) or moved to the compile step (part of what
Peter is proposing for early native code generation).
Doing this new renaming in the ThinLTO backend (rather than moving it up
to the compile step) just requires recording this hash value in the
combined index's module symbol table (where the module path and id are
currently held).
It is not clear to me why you need the hash value in the combined index?
Can’t you compute it from the summary? (by changing the definition from the
"hash of public function names” to “hash of public function GUID”).
Only if the PGO naming scheme is changed accordingly. Otherwise the summary
entries need to continue using the global ID based hash (want to match PGO
for indirect call importing).