RFC: LLVM Assembly format for ThinLTO Summary

Hi Mehdi, thanks for the comments; responses and a tweaked proposal below. Teresa

I totally agree with Mehdi’s concern. It is probably much easier to treat the module summary as an analysis result rather than part of the module. Making it part of the IR might be overkill just to fix the debugging and readability of the summary.

I would +1 for making this an analysis pass. How about a ModuleSummaryAnalysis that knows how to serialize its result into bitcode and YAML formats? We can also add a module flag to indicate whether the module should contain a module summary.

  • For ThinLTO compilation, the bitcode writer runs the analysis pass if the module flag is set, and emits the module summary into the bitcode.
  • When reading bitcode with a module summary, the summary is parsed into a ModuleSummaryAnalysis result.
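The flow described above can be modeled in a few lines. This is a toy model with made-up types (`SummaryCache`, `summaryForWriter` and the simplified `Module`/`ModuleSummaryIndex` are illustrative stand-ins), not the real LLVM API:

```cpp
#include <map>
#include <optional>
#include <string>

// Toy stand-ins for the LLVM types (illustrative only; the real
// ModuleSummaryIndex and pass-manager APIs are far richer than this).
struct ModuleSummaryIndex {
  std::map<std::string, unsigned> InstCountByFunc; // function name -> #insts
};

struct Module {
  std::map<std::string, unsigned> Funcs; // function name -> #insts
  bool EmitSummaryFlag = true;           // the proposed module flag
};

// The "analysis": compute a fresh summary from the module.
ModuleSummaryIndex computeSummary(const Module &M) {
  ModuleSummaryIndex Index;
  for (const auto &[Name, Insts] : M.Funcs)
    Index.InstCountByFunc[Name] = Insts;
  return Index;
}

// Cached analysis result. A transform that changes the IR must call
// invalidate(); the next get() then recomputes, so a stale summary is
// never serialized. Otherwise the cached (input) result is reused.
struct SummaryCache {
  std::optional<ModuleSummaryIndex> Cached;
  void invalidate() { Cached.reset(); }
  const ModuleSummaryIndex &get(const Module &M) {
    if (!Cached)
      Cached = computeSummary(M);
    return *Cached;
  }
};

// The bitcode writer would consult the module flag before emitting.
std::optional<ModuleSummaryIndex> summaryForWriter(const Module &M,
                                                   SummaryCache &C) {
  if (!M.EmitSummaryFlag)
    return std::nullopt; // module opted out of carrying a summary
  return C.get(M);
}
```

The invalidation behavior is the important part: a transform invalidates the cache, and the writer then recomputes rather than emitting a stale summary.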

Note the reason we don’t do this now is that we don’t ever need to use the summary at the same time as the IR. We only read the summary for the thin link (no IR involved), or for distributed ThinLTO build backends (but only the combined index is read; the summary attached to the module is not read as it is not needed at all). That is why I suggested the simplification in my proposal above of only reading the summary from assembly where we currently read the bitcode summary (which is not when we normally read the IR).

Also, note as I mentioned above that the default path may split the IR into regular LTO and ThinLTO partitions, which is why we stopped invoking it as an analysis pass on the default path, and instead invoke the index builder directly for each partition. Figuring out how to split a single module summary analysis result into two parts is not something I think we should block this on.

  • If any transform invalidates the analysis, the module summary will be recomputed automatically when writing bitcode; otherwise, it will serialize out the same result as the input (probably with some auto-upgrade).
  • For debugging, ModuleSummaryAnalysis can be serialized out to YAML
  • For testing, ModuleSummaryAnalysis result can be created from YAML

I suggested YAML but it can be any other format, including LLVM assembly. For readability, I would prefer YAML.
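For illustration only, a YAML dump might look something like this (a hypothetical schema with placeholder names and GUID values; no such format was agreed upon):

```yaml
# Hypothetical YAML serialization of a per-module summary (placeholder values).
ModulePath: foo.o
GlobalValueSummaries:
  - Name: main
    GUID: 123          # placeholder GUID
    Kind: Function
    Linkage: external
    Insts: 2
    Calls:
      - Callee: bar
        GUID: 456      # placeholder GUID
```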

That is similar to the original proposal from last year, but there was no agreement among the upstream maintainers on dumping to a format other than assembly that could be serialized back in.

That’s why I’m proposing the version above - serialize in as assembly exactly where we consume the bitcode version of the module summary index today. The advantage is that it doesn’t require analysis pass work and figuring out how to make the analysis pass work with split LTO partitions. At the same time, it doesn’t preclude doing that in the future either. I think my proposal enables testing of the ThinLTO pipeline via textual assembly, which is one of the main goals of being able to serialize it back in.

Sounds good. I am totally fine with using LLVM assembly as the textual format, and I have already stated my preferences on a few design choices in the previous email. I don’t have anything against the design if you treat the summary as something separate from the IR, even though they are in the same assembly file.

Steven

Hi Teresa,

I have re-read your proposal, and I'm not getting how you plan to
represent combined summaries with this. Unless I'm missing something, there
doesn't seem to be a way to write out summaries that is independent of the
global values that they relate to. Is that something that you plan to
address later?

I envisioned that the combined index assembly files would only contain
GUIDs, not GV names, just as we do in the combined index bitcode files.
Does that answer your question?
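To sketch what that could look like (hypothetical syntax; the exact grammar was still being designed at this point), a combined-index assembly file would carry only GUID-keyed entries:

```llvm
; Hypothetical combined-index assembly: entries are keyed by GUID, not GV name.
^0 = module: (path: "a.o", hash: (0, 0, 0, 0, 0))
^1 = gv: (guid: 123, summaries: (function: (module: ^0, insts: 2)))
```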

Okay, I get it now. For some reason I got the impression that the top-level
entities in your proposal were the global values and not the summaries.

Peter

Hi Mehdi, thanks for the comments; responses and a tweaked proposal below. Teresa

Hi,

My main concern is this one:

> Currently, I am emitting the summary entries at the end, after the
metadata nodes. Note that the ModuleSummaryIndex is not currently
referenced from the Module, and isn’t currently created when parsing the
Module IR bitcode (there is a separate derived class for reading the
ModuleSummaryIndex from bitcode). This is because they are not currently
used at the same time. However, in the future there is no reason why we
couldn’t tag the global values in the Module’s LLVM assembly with the
corresponding summary entry if the ModuleSummaryIndex is available when
printing the Module in the assembly writer. I.e. we could do the following
for “main” from the above example when printing the IR definition (note the
“^3” at the end):

define dso_local i32 @main() #0 !dbg !17 ^3

I believe the reason that the ModuleSummaryIndex is not attached to the
Module is that it is fundamentally not a piece of IR, but it is
conceptually really an Analysis result.
Usually other analyses don't serialize their result, we happen to
serialize this one for an optimization purpose (reloading it and making the
thin-link faster).

True. My understanding is that the push for having it serialized via
assembly is due to the fact that it is emitted into the bitcode. I know
there is disagreement on this reasoning, I am hoping to have a proposal
that is acceptable to everyone. =)

I totally agree with Mehdi’s concern. It is probably much easier to treat
module summary as analysis result, rather than part of the module. Making
it part of IR might be overkill just to fix the debugging and readability
of the summary.

The fundamental problem is that an analysis result has to be able to be
invalidated with IR changes, attaching this directly to the module wouldn't
achieve this. The risk is that when the IR and the summary get out-of-sync
(`clang -O2 my_module_with_summaries.ll -emit-llvm -o my_optimized
module_with_summaries.ll`) the summaries would be badly wrong.

Have you looked into what it'd take to make it a "real" analysis in the
pass manager?

Thanks for raising this issue specifically, I hadn't addressed it in my
proposal and it is a big one. I am not proposing that we attempt to
maintain the summary through optimization passes, and definitely don't
think we should do that. IMO deserializing it should be for testing the
thin link and the combined summaries in the backends only. To that end, I
have an idea (below some background first).

Note that in some cases the module summary analysis is an analysis pass.
I.e. when invoked by "opt -module-summary=". However, some time ago when
Peter added the support for splitting the bitcode (for CFI purposes) and
therefore needed to generate a summary in each partition (Thin and
Regular), he added the ThinLTOBitcodeWriterPass, which invokes the module
summary builder directly (twice). This writer is what gets invoked now when
building via "clang -flto=thin", and with "opt -thinlto-bc". So there it is
not invoked/maintained as an analysis pass/result. It would be tricky to
figure out how to even split rather than recompute the module summary index
in that case. Even in the case where we are still invoking as an analysis
pass (opt -module-summary), we would need to figure out how to read in the
module summary to use as the analysis result when available (so that it
could be invalidated and recomputed when stale).

Rather than add this plumbing, and just have it discarded if opt does any
optimization, I think we should focus at least for the time being on
supporting reading the summary from assembly exactly where we currently
read in the summary from bitcode:
1) For the thin link (e.g. tools such as llvm-lto2 or llvm-lto, which
currently have to be preceded by "opt -module-summary/-thinlto-bc" to
generate an index, but could just build it from assembly instead).
2) For the LTO backends (e.g. tools such as llvm-lto which can consume a
combined index and invoke the backends, or "clang -fthinlto-index=" for
distributed ThinLTO backend testing), where we could build the combined
summary index from assembly instead.
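To make the two read points concrete, the workflows might look like this (command lines are illustrative sketches; the exact flags and resolution syntax may differ):

```shell
# Today: textual IR must first be turned into summary-carrying bitcode.
opt -module-summary a.ll -o a.bc
llvm-lto2 run a.bc -o out -r a.bc,main,plx   # thin link reads the bitcode summary

# Proposed: the thin link and LTO backends parse the summary straight from
# assembly, so the separate "opt -module-summary" step is no longer needed.
llvm-lto2 run a.ll -o out -r a.ll,main,plx
```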

This greatly simplifies the reading side, as there are no optimizations
performed on the IR after the index is read in these cases that would
require invalidation. It also simplifies adding the parsing support, since
it gets invoked exactly where we expect to build an index currently (i.e.
since we don't currently build or store the ModuleSummaryIndex when parsing
the Module from bitcode). It doesn't preclude someone from figuring out how
to compute the module summary analysis result from the assembly, and
invalidating it after optimization, when reading the Module IR via 'opt' in
the future.

Does this seem like a reasonable proposal to everyone?


Sounds good. I am totally fine with using LLVM assembly as the textual format, and I have already stated my preferences on a few design choices in the previous email. I don’t have anything against the design if you treat the summary as something separate from the IR, even though they are in the same assembly file.

Ok great, thanks!
Teresa


Okay, I get it now. For some reason I got the impression that the
top-level entities in your proposal were the global values and not the
summaries.

Ok great. Probably it was misleading since I used "gv:" as the tag, but
that was in reference to the GlobalValueSummary structure name.

Thanks,
Teresa



Ok great. Probably it was misleading since I used "gv:" as the tag, but
that was in reference to the GlobalValueSummary structure name.

I think it was more that I saw this:

define dso_local i32 @main() #0 !dbg !17 ^3

and focused on the ^3 at the end, but apparently missed the part about
these notations being not strictly required.

I wonder whether the ^3 should be a comment instead, to avoid this kind of
confusion.

Peter
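To contrast the two options Peter raises (both lines below are hypothetical sketches of the proposed printing, not final syntax):

```llvm
; Option A: the summary reference is a first-class token in the definition.
define dso_local i32 @main() #0 !dbg !17 ^3

; Option B: the same reference emitted as a comment, ignored by the parser.
define dso_local i32 @main() #0 !dbg !17   ; summary: ^3
```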

On Tue, May 1, 2018 at 16:50, Teresa Johnson <tejohnson@google.com> wrote:


Does this seem like a reasonable proposal to everyone?

Sounds good to me.

That would make .ll files quite convenient during debugging I think? We
could disassemble, manually change summaries, and re-assemble a bitcode
file before running the (thin-)link again.

Yes, that seems more reasonable than what I thought you had in mind. If
the only consumer of this information is llvm-as, then the purpose of the
asm summary format is just to provide a way to create a .bc file for
testing purposes, which is certainly a useful capability.

That feels a bit surprising if ".ll -> llvm-as -> .bc -> <some other tool
- opt, etc>" is different from ".ll -> <opt, etc>". Is that what we're
talking about here? Any chance that can be avoided & feeding a .ll file
works (in the sense of does the same thing/tests the same behavior) in all
the same places that feeding a .bc file does? (as is the case with
non-summary-based IR, to the best of my knowledge)

I may be mistaken, but I don't think we have a lot of tools that can read
both .ll and .bc and end up using the summary if it is a .bc file. LTO
can't read .ll, for example. The only one that I can think of is clang and
presumably we could make that use whichever API we would use in llvm-as for
reading the summary from .ll. So the behaviour of most tools would be that
".ll -> llvm-as -> .bc -> <some tool>" vs ".ll -> <some tool>" would end up
being the same in both cases: the summary gets discarded.

Peter

Re-sending with trimmed quotation.

Oh, I didn’t know that - why doesn’t it read .ll? That seems like an oversight - most/all the other LLVM tools treat .ll and .bc pretty equally, don’t they?

Re-sending with trimmed quotation.


I was thinking it would be nice to enable tools like llvm-lto2 to take a
.ll file for reading the summary, or is that difficult?

So the behaviour of most tools would be that ".ll -> llvm-as -> .bc -> <some tool>" vs ".ll -> <some tool>" would end up being the same in both cases: the summary gets discarded.

I assume you are referring to tools that don't currently need the summary?
I.e. if <some tool> was llvm-lto2 we would not want the summary discarded.
But 'opt' would under the current proposal, as we don't normally read a
summary there and don't want to try to keep the summary in sync with the IR
for all the reasons mentioned in this thread.

Teresa


Oh, I didn't know that - why doesn't it read .ll? That seems like an
oversight - most/all the other LLVM tools treat .ll and .bc pretty equally,
don't they?

Yes, the developer-facing tools do (clang being one of them, as it has some
developer-facing features), but linkers (the LTO API consumers) are
user-facing, and have no need to consume .ll files. Getting them to read
.ll has other complications as well. For example, Unix linkers will
interpret textual input files as linker scripts, so there would need to be
some mechanism to distinguish .ll files from linker scripts.

llvm-lto2 (the LTO API test driver) is a developer-facing tool, and we
could conceivably make it read .ll files and pass them to the API as .bc
files.

Peter

Fair, that makes sense.

Yeah, I was mostly thinking of llvm-lto2 then, I guess. :)

Re-sending with trimmed quotation.

On Tue, May 1, 2018 at 16:50, Teresa Johnson <tejohnson@google.com>
wrote:

Hi Mehdi, thanks for the comments, responses and a tweaked proposal
below. Teresa

Hi,

My main concern is this one:

> Currently, I am emitting the summary entries at the end, after
the metadata nodes. Note that the ModuleSummaryIndex is not currently
referenced from the Module, and isn’t currently created when parsing the
Module IR bitcode (there is a separate derived class for reading the
ModuleSummaryIndex from bitcode). This is because they are not currently
used at the same time. However, in the future there is no reason why we
couldn’t tag the global values in the Module’s LLVM assembly with the
corresponding summary entry if the ModuleSummaryIndex is available when
printing the Module in the assembly writer. I.e. we could do the following
for “main” from the above example when printing the IR definition (note the
“^3” at the end):

I believe the reason that the ModuleSummaryIndex is not attached to
the Module is that it is fundamentally not a piece of IR; it is
conceptually really an Analysis result.
Other analyses usually don't serialize their result; we happen to
serialize this one for an optimization purpose (reloading it makes the
thin-link faster).

True. My understanding is that the push for having it serialized via
assembly is due to the fact that it is emitted into the bitcode. I know
there is disagreement on this reasoning, I am hoping to have a proposal
that is acceptable to everyone. =)

The fundamental problem is that an analysis result has to be able
to be invalidated with IR changes, attaching this directly to the module
wouldn't achieve this. The risk is that when the IR and the summary get
out-of-sync (`clang -O2 my_module_with_summaries.ll -emit-llvm -o
my_optimized_module_with_summaries.ll`) the summaries would be badly wrong.

Have you looked into what it'd take to make it a "real" analysis in
the pass manager?

Thanks for raising this issue specifically, I hadn't addressed it in
my proposal and it is a big one. I am not proposing that we attempt to
maintain the summary through optimization passes, and definitely don't
think we should do that. IMO deserializing it should be for testing the
thin link and the combined summaries in the backends only. To that end, I
have an idea (below some background first).

Note that in some cases the module summary analysis is an analysis
pass. I.e. when invoked by "opt -module-summary=". However, some time ago
when Peter added the support for splitting the bitcode (for CFI purposes)
and therefore needed to generate a summary in each partition (Thin and
Regular), he added the ThinLTOBitcodeWriterPass, which invokes the module
summary builder directly (twice). This writer is what gets invoked now when
building via "clang -flto=thin", and with "opt -thinlto-bc". So there it is
not invoked/maintained as an analysis pass/result. It would be tricky to
figure out how to even split rather than recompute the module summary index
in that case. Even in the case where we are still invoking as an analysis
pass (opt -module-summary), we would need to figure out how to read in the
module summary to use as the analysis result when available (so that it
could be invalidated and recomputed when stale).


I may be mistaken, but I don't think we have a lot of tools that can read
both .ll and .bc and end up using the summary if it is a .bc file. LTO
can't read .ll, for example. The only one that I can think of is clang and
presumably we could make that use whichever API we would use in llvm-as for
reading the summary from .ll.

I was thinking it would be nice to enable tools like llvm-lto2 to take a
.ll file for reading the summary, or is that difficult?

It should be possible, see my reply to David. :)

So the behaviour of most tools would be that ".ll -> llvm-as -> .bc ->
<some tool>" vs ".ll -> <some tool>" would end up being the same in both
cases: the summary gets discarded.

I assume you are referring to tools that don't currently need the summary?
I.e. if <some tool> was llvm-lto2 we would not want the summary discarded.
But 'opt' would under the current proposal, as we don't normally read a
summary there and don't want to try to keep the summary in sync with the IR
for all the reasons mentioned in this thread.

Yes, I'm talking about opt and other tools that don't currently read the
summary.

Peter

Yeah, no worries about the linker - though I guess it could be convenient if it did some lookahead & could detect a .ll file as being different from a linker script - but that’s a “nice to have”/heroics, not something I’d expect.

Right - this was more what I had in mind. Basically I’d find it weird if I came across an LLVM test case that first ran a .ll file through llvm-as, then ran it on the tool (llvm-lto2) in question. I’d likely try to remove the indirection & be a bit annoyed if that failed.

  • Dave

There is still a discrepancy in that:

.ll → llvm-as → .bc

would be different from:

.ll → opt → .bc

The latter would drop the summary.

Are you referring to the fact that opt without passes would drop the
summary? That's true, but in that case we have to choose whether
opt-without-passes is consistent with opt-with-passes or with llvm-as. opt
being consistent with itself seems a little better, and would also reflect
the behaviour if we did actually run a pass.
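Spelled out as a sketch, the behaviour matrix being proposed here would be as follows (assumed semantics from this thread, not current behaviour; the tool invocations are comments since they need an LLVM build on PATH):

```shell
# Fate of a summary carried in input.ll under the proposal:
#
#   llvm-as input.ll -o a.bc              # summary parsed and preserved
#   opt     input.ll -o b.bc              # summary dropped (no passes run)
#   opt -O2 input.ll -o c.bc              # summary dropped (passes ran)
#   opt -module-summary input.ll -o d.bc  # summary recomputed from the IR
#
# So opt stays consistent with itself, matching what
# ".bc (with summary) -> opt -> .bc" already does.
printf '%s\n' 'llvm-as: preserve' 'opt: drop' 'opt -module-summary: recompute'
```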

Thanks,


Though by the sounds of it .bc → opt → .bc drops the summary already? So there’s some consistency there.

But, yeah, it’s not clear to me if it’d be better if that did something else. This is mostly/purely for testing/development purposes anyway.


And note this would also stay consistent with what happens to:

.bc (with summary) -> opt -> .bc

Teresa


FWIW I’m in total agreement here :)

-eric

Yep, I sent a similar response - we’re all crossing wires here because the long emails keep getting held for moderator approval. =) Trimmed a bunch of stuff so this one should go through.

It sounds like we have a pretty good consensus at this point. Unless there is an objection, I’ll start preparing patches.

Thanks,
Teresa