I still have some issues with what we're doing with value_kind here. Let
me summarize the arguments I've heard for including the value_kind in
various places in this set of changes:
1. We would like to be able to enable multiple types of value profiling
at the same time.
1a. We would like to be able to selectively use only particular value
kinds from a particular profile.
1b. We would like to be able to combine profiles with different sets of
profiling options enabled.
2. We will need to know the value kind in order to parse the raw profile
data.
3. We'd like to avoid needing to change the .profdata format when we add
new value kinds, since it needs to stay backwards compatible.
4. The frontend work is simpler if the value kind is explicitly encoded.
There are a couple of these points that certainly need to be addressed,
and some that are a little bit less clear.
A. We do need to handle (1) in some way, but the issue can be stated a
little bit more generally. If there are optional parts of the profile
data, we definitely need to know which options are enabled.
B. (1a) is only an issue if you believe (4). This might be true or it might
not, and I don't think we can make an informed decision until we're
actually using multiple value kinds. For now there's just a bunch of
code that's effectively useless - "if value kind is 2, do nothing". I'm
opposed to adding this kind of practically unreachable code "just in
case". It adds a kind of complexity and premature generality that, in my
experience, will have to be completely re-done when we get to actually
generalizing the code.
C. (1b) is an interesting one. Is this actually a useful feature? If we
want to combine multiple profiles with different subsets of profiling
turned on, then we definitely need the information recorded
somewhere.
D. I have issues with (2). If the different value kinds actually have
different structures and need to be read in / interpreted
differently, then writing the support for them now simply won't
work. We're reading them in as if they're interpreted in the exact
same way as the indirect call profiling - if they aren't, then we
have to rewrite this code when we add a new one anyway. What's the
point of writing code that won't actually work when we try to use it?
E. There is value in dealing with (3) as long as we're confident that we
can get it right. If we're pretty confident that the value profile
data structure we're encoding in the .profdata file is going to work
for multiple value profile kinds, then it makes sense to allow
multiple of them and encode which kinds they are. David, is
"InstrProfValueRecord" in Betul's current patches going to work to
record the other types of value profiling information you're planning
on looking at?
I'd also like to point out that the instrprof parts of LLVM try to be
relatively frontend agnostic. We should strive to keep it such that the
frontend is the piece that decides what a particular profiling kind
means if possible, as long as the structure and way the data is laid out
is suitable. This doesn't affect the current work very much, but it's a
good idea to keep it in mind.
So, with all of that said, I have a couple of suggestions on how to move
forward.
- I don't think we should store the kind in the raw profile yet. Let's
keep this simple, since it's easy to change and will be easier to get
right when we're implementing or experimenting with a second kind
(based on (D)).
- Assuming the answer to my question in (E) is "yes", let's go ahead and
include a value kind for the value records in the .profdata format. I
don't really like the current approach of an array of std::vectors,
because I think we can do this more simply. I'll bring that up on the
review thread.
- In clang, we should error if we see value kinds we don't understand
yet.
I think that this addresses the short term concerns without forcing us
to write unreachable/untestable code that may or may not work for the
longer term concerns. It minimally handles (A ) in that the data that is
there implies which features are being used, puts off (B) without making
it harder to deal with when the time comes, allows us to implement (C)
without much trouble if we want it, avoids doing work that could cause
problems with (D), and handles (E) since avoiding it would mean more
work later.
What do you think?
Xinliang David Li <davidxl@google.com> writes: