> And now here is my educated speculation:
> There are 2 things that became slower
> 1) Use::getUser()
> 2) Use::get/set due to tagging.
> The former is seldom called:
> $ find lib -name "*.cpp" | xargs grep "getUser(" | wc -l
The majority of those aren't actually Use::getUser, but on the
other hand this grep misses all the users of
value_use_iterator::operator*, which is much more popular.
Unfortunately, overloaded operators are not the easiest to
grep for ;-).
Thanks for this hint; it makes me look harder into reducing the cost of
getUser. My current idea is to do a very cheap heuristic inline, which
catches 100% of the cases for UnaryInstruction and 50% of the cases
for Binary/CmpInstruction, without doing a function call at all.
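The cheap path for the unary case can be sketched roughly like this (hypothetical names and a deliberately simplified layout, not the real LLVM classes): if the single Use is co-allocated directly in front of its User, then getUser needs nothing but pointer arithmetic.

```cpp
#include <cassert>
#include <new>

// Minimal sketch (hypothetical names) of the cheap getUser path for a
// unary instruction: its single Use is co-allocated directly in front
// of the User object, so the Use can recover its User with pointer
// arithmetic alone -- no out-of-line function call.
struct Use {
  void *Val;
};

struct UnaryInst {
  int Opcode; // stand-in for the rest of the instruction's state

  // Allocate [Use][UnaryInst] contiguously in one memory block.
  static UnaryInst *create(void *operand, int opcode) {
    void *raw = ::operator new(sizeof(Use) + sizeof(UnaryInst));
    new (raw) Use{operand};
    return new (static_cast<char *>(raw) + sizeof(Use)) UnaryInst{opcode};
  }

  Use *getOperandUse() { return reinterpret_cast<Use *>(this) - 1; }
};

// The heuristic inline: when a Use is known to be the only operand of
// a unary user, its User starts right after it in memory.
inline UnaryInst *getUserFast(Use *U) {
  return reinterpret_cast<UnaryInst *>(U + 1);
}
```

With this layout the common case never leaves the inlined arithmetic; only layouts that do not match the guess would need to fall back to a slower, general lookup.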
> The second is counterbalanced by faster access to the Use object
> in most cases:
> With the exception of PHINode and SwitchInst, the getOperand()
> function (if called on a specialized "this" pointer) does a
> "this"-relative access instead of getting the OperandList pointer
> first and going through that. This was the case before for
> fixed-arity instructions, but now it applies to variadic ones too
> that cannot perform operand list resizing.
> Some things got faster, like
> 1) getOperand access on (say) CallInst is now fetching operands
> relative to "this". Also, there are no "new Use[n]" allocations (and
> delete[]) for these any more.
This is fine, but technically it's an independent change that could
be done without the getUser algorithm changes and tagging.
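The co-allocation being discussed can be sketched roughly as follows (hypothetical names, heavily simplified; the real LLVM types differ): a variadic user places its N Uses in the same allocation, directly in front of the object, so getOperand() becomes a fixed "this"-relative computation and no separate `new Use[n]` / `delete[]` pair exists.

```cpp
#include <cassert>
#include <new>

// Sketch (hypothetical, simplified) of co-allocating a variadic User's
// operands in front of the object, in one memory block:
//   [Use 0][Use 1]...[Use N-1][CallLikeInst]
// getOperand() then computes a "this"-relative address instead of
// first loading an OperandList pointer, and no "new Use[n]" is needed.
struct Use {
  void *Val;
};

struct CallLikeInst {
  unsigned NumOperands;

  static CallLikeInst *create(void **vals, unsigned n) {
    void *raw = ::operator new(n * sizeof(Use) + sizeof(CallLikeInst));
    Use *ops = static_cast<Use *>(raw);
    for (unsigned i = 0; i != n; ++i)
      ops[i].Val = vals[i]; // fill operands in place, no extra heap array
    return new (ops + n) CallLikeInst{n};
  }

  void *getOperand(unsigned i) {
    Use *ops = reinterpret_cast<Use *>(this) - NumOperands;
    return ops[i].Val;
  }
};
```

Besides saving the allocation, keeping the Uses and the User in one block should also help locality, since operand access touches the same cache lines as the instruction itself.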
> 2) no need to maintain Use::U
> 3) data-cache related gains on smaller Use objects (not yet activated)
> So, my idea is that these changes are performance neutral.
> There are potentially more speedups to be realized:
> - indexing operands in the decreasing direction eliminates the
> getNumOperands() call in getOperand() for variadic arity instructions.
At a glance, the only getOperand() that I saw that uses getNumOperands()
is ReturnInst's, and that actually looks like a bug.
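The decreasing-direction idea can be made concrete with a small sketch (hypothetical names, simplified layout): if operand i is stored at the slot i places *before* the User, rather than counted from the front of the operand array, getOperand(i) needs no getNumOperands() call at all.

```cpp
#include <cassert>
#include <new>

// Sketch of "index operands in the decreasing direction" (hypothetical
// names): memory layout is [Use n-1]...[Use 1][Use 0][VariadicInst],
// so operand i lives at a fixed negative offset from "this" and
// getOperand(i) never has to consult the operand count.
struct Use {
  void *Val;
};

struct VariadicInst {
  static VariadicInst *create(void **vals, unsigned n) {
    void *raw = ::operator new(n * sizeof(Use) + sizeof(VariadicInst));
    Use *ops = static_cast<Use *>(raw);
    // Operand 0 goes into the slot nearest to the User object.
    for (unsigned i = 0; i != n; ++i)
      ops[n - 1 - i].Val = vals[i];
    return new (ops + n) VariadicInst{};
  }

  // No NumOperands load: operand i is always at (this - 1 - i).
  void *getOperand(unsigned i) {
    return (reinterpret_cast<Use *>(this) - 1 - i)->Val;
  }
};
```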
> - shaving off (unneeded) operand-related administrative members from
> User subclasses
I am thinking of OperandList and NumOperands. In most User subclasses
these are trivially recomputable or constant. The only catch is that
getting them via a User* is a bit more costly (a virtual call or some
other trick). The question is how often the code calls getOperand
through an unspecialized User*.
> I hope that this is interesting, but I'd like to ask anybody who is
> comfortable with performance testing to help provide some hard
> numbers.
I agree that performance testing is not easy and requires resources;
I am on my way into this area. Now I have a working Release setup
and am already in possession of hard numbers, which are pretty
encouraging: I see about 1% speed degradation on a cruel verifier test
(2004-09-29-VerifierIsReallySlow.llx), but even there the user time
is well within the cloud of measurements of the regular (non-tagged)
build. On all other tests I have seen no degradation so far. And this
was even before I moved the tag from the Val to the Prev member of
Use.
> but I'm curious what's motivating this project here.

LLVM is intended to be a pretty much target-independent technology with
a compact delivery format and potentially good startup behaviour. To
actually be a viable system for a wide range of machines it must be as
memory efficient as possible. The in-memory data structures are needed
for the JIT and optimizer, and thus they must coexist with the
generated machine code for the platform. <Use> seems to be the data
structure with the currently greatest bang-for-buck ratio, so I am
thinking about a solution to that.

> My assumption has been that no one would likely go to the extreme of
> inventing a mini-language encoded in the least significant bits of
> pointer members in successive structs unless they were pretty
> desperate for performance or scalability.
Yep. There are desperate needs. I am in the embedded world. When
iPhone/iPod-like devices are supposed to run many multi-megainstruction
programs routinely, everybody will see the difference.
Regarding the mini-language, I decided that the alternatives, like
linear search (see the Kernighan-Ritchie mini-language of
null-terminated strings and strlen) and map lookup, are far more
costly, esp. in a multi-threaded environment. For completeness see my
musings at <http://
Thanks for your feedback!