> And now here is my educated speculation:
> There are 2 things that became slower
> 1) Use::getUser()
> 2) Use::get/set due to tagging.
> The former is seldom called:
> $ find lib -name "*.cpp" | xargs grep "getUser(" | wc -l
The majority of those aren't actually Use::getUser, but on the
other hand this grep misses all the users of
value_use_iterator::operator*, which is much more popular.
Unfortunately, overloaded operators are not the easiest to
grep for ;-).
Thanks for this hint; it makes me look harder into reducing the cost of
getUser. My current idea is to do a very cheap heuristic inline, which
catches 100% of the cases for UnaryInstruction and 50% of the cases
for Binary/CmpInstruction, without doing a function call at all.
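The cheap path for the unary case can be sketched roughly like this (hypothetical names and a deliberately simplified layout, not the real LLVM classes): if the single Use is co-allocated directly in front of its User, then getUser needs nothing but pointer arithmetic.

```cpp
#include <cassert>
#include <new>

// Minimal sketch (hypothetical names) of the cheap getUser path for a
// unary instruction: its single Use is co-allocated directly in front
// of the User object, so the Use can recover its User with pointer
// arithmetic alone -- no out-of-line function call.
struct Use {
  void *Val;
};

struct UnaryInst {
  int Opcode; // stand-in for the rest of the instruction's state

  // Allocate [Use][UnaryInst] contiguously in one memory block.
  static UnaryInst *create(void *operand, int opcode) {
    void *raw = ::operator new(sizeof(Use) + sizeof(UnaryInst));
    new (raw) Use{operand};
    return new (static_cast<char *>(raw) + sizeof(Use)) UnaryInst{opcode};
  }

  Use *getOperandUse() { return reinterpret_cast<Use *>(this) - 1; }
};

// The heuristic inline: when a Use is known to be the only operand of
// a unary user, its User starts right after it in memory.
inline UnaryInst *getUserFast(Use *U) {
  return reinterpret_cast<UnaryInst *>(U + 1);
}
```

With this layout the common case never leaves the inlined arithmetic; only layouts that do not match the guess would need to fall back to a slower, general lookup.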
> The second is counterbalanced by faster access to the Use object
> in most cases:
> With the exception of PHINode and SwitchInst, the getOperand()
> function (if called on a specialized "this" pointer) does a
> "this"-relative access instead of getting the OperandList pointer
> first and going through that. This was the case before for
> fixed-arity instructions, but now it applies to variadic ones too
> that cannot perform operand list resizing.
> Some things got faster, like
> 1) getOperand access on (say) CallInst is now fetching operands
> relative to "this". Also, there are no "new Use[n]" allocations (and
> delete[]) for these any more.
This is fine, but technically it's an independent change that could
be done without the getUser algorithm changes and tagging.
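The co-allocation being discussed can be sketched roughly as follows (hypothetical names, heavily simplified; the real LLVM types differ): a variadic user places its N Uses in the same allocation, directly in front of the object, so getOperand() becomes a fixed "this"-relative computation and no separate `new Use[n]` / `delete[]` pair exists.

```cpp
#include <cassert>
#include <new>

// Sketch (hypothetical, simplified) of co-allocating a variadic User's
// operands in front of the object, in one memory block:
//   [Use 0][Use 1]...[Use N-1][CallLikeInst]
// getOperand() then computes a "this"-relative address instead of
// first loading an OperandList pointer, and no "new Use[n]" is needed.
struct Use {
  void *Val;
};

struct CallLikeInst {
  unsigned NumOperands;

  static CallLikeInst *create(void **vals, unsigned n) {
    void *raw = ::operator new(n * sizeof(Use) + sizeof(CallLikeInst));
    Use *ops = static_cast<Use *>(raw);
    for (unsigned i = 0; i != n; ++i)
      ops[i].Val = vals[i]; // fill operands in place, no extra heap array
    return new (ops + n) CallLikeInst{n};
  }

  void *getOperand(unsigned i) {
    Use *ops = reinterpret_cast<Use *>(this) - NumOperands;
    return ops[i].Val;
  }
};
```

Besides saving the allocation, keeping the Uses and the User in one block should also help locality, since operand access touches the same cache lines as the instruction itself.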
> 2) no need to maintain Use::U
> 3) data-cache related gains on smaller Use objects (not yet activated)
> So, my idea is that these changes are performance neutral.
> There are potentially more speedups to be realized:
> - indexing operands in the decreasing direction eliminates the
> getNumOperands() call in getOperand() for variadic arity instructions.
At a glance, the only getOperand() that I saw that uses getNumOperands()
is ReturnInst's, and that actually looks like a bug.
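The decreasing-direction idea can be made concrete with a small sketch (hypothetical names, simplified layout): if operand i is stored at the slot i places *before* the User, rather than counted from the front of the operand array, getOperand(i) needs no getNumOperands() call at all.

```cpp
#include <cassert>
#include <new>

// Sketch of "index operands in the decreasing direction" (hypothetical
// names): memory layout is [Use n-1]...[Use 1][Use 0][VariadicInst],
// so operand i lives at a fixed negative offset from "this" and
// getOperand(i) never has to consult the operand count.
struct Use {
  void *Val;
};

struct VariadicInst {
  static VariadicInst *create(void **vals, unsigned n) {
    void *raw = ::operator new(n * sizeof(Use) + sizeof(VariadicInst));
    Use *ops = static_cast<Use *>(raw);
    // Operand 0 goes into the slot nearest to the User object.
    for (unsigned i = 0; i != n; ++i)
      ops[n - 1 - i].Val = vals[i];
    return new (ops + n) VariadicInst{};
  }

  // No NumOperands load: operand i is always at (this - 1 - i).
  void *getOperand(unsigned i) {
    return (reinterpret_cast<Use *>(this) - 1 - i)->Val;
  }
};
```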
> - shaving off (unneeded) operand-related administrative members from
> User subclasses
I am thinking of OperandList and NumOperands. In most User subclasses
these are trivially recomputable or constant. The only catch is that
getting them via a User* is a bit more costly (a virtual call or some
other trick). The question is how often the code calls getOperand
through an unspecialized User*.
> I hope that this is interesting, but I'd like to ask anybody who is
> comfortable with performance testing to help provide some hard
> numbers.
I agree that performance testing is not easy and requires resources;
I am on my way into this area. Now I have a working Release setup
and am already in possession of hard numbers, which are pretty
encouraging: I see about 1% speed degradation on a cruel verifier test
(2004-09-29-VerifierIsReallySlow.llx), but even there the user time
is well within the cloud of measurements of the regular (non-tagged)
build. On all other tests I have seen no degradation so far. And this
was even before I moved the tag from the Val to the Prev member of
Use.
> but I'm curious what's motivating this project here.

LLVM is intended to be a pretty much target-independent technology with
a compact delivery format and potentially good startup behaviour. To
actually be a viable system for a wide range of machines it must be as
memory efficient as possible. The in-memory data structures are needed
for the JIT and optimizer, and thus they must coexist with the
generated machine code for the platform. <Use> seems to be the data
structure with the currently greatest bang-for-buck ratio, so I am
thinking about a solution to that.

> My assumption has been that no one would likely go to the extreme of
> inventing a mini-language encoded in the least significant bits of
> pointer members in successive structs unless they were pretty
> desperate for performance or scalability.
Yep. There are desperate needs. I am in the embedded world. When
iPhone/iPod-like devices are supposed to run many multi-megainstruction
programs routinely, everybody will see the difference.
Regarding the mini-language, I decided that the alternatives, like
linear search (see the Kernighan-Ritchie mini-language of
null-terminated strings and strlen) and map lookup, are far more
costly, esp. in a multi-threaded environment. For completeness see my
musings at <http://
Thanks for your feedback!