Optimization of array access

I’ve attached two examples of patterns for updating an array. They’re simplified examples of some code generation I’m doing where I need to pass a large, unknown number of arguments (ints, doubles, pointers to other information) to a function.

The two patterns are:

Stack[0] = 0;
Stack[1] = 1;
Stack[2] = 42;

And

int i = 0;
Stack[i++] = 0;
Stack[i++] = 1;
Stack[i++] = 42;

I’ve found that the optimizer is much faster at handling the 2nd pattern than the 1st. With 3000 assignments, as in the attached examples, "opt -std-compile-opts -S func1.ll -o func1.opt" takes 10 seconds, while func2.ll takes 0.2s. Anyone have any ideas why?

I’ve tested with LLVM 2.9 and LLVM 3.0 RC1.

Regards,

Michael

opt.zip (192 KB)

The optimizer is probably not sophisticated enough to figure out that
the array indices in your first pattern are sequentially incrementing
integers. So the generated code has to store each individual array
index constant somewhere, load it into a register, multiply it by the
size of your array elements, then add it to the pointer to the base of
the array. It does that for each array access.

On some architectures, loading constant values can be expensive
because the machine code words do not provide enough bits to encode
arbitrary immediate constants. The constants are most often stored
just after the end of the function's executable code, then read into a
register with PC-relative addressing.

Other architectures allow one to load any immediate constant, but at
the cost of placing the value of the constant just after the
instruction that loads it. That, however, stalls the processor's
pipeline until the constant has been loaded into the register.

In your second pattern, the straightforward lowering would keep a
fixed pointer to the base of your array and use a second register as an
offset. Most architectures offer autoincrement/autodecrement
addressing modes for exactly the access pattern you describe, in which
the index register is incremented by the size of your array elements
just after it is used as an offset.

But in this case the optimizer is smart enough to see that you're
indexing through an array. Rather than using a base register and an
offset register, the initial value of i is added to the base register,
then this register is used all by itself, again with autoincrementing.

Not having to use a second register for the index allows that second
register to be used for some other purpose, so that other parts of
your code can be optimized more effectively.

Dead Code Elimination. I’m going to try re-working it following some of the insights Don Quixote provided.

Sorry, Dead Store Elimination.


Hmm... does tweaking the value of BlockScanLimit in
MemoryDependenceAnalysis.cpp help?

-Eli

Tried 5 and 50, doesn't seem to make any difference.


Hmm... probably some other quadratic algorithm, then.

-Eli