I think I need to explain the situation in more detail. Here is an
example based on my previous one.
Source code:
typedef unsigned short int UV __attribute__((vector_size (8)));
void test (UV *x, UV *y) {
*x = *y / ((UV) { 4, 4, 4, 4 });
}
IR snippet from "*x = ...":
...
%div = udiv <4 x i16> %1, %2
%3 = load <4 x i16>*, <4 x i16>** %x.addr, align 4
store <4 x i16> %div, <4 x i16>* %3, align 8
...
SelectionDAG before type legalization:
...
0x85ceac8: i32,ch = load 0x85ced18, 0x85cf1b8,
0x85c8e20<LD4[%x.addr]> [ORD=12]
...
0x85d3a28: v4i16 = srl 0x85d51a0, 0x85c79d0 [ORD=11] <-- from udiv
0x85c8aa8: ch = store 0x85ceac8:1, 0x85d3a28, 0x85ceac8,
0x85c8e20<ST8[%3]> [ORD=13]
...
SelectionDAG after type legalization:
...
0x85ceac8: i32,ch = load 0x85d0798, 0x85cf1b8, 0x85c8e20<LD4[%x.addr]>
[ORD=12] [ID=-3]
...
0x85dc2a8: ch = store 0x85ceac8:1, 0x85d18c8, 0x85ceac8,
0x85c8e20<ST2[%3](align=8), trunc to i16> [ORD=13] [ID=-3]
...
0x85dc058: ch = store 0x85ceac8:1, 0x85c8cf8, 0x85dbab0,
0x85c8e20<ST2[%3(align=8)+2](align=2), trunc to i16> [ORD=13] [ID=-3]
...
0x85db860: ch = store 0x85ceac8:1, 0x85d2fc0, 0x85dacd0,
0x85c8e20<ST2[%3(align=8)+4](align=4), trunc to i16> [ORD=13] [ID=-3]
...
0x85db610: ch = store 0x85ceac8:1, 0x85d09e8, 0x85db170,
0x85c8e20<ST2[%3(align=8)+6](align=2), trunc to i16> [ORD=13] [ID=-3]
...
As the SelectionDAG above shows, the vector operations are scalarized
because the target does not support vector types. Each scalarized
store has the same chain, from the load 0x85ceac8, because the
legalizer assumes the stores access different addresses. As I said in
my first e-mail, I lower each of these stores to 2 load and 2 store
nodes covering 2 words, at the high and low addresses, and those word
addresses can be the same for the stores of adjacent vector elements.
In the SelectionDAG stage, I tried to keep the order of the load and
store nodes with chain and glue while lowering each element's store.
But in the machine IR stage the order is broken, because the nodes of
different elements do not depend on each other even though they can
access the same address. One vector element's load and store can be
interleaved with another element's load and store. If I use the
2-word approach, I need to keep each vector element's store together
as one chunk, but I am not sure whether that is a good way or not. If
someone has experience with this kind of situation, please give me
any comments. It would be very helpful.
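
To make the hazard concrete, here is a minimal C sketch of what I
believe goes wrong. It is only a model, not my actual lowering: it
assumes memory can only be accessed as 32-bit words, that the layout
is little-endian, and that each i16 store touches one word instead of
two. The names merge16, store_pair_in_order and
store_pair_interleaved are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a word-only target: an i16 store is done as
   "load word, merge the halfword, store word". This helper performs
   the merge step. Little-endian layout is an assumption. */
uint32_t merge16(uint32_t word, unsigned byte_off, uint16_t val) {
    assert(byte_off == 0 || byte_off == 2);   /* halfword inside the word */
    unsigned shift = byte_off * 8u;           /* 0 or 16 */
    return (word & ~(0xFFFFu << shift)) | ((uint32_t)val << shift);
}

/* Each element's load/store pair finishes before the next one starts.
   This is the order I try to keep with chain and glue. */
uint32_t store_pair_in_order(uint32_t mem, uint16_t e0, uint16_t e1) {
    uint32_t t;
    t = mem;                    /* load for element 0 */
    mem = merge16(t, 0, e0);    /* store for element 0 */
    t = mem;                    /* load for element 1 sees element 0 */
    mem = merge16(t, 2, e1);    /* store for element 1 */
    return mem;
}

/* Both loads are scheduled before both stores, which is legal if the
   two pairs are assumed to be independent of each other. */
uint32_t store_pair_interleaved(uint32_t mem, uint16_t e0, uint16_t e1) {
    uint32_t t0 = mem;          /* load for element 0 */
    uint32_t t1 = mem;          /* load for element 1, stale copy */
    mem = merge16(t0, 0, e0);   /* store for element 0 */
    mem = merge16(t1, 2, e1);   /* store for element 1 drops element 0 */
    return mem;
}
```

Starting from a word of 0 and storing 0x1111 and 0x2222, the in-order
version gives 0x22221111, while the interleaved version gives
0x22220000: the first element's update is lost. That is why I think
each element's load and store would have to stay together as one
chunk.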
Thanks,
JInGu Kang