Per-write cycle count with ReadAdvance - Do I really need that?

Hi list, I happened to read the thread below (written three years ago). I think I may need this ReadAdvance feature for my ARCH.

It concerns the scheduler info that describes reads of my ARCH’s vector registers. The latencies differ because of forwarding/bypass. Here is an example:

def : WriteRes<WriteVector, [MyArchVALU]> { let Latency = 6; }

def MyWriteAddVector : SchedWriteRes<[MyArchVALU]> { let Latency = 6; }

def MyWriteMulVector : SchedWriteRes<[MyArchVALU]> { let Latency = 6; }

Here I defined three different Writes with the same latency. The forwarding is described below.

def : ReadAdvance<MyReadVector, 5, [WriteVector]>;
def : ReadAdvance<MyReadVector, 3, [MyWriteAddVector_3cycles]>;
def : ReadAdvance<MyReadVector, 1, [MyWriteMulVector_5cycles]>;

def : ReadAdvance<MyReadStoreVector, 0, [WriteVector]>;
def : ReadAdvance<MyReadStoreVector, 0, [MyWriteAddVector_3cycles]>;
def : ReadAdvance<MyReadStoreVector, 0, [MyWriteMulVector_5cycles]>;

Basically, my intention is to model the following: for any non-store instruction that reads a vector register, the vector write is forwarded so that the latency is normally 1 cycle, 3 cycles for my ADD, and 5 cycles for my MUL. But a store instruction that takes a vector register as a source cannot use forwarding, so its latency stays at 6.

Unfortunately, the code above cannot be compiled by tblgen. I am not sure whether I really need a per-write cycle count with ReadAdvance, or whether there is an existing method that meets my requirement. Either way, the latencies here seem to be determined by both
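To make the intended numbers concrete, here is a minimal Python sketch of the arithmetic. It assumes the effective operand latency is the write latency minus the ReadAdvance cycles, clamped at zero. The dictionary layout and the function name are illustrative only and are not part of TableGen or LLVM; the write names follow the three definitions above.

```python
# Illustrative arithmetic only; this is not TableGen or LLVM code.
# Every write in the example has the same base latency of 6 cycles.
WRITE_LATENCY = {
    "WriteVector": 6,
    "MyWriteAddVector": 6,
    "MyWriteMulVector": 6,
}

# Intended ReadAdvance cycles, keyed by (read, write) pair.
READ_ADVANCE = {
    ("MyReadVector", "WriteVector"): 5,
    ("MyReadVector", "MyWriteAddVector"): 3,
    ("MyReadVector", "MyWriteMulVector"): 1,
    ("MyReadStoreVector", "WriteVector"): 0,
    ("MyReadStoreVector", "MyWriteAddVector"): 0,
    ("MyReadStoreVector", "MyWriteMulVector"): 0,
}


def effective_latency(read, write):
    """Write latency minus the advance for this pair, never below zero."""
    return max(0, WRITE_LATENCY[write] - READ_ADVANCE[(read, write)])


if __name__ == "__main__":
    for read, write in READ_ADVANCE:
        print(f"{write} -> {read}: {effective_latency(read, write)} cycles")
```

Under that assumption the non-store reads see 1, 3, and 5 cycles, while every store read sees the full 6 cycles.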

a) 3 kinds of Write,
b) 2 kinds of Read.

Therefore I suspect this cannot be modeled with the current tblgen implementation.

Can you comment and help?

I’m not sure if the TableGen bug mentioned below was ever fixed.

It looks to me like this should work, but I haven’t tried it:

def : WriteRes<WriteVector, [MyArchVALU]> { let Latency = 6; }
def MyWriteAddVector : SchedWriteRes<[MyArchVALU]> { let Latency = 6; }
def MyWriteMulVector : SchedWriteRes<[MyArchVALU]> { let Latency = 6; }

// Forward from a vector op (normal, add, mul) to a non-store.
def : ReadAdvance<MyReadVector, 5, [WriteVector]>;
def : ReadAdvance<MyReadVector, 3, [MyWriteAddVector]>;
def : ReadAdvance<MyReadVector, 1, [MyWriteMulVector]>;

Additionally, you could do this but I don’t think it would have any effect at all:

// Forward from a vector op (normal, add, mul) to a store.

def : ReadAdvance<MyReadStoreVector, 0, [WriteVector, MyWriteAddVector, MyWriteMulVector]>;

-Andy

Thanks Andrew. I tried with a recent tblgen, and ReadAdvance does not work for multiple latencies. Maybe I should improve tblgen myself if Pierre-Andre no longer has the change.

However, I am a little curious about my situation. Hardware forwarding may fail for different reasons, so different register reads may have different latencies, depending on both the register reader and the writer. I am new to tblgen, so I wonder whether any other Target already has another way to describe this.

Does this work for you?

// Forward from a vector op (normal, add, mul) to a non-store.
def : ReadAdvance<MyReadVector, 5, [WriteVector]>;
def : ReadAdvance<MyReadVector, 3, [MyWriteAddVector]>;
def : ReadAdvance<MyReadVector, 1, [MyWriteMulVector]>;

A ReadAdvance is associated with a pair of write resource → read resource. You can specify as many variants of read/write resources as you want, even using arbitrary C++ code inside a predicate. So, in theory I think that should be flexible enough.
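The write → read pairing can be pictured as a small lookup. The Python sketch below is a simplified model under stated assumptions, not LLVM's actual code: `ReadAdvanceEntry` and `advance_cycles` are hypothetical names, an entry with an empty write list is assumed to apply to every write, and a pair with no matching entry is assumed to get no advance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadAdvanceEntry:
    """Hypothetical record: one read, its advance, and the writes it applies to."""
    read: str
    cycles: int
    valid_writes: tuple = ()  # assumption: empty means "any write"


def advance_cycles(entries, read, write):
    """Return the advance of the first entry matching (read, write), else 0."""
    for entry in entries:
        if entry.read != read:
            continue
        if not entry.valid_writes or write in entry.valid_writes:
            return entry.cycles
    return 0  # assumption: no matching entry means no forwarding


# One entry per write for the non-store read, one shared entry for stores.
entries = [
    ReadAdvanceEntry("MyReadVector", 5, ("WriteVector",)),
    ReadAdvanceEntry("MyReadVector", 3, ("MyWriteAddVector",)),
    ReadAdvanceEntry("MyReadVector", 1, ("MyWriteMulVector",)),
    ReadAdvanceEntry(
        "MyReadStoreVector", 0,
        ("WriteVector", "MyWriteAddVector", "MyWriteMulVector"),
    ),
]

if __name__ == "__main__":
    print(advance_cycles(entries, "MyReadVector", "MyWriteAddVector"))
```

In this model the same read can carry a different advance for each write, which is the behavior the three `ReadAdvance` lines above are asking for.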

You can search the in-tree targets to see where ReadAdvance definitions are used. Sorry, I’m not familiar with anything beyond that, but maybe someone else on the list has dealt with the same problem.

-Andy

It does not work. I tried the latest master today, but tblgen still gives me an error like

error: Resources are defined for both SchedRead and its alias on processor MyArchModel

def : ReadAdvance<MyReadVector, 3, [MyWriteAddVector]>;

^

Unless I change “MyReadVector” to another read like “MyReadVector1”, it does not work. Debugging tblgen, I found no path that handles multiple latencies for the same Read…

Anyway, as you suggested, I am searching other Targets and looking into Pierre’s change. (I finally noticed that he already has a patch associated with the thread.) If it is feasible, I will try to push a suitable change back upstream.

-Garfee

I see what you mean. I thought the problem was with multiple latencies associated with a single definition: ReadAdvance<Read1, #, [Write1, Write2]>. There definitely should be some way to make this work. If you can upstream the patch that would be fantastic.
-Andy

Thanks for clarifying, Andy. I will try to upstream the patch when it is ready.

-Garfee