Thoughts about the LLVM architecture

Hi!

I'd like to share the following thoughts about the LLVM architecture
(from the perspective of a user):

- If a backend has no vector support, I wonder why there is no de-vectorization
pass that operates on the intermediate LLVM IR. I would think that for such a target
you could insert a target-independent pass before the backend, so that the backend does
not have to care about vector code. The advantage is that constant vector components can
already be handled by instcombine. What do you think?

- If the integer width of a backend is smaller than the integers in the LLVM IR (e.g.
an 8-bit microcontroller), then I would also expect a target-independent integer
splitting pass. The difficulty here is how to handle the carry in the IR, but the advantage
would be that if I have e.g. int32_t a = 5, then three of the four bytes are zero and can
be optimized by instcombine. I have seen very bad code in the output of avr-gcc in this case.

-Jochen
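
As a rough illustration of the integer-splitting idea above, here is a minimal Python sketch (not LLVM code; `split_bytes` and `add32_via_8bit` are names made up for this example). It models a 32-bit add as four 8-bit adds linked by an explicit carry, and shows that the constant 5 contributes three zero bytes, so the upper three adds only have to propagate the carry.

```python
def split_bytes(value, width_bytes=4):
    """Split an unsigned integer into little-endian 8-bit pieces."""
    return [(value >> (8 * i)) & 0xFF for i in range(width_bytes)]


def add32_via_8bit(a, b):
    """Add two 32-bit values using only 8-bit adds and an explicit carry."""
    result = 0
    carry = 0
    for i, (x, y) in enumerate(zip(split_bytes(a), split_bytes(b))):
        s = x + y + carry                # one 8-bit add-with-carry
        result |= (s & 0xFF) << (8 * i)  # keep the low 8 bits of the sum
        carry = s >> 8                   # carry out is 0 or 1
    return result                        # final carry dropped: wraps modulo 2**32


print(split_bytes(5))                       # [5, 0, 0, 0]
print(hex(add32_via_8bit(0x000000FF, 5)))   # 0x104
```

The carry variable is exactly the part that has no direct counterpart in plain integer IR operations, which is the difficulty mentioned in the message.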

Legalize and DAG combine already handle these cases. Why would we want to duplicate the code?

-Chris

Legalize and DAG combine already handle these cases. Why would we want to duplicate the code?

But what is the output of legalize and DAG combine? Is it LLVM IR again?
I think I am still missing some of the "big picture". Is there an architecture chart that shows
the pipeline stages of LLVM and what is done where?
For example, it seems that legalization takes place after instcombine, but I would think it
should happen before instcombine.

-Jochen

Hi Jochen,

Legalize and DAG combine already handle these cases. Why would we want to duplicate the code?

Legalization is a target-specific process, so it is done as part of code
generation.

But what is the output of legalize and DAG combine? Is it llvm-ir again?

No, the input is a "Selection DAG", a data structure previously generated from
the LLVM IR. The output is also a Selection DAG.
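
To give a feel for what "a DAG generated from the IR" means, here is a toy Python sketch. It is only an illustration under simplifying assumptions: the tuple encoding and `build_dag` are hypothetical and far simpler than LLVM's actual SelectionDAG. A linear list of instructions becomes a graph in which identical computations share one node.

```python
def build_dag(instrs):
    """Turn a list of (dest, op, lhs, rhs) instructions into DAG nodes.

    A node is a tuple (op, lhs_node, rhs_node); leaves are plain strings.
    Structurally equal nodes are interned, so they are shared.
    """
    values = {}    # instruction result name -> node
    interned = {}  # node -> canonical node object
    for dest, op, lhs, rhs in instrs:
        node = (op, values.get(lhs, lhs), values.get(rhs, rhs))
        values[dest] = interned.setdefault(node, node)
    return values


ir = [
    ("t0", "add", "a", "b"),
    ("t1", "add", "a", "b"),   # same computation as t0
    ("t2", "mul", "t0", "t1"),
]
dag = build_dag(ir)
print(dag["t0"] is dag["t1"])  # True: both names refer to one shared node
print(dag["t2"])               # ('mul', ('add', 'a', 'b'), ('add', 'a', 'b'))
```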

For example it seems that legalization takes place after instcombine,
but I would think it should happen before instcombine.

DAG combine is the selection DAG equivalent of instcombine, and runs after
legalization.

To summarize: target-independent transforms occur on the LLVM IR. Target-specific
transforms occur when generating code for the target processor,
and work on completely different data structures.

Ciao,

Duncan.

  

Legalize and DAG combine already handle these cases. Why would we want to duplicate the code?

But what is the output of legalize and DAG combine? Is it llvm-ir again?
  

No, the output of legalize and DAG combine is a modified selection
DAG.
This selection DAG is later converted into MachineInstrs after
scheduling.

All of this is CodeGen-specific and "lower" than LLVM IR.

I think I still miss some of the "big picture". Is there an architecture
chart that shows the pipeline stages of llvm and what is done where?
  

This 2008 presentation by Dan Gohman describes the various pipeline
stages that happen during CodeGen
and explains the bigger picture of how LLVM IR is converted by
CodeGen into a legal stream of MachineInstrs.

Slides:
http://llvm.org/devmtg/2008-08/Gohman_CodeGenAndSelectionDAGs.pdf
Video:
http://llvm.org/devmtg/2008-08/Gohman_CodeGenAndSelectionDAGs_Hi.m4v

For more info see
http://llvm.org/docs/CodeGenerator.html

For example it seems that legalization takes place after instcombine,
but I would think it should happen before instcombine.
  

The combine phase is actually run twice:

Combine
Legalize
Combine
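
A toy Python sketch of why a combine run after legalization pays off. This is an illustration only: the tuple encoding and the `combine` and `legalize` helpers are hypothetical and are not LLVM's DAG combiner or legalizer. A 16-bit `or` with the constant 5 has nothing to fold before legalization, but once it is split into two 8-bit halves, the high half becomes `or x, 0` and folds away.

```python
def combine(e):
    """Fold trivial identities bottom-up (toy stand-in for a combine step).

    Only constants on the right-hand side are recognized.
    """
    if e[0] in ("const", "var"):
        return e
    op, width, lhs, rhs = e[0], e[1], combine(e[2]), combine(e[3])
    ones = (1 << width) - 1
    if rhs[0] == "const":
        if lhs[0] == "const":
            v = lhs[2] | rhs[2] if op == "or" else lhs[2] & rhs[2]
            return ("const", width, v)
        if op == "or" and rhs[2] == 0:
            return lhs                      # x | 0  ->  x
        if op == "and" and rhs[2] == ones:
            return lhs                      # x & all-ones  ->  x
        if op == "and" and rhs[2] == 0:
            return ("const", width, 0)      # x & 0  ->  0
    return (op, width, lhs, rhs)


def legalize(e):
    """Split a 16-bit expression into (lo, hi) 8-bit expressions."""
    kind = e[0]
    if kind == "const":
        return ("const", 8, e[2] & 0xFF), ("const", 8, e[2] >> 8)
    if kind == "var":
        return ("var", 8, e[2] + ".lo"), ("var", 8, e[2] + ".hi")
    lo_l, hi_l = legalize(e[2])
    lo_r, hi_r = legalize(e[3])
    return (kind, 8, lo_l, lo_r), (kind, 8, hi_l, hi_r)


expr = ("or", 16, ("var", 16, "a"), ("const", 16, 5))
lo, hi = legalize(combine(expr))   # first combine finds nothing to fold
print(combine(lo))   # ('or', 8, ('var', 8, 'a.lo'), ('const', 8, 5))
print(combine(hi))   # ('var', 8, 'a.hi')
```

Bitwise operations are used here because they split into independent halves; an add would additionally need a carry between the halves.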

-Jochen

_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
  
Hope this helps, Cheers
Xerxes

Legalize and DAG combine already handle these cases. Why would we want to duplicate the code?

But what is the output of legalize and DAG combine? Is it llvm-ir again?
  

No, the output of legalize and DAG combine is a modified selection
DAG.
This selection DAG is later converted into MachineInstrs after
scheduling.
  
Then it seems this is what is done:
instCombine (on llvm ir)
llvm ir -> selection dag
lower
combine
legalize
combine
...

I wonder if there is duplicate code between instCombine (on LLVM IR, e.g.
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp)
and combine on the selection DAG. Worst of all, I can't use
lower, since I translate LLVM IR into high-level code (like CBackend)
and therefore have to implement my own lowering. Or do you
have a better idea for this?

-Jochen

Jochen Wilhelmy wrote:

I wonder if there is duplicate code between instCombine (on LLVM IR, e.g.
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp)
and combine on the selection DAG. Worst of all, I can't use
lower, since I translate LLVM IR into high-level code (like CBackend)
and therefore have to implement my own lowering. Or do you
have a better idea for this?

-Jochen

The same passes would be beneficial for my PTX backend implementation (which is also like CBackend). Are there reasons for not implementing them on the LLVM IR? Is the current implementation easier? I think an LLVM IR implementation would be more general.

--Helge
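
As a rough sketch of what an IR-level de-vectorization pass of the kind discussed here would do, the following toy Python code expands one vector operation into per-lane scalar operations and then folds the lanes that involve constants. It is an illustration under simplifying assumptions, not LLVM code: `scalarize` and `fold` are made-up helpers, lanes are either Python ints (constants) or strings (unknown values), and only "add" and "mul" are modeled.

```python
def scalarize(op, lhs, rhs):
    """Expand one vector op into a list of per-lane scalar ops."""
    return [(op, a, b) for a, b in zip(lhs, rhs)]


def fold(scalar_op):
    """Fold one scalar op where possible (toy instcombine-like step)."""
    op, a, b = scalar_op
    if isinstance(a, int) and isinstance(b, int):
        return a + b if op == "add" else a * b  # only "add" and "mul" are modeled
    if op == "add" and b == 0:
        return a        # x + 0  ->  x
    if op == "mul" and b == 1:
        return a        # x * 1  ->  x
    if op == "mul" and b == 0:
        return 0        # x * 0  ->  0
    return scalar_op    # nothing to fold


lanes = scalarize("mul", ["x0", "x1", "x2", 3], [1, 0, "y2", 4])
print([fold(lane) for lane in lanes])
# ['x0', 0, ('mul', 'x2', 'y2'), 12]
```

Three of the four lanes disappear after scalarization, which is the benefit of constant vector components mentioned at the start of the thread.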

Hi,

Jochen Wilhelmy wrote:

I wonder if there is duplicate code between instCombine (on LLVM IR, e.g.
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp)
and combine on the selection DAG. Worst of all, I can't use
lower, since I translate LLVM IR into high-level code (like CBackend)

I encountered the same problem. I build my own schedule DAG from LLVM
IR right now, and I am planning to build it from
pre-register-allocation MachineInstrs later, so I can reuse some
existing machine passes.

and therefore have to implement my own lowering. Or do you
have a better idea for this?

-Jochen

The same passes would be beneficial for my PTX backend
implementation (which is also like CBackend). Are there reasons for not
implementing them on the LLVM IR? Is the current implementation easier?
I think an LLVM IR implementation would be more general.

Maybe we need a "general target" for tasks like this. The
instructions of the general target should be almost the same as the LLVM IR
instructions, and there should be as many registers available as needed, so we do
not need to care about register allocation.

Best regards
ether