sinking address computing in CodeGenPrepare

Hi All,

In the CodeGenPrepare pass, OptimizeMemoryInst() tries to sink address computation into the user's block by converting the GEP into integer arithmetic. This appears to have an impact on ISel's results, but I'm not clear about the main purpose of the transformation.

FROM:

for.body.lr.ph:
  %zzz = getelementptr inbounds %struct.SS* %a2, i32 0, i32 35

for.body:
  %4 = load double* %zzz, align 8, !tbaa !0

TO:

for.body:
  %sunkaddr27 = ptrtoint %struct.SS* %a2 to i32    ; <----- address computation sunk into the user's block
  %sunkaddr28 = add i32 %sunkaddr27, 272
  %sunkaddr29 = inttoptr i32 %sunkaddr28 to double*
  %4 = load double* %sunkaddr29, align 8, !tbaa !8

From what I have observed, this transformation can lead to poor alias analysis results because the GEP is no longer used. So I would like to know whether there is any way to avoid this conversion.

My questions are:
1. Why do we need to sink address computation into the users' block? What is the benefit of this conversion?
2. Can we directly use the GEP instead of breaking it into integer calculations?

Thanks,
Jun

The reason for this is to allow folding of address computation into loads and stores. Many modern architectures, e.g. X86 and ARM, have complex addressing modes.

Evan
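
(For readers following the thread: a minimal, self-contained sketch of the addressing-mode shape Evan is describing. The struct only loosely mirrors TargetLowering::AddrMode, and the assembly in the comments is an assumed illustration of what folding buys; none of this is actual LLVM code.)

// Illustrative only: the [BaseReg + Scale*IndexReg + Offset] form that
// CodeGenPrepare's address matcher tries to expose to ISel. Field names
// loosely mirror TargetLowering::AddrMode; this is not the LLVM header.
#include <cstdint>

struct AddrModeSketch {
  bool HasBaseReg = false; // a register holding the base pointer, e.g. %a2
  int64_t BaseOffs = 0;    // constant displacement, e.g. 272 for field 35
  int64_t Scale = 0;       // Scale * IndexReg, e.g. 8 * i when indexing doubles
};

int main() {
  // When the whole expression is folded, the load in the example above can
  // become a single instruction such as "ldr d0, [x0, #272]" (AArch64) or
  // "movsd 272(%rdi), %xmm0" (x86-64) instead of first materializing the
  // address in a register.
  AddrModeSketch AM;
  AM.HasBaseReg = true;
  AM.BaseOffs = 272;
  return AM.BaseOffs == 272 ? 0 : 1;
}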

I wonder why CodeGenPrepare breaks the GEP into integer calculations (ptrtoint/add/inttoptr) instead of directly sinking the address calculation, as a GEP, into the user's block.

Thanks,
Jun

I wonder why CodeGenPrepare breaks the GEP into integer calculations (ptrtoint/add/inttoptr) instead of directly sinking the address calculation, as a GEP, into the user's block.

I believe it's primarily for address mode matching, where only part of the GEP can be folded (depending on the instruction set).

Evan
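
(As a concrete, hedged illustration of "only part of the GEP can be folded": targets bound the immediate displacement a load can encode. The check below uses the AArch64 LDR unsigned-offset form for an 8-byte access as an assumed example; offsets outside that range need a separate ADD.)

// Hedged sketch: the kind of target legality check the address matcher
// consults. Numbers are for an AArch64 LDR (unsigned offset) of a double:
// the offset must be 8-byte aligned and fit a 12-bit field after scaling.
#include <cstdint>

constexpr bool fitsLdrDoubleImm(int64_t ByteOffset) {
  return ByteOffset >= 0 && ByteOffset % 8 == 0 && (ByteOffset / 8) < 4096;
}

static_assert(fitsLdrDoubleImm(272), "field 35 at +272 folds into the load");
static_assert(!fitsLdrDoubleImm(40000),
              "a very large offset cannot fold; part stays as a separate add");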

When multiple GEPs or other operations are used for the address calculation, OptimizeMemoryInst() performs address matching, determines a final addressing expression in a simple form (e.g., ptrtoint/add/inttoptr), and sinks it into the user's block so that ISel has a better chance of folding the address computation into LDRs and STRs. However, OptimizeMemoryInst() seems to do this transformation even when the address calculation is derived from a single GEP, resulting in poor alias analysis because the GEP is no longer used.

So, do you think it would be a reasonable workaround to sink the GEP without converting it into a set of integer operations (ptrtoint/add/inttoptr) when the address mode is derived from a single GEP only?

Thanks,
Jun

When multiple GEPs or other operations are used for the address calculation, OptimizeMemoryInst() performs address matching, determines a final addressing expression in a simple form (e.g., ptrtoint/add/inttoptr), and sinks it into the user's block so that ISel has a better chance of folding the address computation into LDRs and STRs. However, OptimizeMemoryInst() seems to do this transformation even when the address calculation is derived from a single GEP, resulting in poor alias analysis because the GEP is no longer used.

I don't follow your last statement. How does this impact AA? CodeGenPrep is run late, after AA is done.

Evan

From: "Evan Cheng" <evan.cheng@apple.com>
To: "Junbum Lim" <junbums@gmail.com>
Cc: llvmdev@cs.uiuc.edu
Sent: Wednesday, November 20, 2013 7:01:49 PM
Subject: Re: [LLVMdev] sinking address computing in CodeGenPrepare


I don't follow your last statement. How does this impact AA? CodeGenPrep is run late, after AA is done.

I don't know if this is relevant for Lim or not, but some targets use AA during CodeGen (instruction scheduling mostly, but SDAG too).

-Hal

From: "Evan Cheng" <evan.cheng@apple.com>
To: "Junbum Lim" <junbums@gmail.com>
Cc: llvmdev@cs.uiuc.edu
Sent: Wednesday, November 20, 2013 7:01:49 PM
Subject: Re: [LLVMdev] sinking address computing in CodeGenPrepare

When multiple GEPs or other operations are used for the address
calculation, OptimizeMemoryInst() performs address matching and
determines a final addressing expression as a simple form (e.g.,
ptrtoint/add/inttoptr) and sinks it into user's block so that ISel
could have better chance to fold address computation into LDRs and
STRs. However, OptimizeMemoryInst() seems to do this
transformation even when the address calculation derived from a
single GEP, resulting in poor alias analysis because GEP is no
longer used.

I don't follow your last statement. How does this impact AA?
CodeGenPrep is run late, after AA is done.

I don't know if this is relevant for Lim or not, but some targets use AA during CodeGen (instruction scheduling mostly, but SDAG too).

MachineSched uses AA to determine if something is loop invariant, which basically boils down to looking at the machine operand and seeing whether it's pointing to constant memory. I don't see how that's impacted by GEP vs. ADDs + MULs. Also, the analysis should have already been done and cached.

Evan
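
(A rough sketch of the constant-memory query Evan is referring to, written against the LLVM 3.x-era API of this thread; the real check of that era lives in MachineInstr::isInvariantLoad(AA), and the helper below is made up for illustration.)

// Hedged sketch (LLVM 3.x-era API): take the IR value attached to a machine
// memoperand and ask IR-level alias analysis whether it points to constant
// memory, which is what makes a load safely reorderable/loop invariant.
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/CodeGen/MachineMemOperand.h"
using namespace llvm;

static bool loadsFromConstantMemory(AliasAnalysis &AA,
                                    const MachineMemOperand *MMO) {
  const Value *V = MMO->getValue();
  if (!V)
    return false; // no IR value attached: nothing can be proven
  return AA.pointsToConstantMemory(
      AliasAnalysis::Location(V, MMO->getSize(), MMO->getTBAAInfo()));
}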

From: "Evan Cheng" <evan.cheng@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "LLVM" <llvmdev@cs.uiuc.edu>, "Junbum Lim" <junbums@gmail.com>
Sent: Wednesday, November 20, 2013 7:48:13 PM
Subject: Re: [LLVMdev] sinking address computing in CodeGenPrepare


MachineSched uses AA to determine if something is loop invariant, which basically boils down to looking at the machine operand and seeing whether it's pointing to constant memory. I don't see how that's impacted by GEP vs. ADDs + MULs.

MachineSched can use AA for a lot more than that. I use AA during scheduling because, in addition to picking up loads from constant memory, it lets me do a kind of modulo scheduling for unrolled loops. AA can tell that loads and stores to different arrays don't alias, and loads and stores to different offsets of the same array don't alias.

Also, the analysis should have already been done and cached.

BasicAA has a cache internally, but as far as I can tell, only to guard against recursion (and it is emptied after each query). Am I missing something?

-Hal

From: "Evan Cheng" <evan.cheng@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "LLVM" <llvmdev@cs.uiuc.edu>, "Junbum Lim" <junbums@gmail.com>
Sent: Wednesday, November 20, 2013 7:48:13 PM
Subject: Re: [LLVMdev] sinking address computing in CodeGenPrepare

From: "Evan Cheng" <evan.cheng@apple.com>
To: "Junbum Lim" <junbums@gmail.com>
Cc: llvmdev@cs.uiuc.edu
Sent: Wednesday, November 20, 2013 7:01:49 PM
Subject: Re: [LLVMdev] sinking address computing in
CodeGenPrepare

When multiple GEPs or other operations are used for the address
calculation, OptimizeMemoryInst() performs address matching and
determines a final addressing expression as a simple form (e.g.,
ptrtoint/add/inttoptr) and sinks it into user's block so that
ISel
could have better chance to fold address computation into LDRs
and
STRs. However, OptimizeMemoryInst() seems to do this
transformation even when the address calculation derived from a
single GEP, resulting in poor alias analysis because GEP is no
longer used.

I don't follow your last statement. How does this impact AA?
CodeGenPrep is run late, after AA is done.

I don't know if this is relevant for Lim or not, but some targets
use AA during CodeGen (instruction scheduling mostly, but SDAG
too).

MachineSched uses AA to determine if something is loop invariant,
which basically boils down to looking at machine operand and see
it's pointing to constant memory. I don't see how that's impact by
GEP vs. ADDS + MUL.

MachineSched can use AA for a lot more than that. I use AA during scheduling because, in addition to picking up loads from constant memory, it lets me do a kind of modulo scheduling for unrolled loops. AA can tell that loads and stores to different arrays don't alias, and loads and stores to different offsets of the same array don't alias.

I still don't understand what this has to do with whether GEP is lowered in codegenprep though.


BasicAA has a cache internally, but as far as I can tell, only to guard against recursion (and it is emptied after each query). Am I missing something?

It's not clear to me how AA is used in codegen. I understand some information is transferred to memoperands during LLVM IR to SDISel conversion. Is AA actually being recomputed using LLVM IR during codegen?

Evan

From: "Evan Cheng" <evan.cheng@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "LLVM" <llvmdev@cs.uiuc.edu>, "Junbum Lim" <junbums@gmail.com>, "Andrew Trick" <atrick@apple.com>
Sent: Thursday, November 21, 2013 6:47:40 PM
Subject: Re: [LLVMdev] sinking address computing in CodeGenPrepare


I still don't understand what this has to do with whether GEP is lowered in codegenprep though.

As I recall, BasicAA does not look through int <-> ptr conversions.


It's not clear to me how AA is used in codegen. I understand some information is transferred to memoperands during LLVM IR to SDISel conversion. Is AA actually being recomputed using LLVM IR during codegen?

It depends on what the (sub)target requests. By default, no. But if the target overrides TargetSubtargetInfo::useAA to return true, then yes.

-Hal
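
(For reference, the hook Hal mentions is a virtual method on TargetSubtargetInfo that returns false by default; a backend opts in by overriding it to return true. Below is a minimal consumer-side sketch against the LLVM 3.x-era header location; the helper name is made up.)

// Hedged sketch (LLVM 3.x-era header path): codegen passes gate their use of
// IR-level alias analysis on this subtarget hook, which a target may override.
#include "llvm/Target/TargetSubtargetInfo.h"
using namespace llvm;

static bool shouldUseAAInCodeGen(const TargetSubtargetInfo &ST) {
  // TargetSubtargetInfo::useAA() returns false unless the target overrides it,
  // e.g. to get the AA-based scheduling discussed earlier in this thread.
  return ST.useAA();
}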


It's not clear to me how AA is used in codegen. I understand some information is transferred to memoperands during LLVM IR to SDISel conversion. Is AA actually being recomputed using LLVM IR during codegen?

In general, when AA is used during codegen, it grabs the IR value from the machine memoperands, then runs normal IR-level alias analysis. The IR needs to stay around and be immutable. That’s why anything that changes aliasing of IR-level memory ops should be run before CodeGen. For example, stack coloring needs to conservatively mutilate the machine memoperands to work around this problem.

We need to sink address computation to expose addressing modes to ISEL, but I’m not sure why we need to lower to ptrtoint. That doesn’t seem good for AA at all.

-Andy
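
(A rough sketch of the pattern Andy describes, against the LLVM 3.x-era AliasAnalysis interface; signatures have since changed, and the helper name is made up for illustration.)

// Hedged sketch (LLVM 3.x-era API): recover the IR values hanging off two
// machine memoperands and hand them back to IR-level alias analysis.
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/CodeGen/MachineMemOperand.h"
using namespace llvm;

static bool mayAliasViaIR(AliasAnalysis &AA, const MachineMemOperand *A,
                          const MachineMemOperand *B) {
  const Value *VA = A->getValue(), *VB = B->getValue();
  if (!VA || !VB)
    return true; // no IR value attached: stay conservative
  // Once CodeGenPrepare has rewritten a GEP into ptrtoint/add/inttoptr, BasicAA
  // typically cannot prove NoAlias here, which is the problem in this thread.
  return AA.alias(VA, A->getSize(), VB, B->getSize()) != AliasAnalysis::NoAlias;
}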


When multiple GEPs or other operations are used for the address calculation, codegenprep performs address matching, determines a final addressing expression in a simple form (e.g., ptrtoint/add/inttoptr), and sinks it into the user's block, allowing address computations to be folded into LDRs and STRs.

However, codegenprep performs this conversion even for an address expression derived from a single GEP, resulting in poor AA in the scheduler because BasicAA doesn't handle IntToPtr. As a simple workaround, I think the GEP could be sunk directly, without breaking it into integer operations, when the address is derived from a single GEP.

-Jun
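
(A minimal sketch of the workaround Jun proposes, not an actual patch, written against the LLVM 3.x-era IRBuilder API in which GEPs carry their pointee type; the helper and value names are made up.)

// Hedged sketch: when the matched address comes from a single GEP, re-create
// that GEP right before the memory instruction instead of emitting
// ptrtoint/add/inttoptr, so BasicAA can still see the structured address.
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

static Value *sinkSingleGEP(GetElementPtrInst *GEP, Instruction *MemInst) {
  IRBuilder<> Builder(MemInst); // insert right before the load/store
  SmallVector<Value *, 8> Idx(GEP->idx_begin(), GEP->idx_end());
  return GEP->isInBounds()
             ? Builder.CreateInBoundsGEP(GEP->getPointerOperand(), Idx,
                                         "sunkgep")
             : Builder.CreateGEP(GEP->getPointerOperand(), Idx, "sunkgep");
}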

From: "Junbum Lim" <junbums@gmail.com>
To: "Andrew Trick" <atrick@apple.com>
Cc: "Evan Cheng" <evan.cheng@apple.com>, "Hal Finkel" <hfinkel@anl.gov>, "LLVM" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, November 26, 2013 5:04:43 PM
Subject: Re: [LLVMdev] sinking address computing in CodeGenPrepare

From: "Evan Cheng" < evan.cheng@apple.com >
To: "Hal Finkel" < hfinkel@anl.gov >
Cc: "LLVM" < llvmdev@cs.uiuc.edu >, "Junbum Lim" < junbums@gmail.com
>
Sent: Wednesday, November 20, 2013 7:48:13 PM
Subject: Re: [LLVMdev] sinking address computing in CodeGenPrepare

From: "Evan Cheng" < evan.cheng@apple.com >
To: "Junbum Lim" < junbums@gmail.com >
Cc: llvmdev@cs.uiuc.edu
Sent: Wednesday, November 20, 2013 7:01:49 PM
Subject: Re: [LLVMdev] sinking address computing in
CodeGenPrepare

When multiple GEPs or other operations are used for the address
calculation, OptimizeMemoryInst() performs address matching and
determines a final addressing expression as a simple form (e.g.,
ptrtoint/add/inttoptr) and sinks it into user's block so that
ISel
could have better chance to fold address computation into LDRs
and
STRs. However, OptimizeMemoryInst() seems to do this
transformation even when the address calculation derived from a
single GEP, resulting in poor alias analysis because GEP is no
longer used.

I don't follow your last statement. How does this impact AA?
CodeGenPrep is run late, after AA is done.

I don't know if this is relevant for Lim or not, but some targets
use AA during CodeGen (instruction scheduling mostly, but SDAG
too).

MachineSched uses AA to determine if something is loop invariant,
which basically boils down to looking at machine operand and see
it's pointing to constant memory. I don't see how that's impact by
GEP vs. ADDS + MUL.

MachineSched can use AA for a lot more than that. I use AA during
scheduling because, in addition to picking up loads from constant
memory, it lets me do a kind of modulo scheduling for unrolled
loops. AA can tell that loads and stores to different arrays don't
alias, and loads and stores to different offsets of the same array
don't alias.

I still don't understand what this has to do with whether GEP is
lowered in codegenprep though.

Also, the analysis should have already been done
and cached.

BasicAA has a cache internally, but as far as I can tell, only to
guard against recursion (and it is emptied after each query). Am I
missing something?

It's not clear to me how AA is used in codegen. I understand some
information are transferred to memoperands during LLVM IR to SDISel
conversion. Is AA actually being recomputed using LLVM IR during
codegen?

In general, when AA is used during codegen, it grabs the IR value
from the machine memoperands, then runs normal IR-level alias
analysis. The IR needs to stay around and be immutable. That’s why
anything that changes aliasing of IR-level memory ops should be run
before CodeGen. For example, stack coloring needs to conservatively
mutilate the machine memoperands to work around this problem.

We need to sink address computation to expose addressing modes to
ISEL, but I’m not sure why we need to lower to ptrtoint. That
doesn’t seem good for AA at all.

-Andy

When multiple GEPs or other multiple operations are used for the
address calculation, codegenprep performs address matching and
determines a final addressing expression as a simple form (e.g.,
ptrtoint/add/inttoptr) and sinks it into user's block, resulting in
folding of address computations into LDRs and STRs.

However, codegenprep performs this conversion even for an address
expression from a single GEP, resulting in poor AA in scheduler
because basicaa doesn't handle IntToPrt. As a simple workaround, I
think the GEP could be directly sunk without breaking it into
integers operations if the address is simply derived from a single
GEP.

I think that this makes sense. It will be interesting to see if this results in any non-AA-related CodeGen changes. Can you prepare a patch?

-Hal