Functions: sret and readnone

Hi all,

I'm currently building a DSL for a computer graphics project that is
not unlike NVIDIA's Cg. I have an intrinsic with the following
signature

float4 sample(texture tex, float2 coords);

that is translated to this LLVM IR code:

declare void @"sample"(%float4* noalias nocapture sret, %texture,
%float2) nounwind readnone

The type float4 is basically an array of four floats, which cannot be
returned directly on an x86 using the traditional calling conventions
but only via the sret mechanism.
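For readers less familiar with sret, the lowering can be pictured in C++ terms. This is only a sketch: the struct layouts and the body of the function are placeholders invented for illustration, not the real intrinsic.

```cpp
// Illustrative stand-ins for the DSL types (the layouts are assumptions).
struct float2 { float v[2]; };
struct float4 { float v[4]; };

// Source-level view: the aggregate is returned by value.
// The body is a placeholder; the real intrinsic would sample a texture.
float4 sample_by_value(int tex, float2 tc) {
    float4 r = {{tc.v[0], tc.v[1], static_cast<float>(tex), 1.0f}};
    return r;
}

// Lowered view: the caller passes a pointer to the result slot as the
// first argument and the function itself returns void. This is the
// shape that the sret attribute marks in the IR declaration.
void sample_sret(float4 *result, int tex, float2 tc) {
    *result = sample_by_value(tex, tc);
}
```

Both forms compute the same value; the difference is that in the second one the result is only visible as a write through a pointer.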

You might already have spotted that "readnone" attribute, which is
causing some problems: The GVN optimization pass seems to treat the
sret pointer just like any other pointer to memory and eliminates all
calls to the function, since it sees it as returning void without
touching any memory. Is there a way to make sure that the GVN pass
interprets the sret argument as the actual return value of the
function? Or are there other approaches I could try?

Currently, the only way to make sure that the sample function behaves
as expected is to drop the "readnone" attribute, but that obviously
hinders optimization ...
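The effect described above can be modelled in a few lines. The following is a toy C++ sketch, not LLVM code; `CallFacts`, `looksDead` and `sretCall` are hypothetical names. It shows why a call that is readnone, nounwind and has no used return value looks removable, and why an sret call always ends up in that bucket: its IR-level return type is void, so there is never a used result.

```cpp
// Toy model of the facts an optimizer has about one call site.
struct CallFacts {
    bool readnone;    // promise: the callee touches no memory at all
    bool nounwind;    // promise: the callee never unwinds
    bool resultUsed;  // is the IR-level return value used anywhere?
};

// A call with no observable effects and no used result looks dead.
bool looksDead(const CallFacts &c) {
    return c.readnone && c.nounwind && !c.resultUsed;
}

// With sret the IR return type is void, so resultUsed is always false,
// even if the caller later reads the memory behind the sret pointer.
CallFacts sretCall(bool readnone) {
    CallFacts c = {readnone, /*nounwind=*/true, /*resultUsed=*/false};
    return c;
}
```

In this model, dropping readnone is the only switch that keeps the call alive, which matches the workaround mentioned above.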

Thanks a lot,
Stephan

I believe you are out of luck for the time being.

I plan to change the codegen stage so that it handles large struct
returns; then you could declare your function to return the four
floats directly and mark it readnone. But I don't have a target date
for that change.

Hi Stephan,

You might already have spotted that "readnone" attribute, which is
causing some problems: The GVN optimization pass seems to treat the
sret pointer just like any other pointer to memory and eliminates all
calls to the function, since it sees it as returning void without
touching any memory.

As explained in the language reference
(http://llvm.org/docs/LangRef.html),
readonly functions must not write to any byval arguments.
The reason for this is that it allows the inliner to avoid introducing
a temporary variable and copy when inlining readonly functions with
a byval argument.
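A rough C++ analogy for the inliner argument, assuming that by-value parameter passing stands in for byval; all names below are made up for illustration. If the callee never writes to its copy, the inlined body can read the caller's object directly and the temporary becomes unnecessary.

```cpp
struct Big { int data[4]; };

// Callee takes its argument by value (the byval analogue) and only reads it.
int sum(Big b) {
    int s = 0;
    for (int i = 0; i < 4; ++i) s += b.data[i];
    return s;
}

// Conservative inlining: keep the copy, because the callee might write to it.
int inlinedWithCopy(const Big &original) {
    Big tmp = original;  // temporary variable plus copy
    int s = 0;
    for (int i = 0; i < 4; ++i) s += tmp.data[i];
    return s;
}

// If the callee is known not to write to its argument, the copy can go.
int inlinedWithoutCopy(const Big &original) {
    int s = 0;
    for (int i = 0; i < 4; ++i) s += original.data[i];
    return s;
}
```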

Is there a way to make sure that the GVN pass
interprets the sret argument as the actual return value of the
function? Or are there other approaches I could try?

Not for the moment, sorry.

Ciao,

Duncan.

Hi all,

I'm currently building a DSL for a computer graphics project that is
not unlike NVIDIA's Cg. I have an intrinsic with the following
signature

float4 sample(texture tex, float2 coords);

that is translated to this LLVM IR code:

declare void @"sample"(%float4* noalias nocapture sret, %texture,
%float2) nounwind readnone

The best thing to do to handle this is to add a custom AliasAnalysis implementation, which will know the precise mod/ref sets for the function. See docs/AliasAnalysis.html for some more information.

-Chris

Is there a reason it needs to be an array? A vector of four floats
wouldn't have this problem, if that's an option.

Dan

Unfortunately that's not an option. At the moment I'm restricting
myself to the use of scalar code only, in order to be able to
vectorize the code easily later (e.g., float4 as it is now will then
become an array of four vectors for parallel processing of n (probably
4, SSE) pixels). But thanks for coming up with this idea!

Chris, I'll take a look at the AliasAnalysis functionality. Depending
on how much effort it is to implement a solution I might follow this
approach. If not, there's still Kenneth's new code generator to look
forward to. :)

Thanks,
Stephan

It's been a while and I finally had the time to look into this.

What I did was to build a custom AliasAnalysis pass, as Chris
suggested, that returns AliasAnalysis::Mod for values passed to the
sample function in the sret spot, and NoModRef for all other values.
I'm also returning AliasAnalysis::AccessesArguments in the pass's
getModRefBehavior methods. However, I haven't been successful with
this approach and hope that someone has an idea on how to fix this.

Here's a step by step illustration of the problem:

1. The following source code is compiled ...

intrinsic float4 sample(int tex, float2 tc);

float4 main(int tex, float2 tc)
{
  float4 x = sample(tex, tc);
  return 0.0;
}

2. ... into the following LLVM code (after a bunch of optimizations
have run):

define void @"main$int$float2"([4 x float]* noalias nocapture sret, i32, [2 x float]) nounwind {
  %5 = alloca [4 x float], align 4 ; <[4 x float]*> [#uses=1]
  call void @"sample$int$float2"([4 x float]* %5, i32 %1, [2 x float] %2)
  store [4 x float] zeroinitializer, [4 x float]* %0
  ret void
}

declare void @"sample$int$float2"([4 x float]* noalias nocapture sret, i32, [2 x float]) nounwind

As you can see, the call to the sample function is still present,
although the actual value it is supposed to return via its sret
parameter is never used.

Using the AAEvalPass I found out that the alias analysis pass I
implemented seems to work alright (it reports mod for %5):

===== Alias Analysis Evaluator Report =====
  3 Total Alias Queries Performed
  3 no alias responses (100.0%)
  0 may alias responses (0.0%)
  0 must alias responses (0.0%)
  Alias Analysis Evaluator Pointer Alias Summary: 100%/0%/0%
  3 Total ModRef Queries Performed
  2 no mod/ref responses (66.6%)
  1 mod responses (33.3%)
  0 ref responses (0.0%)
  0 mod & ref responses (0.0%)
  Alias Analysis Evaluator Mod/Ref Summary: 66%/33%/0%/0%

Yet, DCE, DSE and GVN fail to remove the function call. (I'm not so
sure which optimization pass to use, so I picked these three as they
seemed to make sense.)

Any ideas? Help would be very much appreciated!

Thank you,
Stephan

Hi Stephan,

intrinsic float4 sample(int tex, float2 tc);

float4 main(int tex, float2 tc)
{
  float4 x = sample(tex, tc);
  return 0.0;
}

Without additional information it would be wrong to remove the call to
sample, because it might write to a global variable.

As you can see, the call to the sample function is still present,
although the actual value it is supposed to return via its sret
parameter is never used.

Quite right too, see above.

Using the AAEvalPass I found out that the alias analysis pass I
implemented seems to work alright (it reports mod for %5):

===== Alias Analysis Evaluator Report =====
  3 Total Alias Queries Performed
  3 no alias responses (100.0%)
  0 may alias responses (0.0%)
  0 must alias responses (0.0%)
  Alias Analysis Evaluator Pointer Alias Summary: 100%/0%/0%
  3 Total ModRef Queries Performed
  2 no mod/ref responses (66.6%)
  1 mod responses (33.3%)
  0 ref responses (0.0%)
  0 mod & ref responses (0.0%)
  Alias Analysis Evaluator Mod/Ref Summary: 66%/33%/0%/0%

Yet, DCE, DSE and GVN fail to remove the function call. (I'm not so
sure which optimization pass to use, so I picked these three as they
seemed to make sense.)

In order to perform this transform the optimizers would have to work out
that sample does not modify any global state. This cannot be done without
knowing the definition of sample, but you only provide a declaration. If
you provided a body too then the GlobalsModRef analysis might be able to
work it out.

Ciao,

Duncan.

Duncan, thanks for your answer!

In order to perform this transform the optimizers would have to work out
that sample does not modify any global state. This cannot be done without
knowing the definition of sample, but you only provide a declaration.

Which is why I am trying to supply this additional information in a
custom alias analysis pass, but it doesn't seem to work. (The
AAEvalPass stats are precisely for this custom pass.)

Could you take a look at the code, please? Am I missing something
here?

class VISIBILITY_HIDDEN MySretAliasAnalysis : public FunctionPass,
                                              public AliasAnalysis
{
  std::map<std::string, bool> _srets;

public:
  static char ID;

  MySretAliasAnalysis() : FunctionPass(&ID)
  {
    _srets["sample$int$float2"] = true;
    _srets["sample$int$float3"] = true;
  }

  void getAnalysisUsage(llvm::AnalysisUsage &usage) const
  {
    AliasAnalysis::getAnalysisUsage(usage);
    usage.setPreservesAll();
  }

  bool runOnFunction(Function &F)
  {
    AliasAnalysis::InitializeAliasAnalysis(this);
    return false;
  }

  ModRefBehavior getModRefBehavior(CallSite CS,
                                   std::vector<PointerAccessInfo> *Info = 0)
  {
    if(_srets.find(CS.getCalledFunction()->getName()) != _srets.end())
      return AliasAnalysis::AccessesArguments; // only accesses args, no globals
    return AliasAnalysis::getModRefBehavior(CS, Info);
  }

  ModRefBehavior getModRefBehavior(Function *F,
                                   std::vector<PointerAccessInfo> *Info = 0)
  {
    if(_srets.find(F->getName()) != _srets.end())
      return AliasAnalysis::AccessesArguments; // only accesses args, no globals
    return AliasAnalysis::getModRefBehavior(F, Info);
  }

  ModRefResult getModRefInfo(CallSite CS, Value *P, unsigned Size)
  {
    std::string functionName = CS.getCalledFunction()->getNameStr();
    if(_srets.find(functionName) != _srets.end())
    {
      if(CS.hasArgument(P))
      {
        if(CS.getArgument(0) == P)
          return AliasAnalysis::Mod; // modify value pointed to by sret param
        else
          return AliasAnalysis::NoModRef; // there aren't any other pointer args
      }
    }
    return AliasAnalysis::getModRefInfo(CS, P, Size);
  }

  bool hasNoModRefInfoForCalls() const { return false; }
};

If you provided a body too then the GlobalsModRef analysis might be able to
work it out.

That's not an option because I want the sample function to be resolved
as an external function by the jitter.

Thanks for your time,
Stephan

Hi Stephan,

In order to perform this transform the optimizers would have to work out
that sample does not modify any global state. This cannot be done without
knowing the definition of sample, but you only provide a declaration.

Which is why I am trying to supply this additional information in a
custom alias analysis pass, but it doesn't seem to work. (The
AAEvalPass stats are precisely for this custom pass.)

Could you take a look at the code, please? Am I missing something
here?

I think this cannot possibly work. Imagine that I am a pass trying to
determine if the function call modifies global state. What would I have
to do? I would have to consider all global variables, and query alias
analysis to find out if any of them are modified. But how can I know all
global variables? There is no way to know about global variables that are
not declared in the module, so I can't query about those. And even if I
could know about all global variables, checking all of them every time I
want to consider deleting a function call would be very expensive and not
worthwhile in general. GlobalsModRef only considers global variables with
internal linkage IIRC (I may be wrong about this). That said, maybe there
is some special AA query method that I don't know about that asks "do you
write any global state".
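The limitation can be made concrete with a toy model. This is plain C++, not the LLVM API: `LocationQuery` and `modifiesAnyKnown` are hypothetical names, and the enum only mirrors the mod/ref vocabulary used in this thread. A per-location oracle can only be asked about locations the pass knows, so "no known location is modified" does not prove "nothing is modified".

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy mirror of the mod/ref answers discussed in the thread.
enum ModRefResult { NoModRef, Ref, Mod, ModRef };

// Hypothetical oracle: "what does this call do to location X?"
typedef std::function<ModRefResult(const std::string &)> LocationQuery;

// Ask the oracle about every location the pass can see. Locations that
// are not in the list (e.g. globals declared in another module) are
// never queried, so a false result says nothing about them.
bool modifiesAnyKnown(const LocationQuery &query,
                      const std::vector<std::string> &known) {
    for (std::size_t i = 0; i < known.size(); ++i) {
        ModRefResult r = query(known[i]);
        if (r == Mod || r == ModRef) return true;
    }
    return false;
}
```

The loop also shows the cost concern: every candidate call would need one query per known location.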

Ciao,

Duncan.

Hey Duncan,

That said, maybe there
is some special AA query method that I don't know about that asks "do you
write any global state".

From my limited understanding of the analysis framework, this method
could be "getModRefBehavior". It returns a value of the ModRefBehavior
enumeration type, which contains members such as DoesNotAccessMemory
or OnlyReadsMemory. In my case AccessesArguments is the correct one, I
guess.

Directly from the header:
// AccessesArguments - This function accesses function arguments in well
// known (possibly volatile) ways, but does not access any other memory.

So, there should be a way for optimization passes to know that the
only memory such a function touches is the memory accessible via
the pointers that are passed to it as arguments. In the particular
case of my sample function: a pointer to a value on the stack.
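To illustrate the idea, here is a toy decision procedure in plain C++. It is not an existing LLVM pass or API: the enum borrows the member names quoted above, `UnknownBehavior`, `CallSiteFacts` and `callIsRemovable` are invented for this sketch, and it assumes the call cannot unwind or fail to return.

```cpp
// Toy copy of the behavior summary; member names follow the thread.
enum ModRefBehavior {
    DoesNotAccessMemory,
    AccessesArguments,
    OnlyReadsMemory,
    UnknownBehavior  // hypothetical catch-all for this sketch
};

// Facts about one call site that a pass would have to establish itself.
struct CallSiteFacts {
    ModRefBehavior behavior;
    bool resultUsed;          // the IR-level return value has uses
    bool argMemoryReadLater;  // memory behind pointer args is read after the call
};

bool callIsRemovable(const CallSiteFacts &c) {
    if (c.resultUsed) return false;
    switch (c.behavior) {
    case DoesNotAccessMemory:
    case OnlyReadsMemory:
        return true;                   // the call writes nothing
    case AccessesArguments:
        return !c.argMemoryReadLater;  // writes are confined to dead memory
    default:
        return false;                  // the call may write global state
    }
}
```

For the main function shown earlier, the sret slot is a local alloca that is never read, so a pass reasoning this way could drop the call to sample.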

Best,
Stephan