[Propose] Add address-taken bit to GlobalVariable for disambiguation purpose

Hi, There:

   I'd like to add bit, called "addr_not_taken", to GlobalVariable in order to
indicate if a GlobalVariable doesn't has its address taken or not.

1.The motivation

Hi, There:

   I'd like to add bit, called "addr_not_taken", to GlobalVariable in
order to
indicate if a GlobalVariable doesn't has its address taken or not.

1.The motivation

    The motivation can be explained by the following example. In this
example,
variable x does not have its address taken, therefore, it cannot be
indirectly
access. So, we can prove "x" and "*p" don't overlap.
   -----------------
   static int x;
   foo(int *p) {
     x = 1;
     *p = 2; // do not modify x
     x = x + 1;
   }
   ---------------

   The semantic of llvm::GlobalVariable::addr_not_taken is:
    1). Of course, the global variable in question doesn't have its
address taken, and
    2). the global variable is not volatile, and

Why are you also imposing this restriction? I'm also not quite sure what it means, in the context of LLVM, where only loads and stores can be volatile.

    3). Compiler need to see all the references of the variables.

   The address-not-taken can be used to disambiguate memory
   operations
between:
     o. direct load/store to a global variable which does not have
     its
address taken, and
     o. any arbitrary indirect memory operations.

   The 2) is to ensure compiler will not set a volatile variable
"addr-not-taken", even
if all its reference are direct accesses.

   The 3) does not necessarily imply the global variable has internal
linkage type, see
following scenario S2 and S3.

   Since llvm::GlobalVariable has plenty of vacant bits (for flags),
this bit will
not increase its size.

2. Scenarios where llvm::GlobalVariable::addr_not_taken is useful

  NOTE: As you can see from the S2 and S3, addr_not_taken can serve
  as
additional info
     of the input IR. In that sense, we should view "addr_not_taken"
     as
an attribute of
     a variable (akin to GNU's function-attribute).

  S1) C-thinking programs likely have lots of global variables, and
  most
of them
      are directly referenced.

      It would be more useful in LTO mode as in this mode more global
variables
      will be recognized as "address-not-taken".

  S2) Compiler may generate some global variables for some special
purpose or for some
      special languages. These global variables are normally directly
accessed. Compiler
      can mark the variable "addr-not-taken" to avoid uncessary
      alias.
We would otherwise
      need to introduce a special meta-data for this purpose.

      Go back to semantic 3). At the moment a compiler generate such
variables,
      it does not physically "see" all the reference of these vars,
however it
      is pretty sure how these these variable will be accessed.

      Note that in this scenario, the references of these
compiler-created global
    variables may occur in multiple modules. Therefore, they don't
    have
"internal"
    linkage.

   S3) Consider this snippet:
   S3) Consider this snippet:
      ---------------------
      static int xyz; // address-not-taken
      foo() {
         access xyz;
      }

      bar() { access xyz; }
      ----------------------

      For some reasons (say, triage problematic functions), we many
separate foo()
      and bar() into different source files. For this purpose, we
      have
to get
      rid of "static" from "xyz", which may bring new problem once in
      a
while,
      for instance, bug disappear, performance degrade.

      Introducing address-not-taken will alleviate such troubles.

3. Q&A

   Q1: If we have nice points-to analysis, do we still need what you
propose today.
--------------------------------------------------------------------------------
   A1: For scenario S2 and S3, points-to analysis will not help. I
   agree
S2 and S3
are not very important.

   For S1, sure, it does not need to be a "nice" analyzer, even a
"stupid" anlayzer
should work for this end. The problem is that points-to analyses are
normally very
expensive. They are normally invoked once the in the entire compiler
pipeline,
before that point, we cannot disambiguate global-variable and
indirection
mem-ops.

   On the other hand, compiler can afford re-analyzing address-taken
   for
particular global variable or all global variables again and again
after
some optimizations (say, DCE). We certainly cannot affording to
invoke
the prohibitively expensive points-to analysis for so many times.

   Q2: Does addr-not-taken imply "unnamed_addr" (Nick asked this
question the other day)
-------------------------------------------------------------------------------------
   A2: No. Consider S2 and S3.

     For S1, I don't know for sure if variable name matter or not if
debug-info is
   turned on.

   Q3: Can we save the not-addr-taken to meta-data?
   -------------------------------------------------
   A3: Meta-data, by its very nature, is part of IR. It is pretty
   slow
to access.
       We can view addr-not-taken as an attribute of the globalvar
       (just
like
       function as attributes like readonly/malloc/....).

         I'm aware of other compilers whose symbol-tables have imilar
         bits
       for all kind of symbols (funcitons and data at level).

   Q4: Can we save the not-addr-taken to an analysis pass
   -----------------------------------------------------
   A4: If we implement this way:
       o. we have add a statement to most, if not all, passes to
       claim it
          preserve this addr-taken analysis result, which is bit
          anonying.

       o. The address-taken information collected before LTO (aka
       pre-IPO),
          cannot be fed to LTO, unless we re-analyze them from ground
          up.

So is the idea that lib/Analysis/IPA/GlobalsModRef.cpp will set this bit when it runs?

-Hal

Hi, Hal:

  Thank you for your feedback, see following inline comment

Thanks

Hi, There:

    I'd like to add bit, called "addr_not_taken", to GlobalVariable in
order to
indicate if a GlobalVariable doesn't has its address taken or not.

1.The motivation

     The motivation can be explained by the following example. In this
example,
variable x does not have its address taken, therefore, it cannot be
indirectly
access. So, we can prove "x" and "*p" don't overlap.
    -----------------
    static int x;
    foo(int *p) {
      x = 1;
      *p = 2; // do not modify x
      x = x + 1;
    }
    ---------------

    The semantic of llvm::GlobalVariable::addr_not_taken is:
     1). Of course, the global variable in question doesn't have its
address taken, and
     2). the global variable is not volatile, and

Why are you also imposing this restriction? I'm also not quite sure what it means, in the context of LLVM, where only loads and stores can be volatile.

2) is just for being pedantic :slight_smile:

One might otherwise argue in this snippet, x apparently does not have its addr taken,
however, it's illegal to say that "x" and "*p" don't alias.

  volatile static int x;
    foo(int *p) {
      x = 1;
      *p = 2; // do not modify x
      x = x + 1;
    }

  > Q4: Can we save the not-addr-taken to an analysis pass
  > -----------------------------------------------------
  > A4: If we implement this way:
  > o. we have add a statement to most, if not all, passes to
  > claim it
  > preserve this addr-taken analysis result, which is bit
  > anonying.
  >
  > o. The address-taken information collected before LTO (aka
  > pre-IPO),
  > cannot be fed to LTO, unless we re-analyze them from ground
  > up.

So is the idea that lib/Analysis/IPA/GlobalsModRef.cpp will set this bit when it runs?

I'm not quire sure your question. Let me try to answer.

In my implementation, I set address_no_taken in lib/Transformation/IPO/GlobalOpt.cpp.
In this pass, it tries to see if a global var has its address taken before it tries to optimize
the global variable.

ModRef is immaterial in this regard.

What I'm try to say in "A4" is that it is really *troublesome* if we save the
addr-taken-analysis result to a analysis-pass (rather than cache the result to
GlobalVariable::addr_not_taken). If we really have to implement this way,
we need to at very least implement following two steps:

2) is just for being pedantic :slight_smile:

One might otherwise argue in this snippet, x apparently does not have its addr taken,
however, it's illegal to say that "x" and "*p" don't alias.

volatile static int x;
   foo(int *p) {
     x = 1;
     *p = 2; // do not modify x
     x = x + 1;
   }

Oops, lame examples, change the example bellow. The reasons remain unchanged : for pedantic reasons,
you can ignore it:-)

volatile static int x;
    foo(volatile int *p) {
      x = 1;
      *p = 2; // do not modify x
      x = x + 1;
    }

Hi, Hal:

  Thank you for your feedback, see following inline comment

Thanks

>> Hi, There:
>>
>> I'd like to add bit, called "addr_not_taken", to
>> GlobalVariable in
>> order to
>> indicate if a GlobalVariable doesn't has its address taken or not.
>>
>> 1.The motivation
>> ===============
>> The motivation can be explained by the following example. In
>> this
>> example,
>> variable x does not have its address taken, therefore, it cannot
>> be
>> indirectly
>> access. So, we can prove "x" and "*p" don't overlap.
>> -----------------
>> static int x;
>> foo(int *p) {
>> x = 1;
>> *p = 2; // do not modify x
>> x = x + 1;
>> }
>> ---------------
>>
>> The semantic of llvm::GlobalVariable::addr_not_taken is:
>> 1). Of course, the global variable in question doesn't have
>> its
>> address taken, and
>> 2). the global variable is not volatile, and
> Why are you also imposing this restriction? I'm also not quite sure
> what it means, in the context of LLVM, where only loads and stores
> can be volatile.
2) is just for being pedantic :slight_smile:

One might otherwise argue in this snippet, x apparently does not have
its addr taken,
however, it's illegal to say that "x" and "*p" don't alias.

Wait, really? I thought that "volatile static int x" just meant that the value of x was volatile; not that 'x' might spontaneously start referring to some other memory location.

  volatile static int x;
    foo(int *p) {
      x = 1;
      *p = 2; // do not modify x
      x = x + 1;
    }

  > Q4: Can we save the not-addr-taken to an analysis pass
  > -----------------------------------------------------
  > A4: If we implement this way:
  > o. we have add a statement to most, if not all, passes to
  > claim it
  > preserve this addr-taken analysis result, which is bit
  > anonying.
  >
  > o. The address-taken information collected before LTO (aka
  > pre-IPO),
  > cannot be fed to LTO, unless we re-analyze them from
  > ground
  > up.

> So is the idea that lib/Analysis/IPA/GlobalsModRef.cpp will set
> this bit when it runs?
>
>
I'm not quire sure your question. Let me try to answer.

In my implementation, I set address_no_taken in
lib/Transformation/IPO/GlobalOpt.cpp.
In this pass, it tries to see if a global var has its address taken
before it tries to optimize
the global variable.

ModRef is immaterial in this regard.

What I'm try to say in "A4" is that it is really *troublesome* if we
save the
addr-taken-analysis result to a analysis-pass (rather than cache the
result to
GlobalVariable::addr_not_taken). If we really have to implement this
way,
we need to at very least implement following two steps:

Okay, sounds good. IIRC, GlobalsModRef currently (also) determines whether or not a global has its address taken. It might make sense to unify that logic with what you're proposing for GlobalOpt, and it seems like it would definitely make sense to let GlobalsModRef take advantage of the address_not_taken bit.

Also, as a general note, I don't see why any of this should be LTO-specific. For variables with local (internal) linkage, we can do the analysis on a per-module basis, and I don't understand why we currently don't.

Thanks again,
Hal

2) is just for being pedantic :slight_smile:

One might otherwise argue in this snippet, x apparently does not have
its addr taken,
however, it's illegal to say that "x" and "*p" don't alias.

Wait, really? I thought that "volatile static int x" just meant that the value of x was volatile; not that 'x' might spontaneously start referring to some other memory location.

You are right. I mixed this proposal with a project I did before.
In that project the interface is asymmetric and the return value take into account
of concurrent access.

Sorry for being sloppy. Please ignore 2) completely.

I'm not quire sure your question. Let me try to answer.

In my implementation, I set address_no_taken in
lib/Transformation/IPO/GlobalOpt.cpp.
In this pass, it tries to see if a global var has its address taken
before it tries to optimize
the global variable.

ModRef is immaterial in this regard.

What I'm try to say in "A4" is that it is really *troublesome* if we
save the
addr-taken-analysis result to a analysis-pass (rather than cache the
result to
GlobalVariable::addr_not_taken). If we really have to implement this
way,
we need to at very least implement following two steps:

Okay, sounds good. IIRC, GlobalsModRef currently (also) determines whether or not a global has its address taken. It might make sense to unify that logic with what you're proposing for GlobalOpt, and it seems like it would definitely make sense to let GlobalsModRef take advantage of the address_not_taken bit.

I don't read ModRef carefully. I think it make some sense to factor out the addr-taken analysis. Such that
other pass can take advantage of it.

Also, as a general note, I don't see why any of this should be LTO-specific. For variables with local (internal) linkage, we can do the analysis on a per-module basis, and I don't understand why we currently don't.

Thanks again,
Hal

You can understand from this example: Suppose the a.c b.c comprise the program where a.c has some static variable whose address
are not taken. The addr-taken information collected during stage time-stamp-0 cannot be found at time-stamp-1.2.

If we were saving the addr-not-taken info in GlobalVariable::not_addr_taken, optimizations in 1.2 will automatically benefit from
cached information.

To answer Hal’s question, I think internal globals already benefit from Shuxin’s proposed optimization, independent of LTO. Shuxin is only pointing out a potential additional benefit, either in terms of compile time (not having to rerun the analysis) or optimization opportunities (for passes that run before GlobalOpt reruns).

Whether to split address-taken into a separate Analysis is a judgement call. Conceptually it could make sense as a separate analysis. I think Shuxin is pointing out that using a flag is a more straightforward approach, simpler to engineer, and has some potential benefits (although not huge benefits).

Either way, it could still make sense to factor the logic across GlobalOpt/GlobalModRef if it’s not too messy.

-Andy

Actually, it is not that difficult to predict the effect on compile time. All you need to do is implement the analysis “on the fly” and measure the compile time. Can you provide this data ? Compile time measurements can help us make a decision.

Nadav:

I don’t think this is right approach for engineering.
The time-complexity of re-analyzing addr_taken for each single alias query depends on

  1. how many global variable
  2. how many occurrence of these global variables.
  3. how many queries the compiler have.
  1. depends on compiler. You never know what we will have in the following few years.
    1 and 2 depends on the program. You never know what kind of program you will run into.
    How can we use what we have today the extrapolate the future ignoring the highly
    unpredictable complexity.

It’s interesting that recently, many EE magazine (circuit cellar, Elector, EE times) are
discussing buggy SW kill people. I remember some posts complaining that some buggy program
have amazingly large # of global variables. I can find one post in Chinese website:

The 1st post says, “a program has 11000 global variables”! As to “Can you provide this data”? My answer is no, and I will not to implement the analysis which perform on-the-fly analysis unless I’m convinced that saving addr_taken bit to llvm::GlobalVariable is fundamentally flawed. Shuxin

Also, as a general note, I don't see why any of this should be
LTO-specific. For variables with local (internal) linkage, we can do
the analysis on a per-module basis, and I don't understand why we
currently don't.

Thanks again,
Hal

You can understand from this example: Suppose the a.c b.c comprise
the program where a.c has some static variable whose address
are not taken. The addr-taken information collected during stage
time-stamp-0 cannot be found at time-stamp-1.2.

If we were saving the addr-not-taken info in
GlobalVariable::not_addr_taken, optimizations in 1.2 will
automatically benefit from
cached information.

To answer Hal’s question, I think internal globals already benefit
from Shuxin’s proposed optimization, independent of LTO. Shuxin is
only pointing out a potential additional benefit, either in terms of
compile time (not having to rerun the analysis) or optimization
opportunities (for passes that run before GlobalOpt reruns).

Okay; this sounds good. Part of my motivation for asking the question was a mistake on my part: GlobalOpt is always used, while GlobalModRef is currently used only during LTO. I don't understand why GlobalModRef is LTO only, but I think that's a separate topic from this address_not_taken attribute.

To be clear, I'm generally in favor of having such an attribute, under the conditions that we have a pass that infers the attribute for internal-linkage variables at the module level (in addition to for all variables during LTO), and that the attribute is used by some AA that is used by default (not just during LTO as the current GlobalModRef analysis is). We might want to start using GlobalModRef as part of the normal module pass manager configuration, or BasicAA could use the attribute, or both (or something else entirely).

Thanks again,
Hal

Nadav:

I don’t think this is right approach for engineering.
The time-complexity of re-analyzing addr_taken for each single alias query depends on

  1. how many global variable
  2. how many occurrence of these global variables.
  3. how many queries the compiler have.
  1. depends on compiler. You never know what we will have in the following few years.
    1 and 2 depends on the program. You never know what kind of program you will run into.
    How can we use what we have today the extrapolate the future ignoring the highly
    unpredictable complexity.

This logic doesn’t make sense to me. You can implement it both ways and get empirical results on programs we have today and in our compiler. This is not a theoretical exercise.

In practice, walking the use list of a global variable is very fast. As you’ve noticed, we already use this approach (in an admittedly ad-hoc and decentralized way) throughout the compiler.

It’s interesting that recently, many EE magazine (circuit cellar, Elector, EE times) are
discussing buggy SW kill people. I remember some posts complaining that some buggy program
have amazingly large # of global variables. I can find one post in Chinese website:

The 1st post says, “a program has 11000 global variables”!

This is just FUD and completely unrelated to the discussion.

As to “Can you provide this data”? My answer is no, and I will not to implement the analysis
which perform on-the-fly analysis unless I’m convinced that saving addr_taken bit to llvm::GlobalVariable
is fundamentally flawed.

You don’t have to be convinced. The burden of proof is on you - not on us to convince you.

Here’s the deal: there are tons of “potentially useful” things that could be encoded in the IR. Each thing added to IR has a complexity increase on the entire compiler. Passes that work on global variables will have to reason about this bit, and transformations that could invalidate it (e.g. global merging) will have to have code added to update/preserve it.

We are very conservative about changing IR for good reason. We don’t add caches to IR unless there is pretty much no other way to achieve the result. In a perfect world, we would have nothing redundant in the IR at all.

That said, I’m open to this attribute, because I think the semantics can be nailed down tightly (though your “volatile” discussion doesn’t make any sense to me) it is widely useful, and I don’t think the burden of maintaining it will be that high. However, before we do it, you need to demonstrate that lazily computing it from use-def chains is empirically worse.

-Chris

Hi, all:

Per Chris and Nadav’s request, I begin to write the code about analyzing address-taken
lazily. I realize the alias query could be initiated from any context (function pass, loop pass etc),
however, the analysis for global-variable-address-taken is conducted in module scope.
Is there any potential problem over here? (For instance, function foo() and bar() comprise module m,
however, at time optimizer is working on foo(), bar() is not physically in that module. In this case,
analyze global-variable on the fly doesn’t make sense.)

Thanks in advance!
Shuxin

Hi, all:

Per Chris and Nadav's request, I begin to write the code about
analyzing address-taken
lazily. I realize the alias query could be initiated from any context
(*function* pass, loop pass etc),
however, the analysis for global-variable-address-taken is conducted
in *module* scope.
Is there any potential problem over here? (For instance, function
foo() and bar() comprise module m,
however, at time optimizer is working on foo(), bar() is not
physically in that module. In this case,
analyze global-variable on the fly doesn't make sense.)

Shuxin,

I don't see how this can work, without either a) breaking apart the current pass structure by inserting a module-level pass or b) violating the implied pass scope boundaries.

FWIW, I asked Chandler last month on IRC for this thoughts about whether the new pass manager will support running different (function, basic block, etc.) passes in parallel. There is obviously a lot of work yet before this is possible, but he said that is the eventual goal. I don't yet understand what kind of locking scheme we'd need for this in practice, but it seems that something like this might get in the way.

On the other hand, if we're okay with this, I'd like to do something similar so that static functions can determine aliasing information from their callers.

-Hal

Hi, Hal:

     Thank you so much for the informative reply and your expertise!

Shuxin

      o. The address-taken information collected before LTO (aka pre-IPO),
         cannot be fed to LTO, unless we re-analyze them from ground up.

This is the one case I can think an attribute similar to this could
make a reasonable difference. Not because we get variables with too
many uses, but because we have to read in all the IL to find the uses.

I say similar because the semantics would have to be a bit different.
What we would want is basically a conservative cache of
GlobalStatus.IsCompared:

* If a GlobalValue has its address used in this TU, this flag must be clear.
* If a GlobalValue's address is not used in this TU, this flag may be set.

Note in particular that this is a per TU property. This means that its
users have to check the linkage (as they do when using GlobalStatus)
and the linker has to learn how merge them.

In summary, it is strictly just a compile time optimization.

Cheers,
Rafael