Casting between address spaces and address space semantics

Hi all,

I'm currently struggling a bit with some problems regarding address spaces and
(implicit) casts. I'll explain some context first and then proceed to the
actual question I'd like to have answered.

On our target platform, we have a number of distinctly different memory banks.
To access these from our C code, we declare a global array for each memory,
with an address space attribute marking in which memory the array should be
allocated. This allows the backend to map the load and store instructions to
and from these arrays onto the right target instructions, depending on the
memory used. For example:

__attribute__((address_space(1))) char mem1[100];
__attribute__((address_space(2))) char mem2[100];

Now, we are using a function which reads a value from one of these memories
and does some processing. Since we want to execute this function for multiple
memories, we make it accept a pointer in the generic address space (i.e., no
address space attribute):

  void do_stuff(char* mem);

Somewhere later on, we call the function as follows:

  do_stuff(mem1);
  ...
  do_stuff(mem2);

As expected, the LLVM IR resulting from this contains a bitcast to remove the
address space from mem1 and mem2 (and also a cast from [100 x i8] to i8*, but
that is less interesting). Now, this bitcast really isn't supported by our
backend (more specifically, the do_stuff function can't be codegen'd, since we
need to know at compile time from which address space we are reading).

To solve this, we make sure the do_stuff function gets inlined. When this
happens, our problems should go away, since we now know at compile time which
address space is used for every load and store instruction. However, the
bitcast that removes the address space won't go away.

What I would like to see is that the bitcast is removed and the
addrspace-annotated type is propagated to the gep/load/store instructions that
use it.

However, this brings me to my actual question: How are address spaces
semantically defined? I see two options here:

a) Every address space has the full range of addresses and they live
completely side by side. This means that, for example, i32 addrspace(1) * 100
points to a different piece of memory than i32 addrspace(2) * 100. It also
means that a bitcast from one address space to another (possibly 0) makes the
pointer point to something different when loaded.

b) Every address space is really a subspace of the full range of addresses,
and these subspaces are always disjoint. This means that, for example,
i32 addrspace(1) * 100 points to the same memory as i32 addrspace(2) * 100,
though one, or possibly both, of them can be invalid (since the pointer lies
outside of that address space). This also means that bitcasting a pointer from
one address space to another doesn't change its meaning, though it can
potentially become invalid.

This approach also allows addrspace(0) to become a bit more special: any
pointer can be valid in that addrspace. This means that casting to
addrspace(0) and then loading is semantically the same as loading directly
(though the hardware might not be able to support such an access). This
approach would also allow removing the bitcast in my problem above, since the
loads from the generic address space can be replaced by loads from a specific
address space without changing semantics. This does not currently happen, but
I would be willing to implement it. Finally, this approach is consistent with
the address spaces defined by DSP-C.

There are probably a few more approaches that are similar to the ones above.
I would suggest that whatever approach we pick, we document it somewhere
(LangRef?), since leaving this backend-specific makes transformations much
less useful. Being consistent with DSP-C is probably the best track here?

Gr.

Matthijs

Now, we are using a function which reads a value from one of these memories
and does some processing. Since we want to execute this function for multiple
memories, we make it accept a pointer in the generic address space (ie, no
address space attribute):

       void do_stuff(char* mem);

The "generic address space" you're referring to is really just address
space 0, at least in the current implementation. Assuming alternate
address spaces are actually separate, passing a pointer from a
different address space to this function is undefined behavior.

However, this brings me to my actual question: How are address spaces
semantically defined? I see two options here:

a) Every address space has the full range of addresses and they completely
live side by side. This means that, for example, i32 addrspace(1) * 100 points
to a different piece of memory than i32 addrspace(2) * 100. Also, this means
that a bitcast from one address space to another (possibly 0), makes the
pointer point to something different when loaded.

b) Every address space is really a subspace of the full range of addresses,
but always disjoint. This means that, for example, i32 addrspace(1) * 100
points to the same memory as i32 addrspace(2) * 100, though one, or possibly
both of them can be invalid (since the pointer lies outside of that address
space). This also means that bitcasting a pointer from one address space to
another doesn't change it's meaning, though it can potentially become invalid.

Address spaces are platform-specific, so you can define them any way
you want, I suppose. But if you can map the alternate address spaces
into address space 0, is there really any point to keeping the other
address spaces around?

-Eli

In ISO/IEC WG14 n1169 on the C extensions to support embedded processors, any two address spaces must be disjoint, must be equivalent, or must be nested. As Eli indicated, the actual relationship is platform specific, depending on what makes the most sense for your hardware, and how the program behaves will depend on that relationship.

   -- Mon Ping

Hi Eli, Mon Ping,

In ISO/IEC WG14 n1169 on the C extensions to support embedded
processors, any two address spaces must be disjoint, must be
equivalent, or must be nested.

Ah, that standard is a lot clearer on this subject than the DSP-C one I read
was.

As Eli indicated, the actual relationship is platform specific, depending on
what makes the most sense for your hardware, and how the program behaves will
depend on that relationship.

If I read the standard correctly, the properties of these address spaces can
be fully captured by defining the relationship between every pair of address
spaces (disjoint, identical, subset/superset).

I think it would make sense to make these relationships backend/platform
specific, but for clang and the optimization passes to properly work with
address spaces, these relationships should be known. It seems to me that the
proper way to handle this is to include this information in the TargetData
class (or whatever platform-specific class is appropriate; I'm not really up
to speed on this currently).
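
To make this concrete, here is a purely illustrative sketch in C++ of what
"knowing the relationships" could look like; none of these names exist in
LLVM, and the concrete spaces in the example are made up:

  // Illustrative sketch only; not existing LLVM API.
  // The relation of address space A to address space B.
  enum AddrspaceRelation {
    Disjoint,    // no address is valid in both A and B
    Equivalent,  // A and B denote the same memory
    Subset,      // every address valid in A is also valid in B
    Superset     // every address valid in B is also valid in A
  };

  // A target would provide something like this. Here: two disjoint banks
  // (1 and 2), both subsets of a hypothetical "any memory" space 3.
  AddrspaceRelation getRelation(unsigned A, unsigned B) {
    if (A == B)                       return Equivalent;
    if ((A == 1 || A == 2) && B == 3) return Subset;
    if (A == 3 && (B == 1 || B == 2)) return Superset;
    return Disjoint;
  }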

When this information is known, we can define an "any memory" address space,
which is a superset of the specific address spaces. This will allow a function
to be (temporarily) defined to work with any memory, while still properly
propagating information about the different memories through the program (and
preventing mix-ups, since you can cast to the "any memory" address space, but
not back).

This requires clang to know about the relationships between address spaces, so
it knows when to insert implicit casts and when to error out.

This also requires the transformation passes to know about these
relationships, so they know when a bitcast can be removed and types
propagated.

So, is there a plan to make these relationships available to the
frontend/transformations in some way? If so, in what way?

Gr.

Matthijs

Hi,

When this information is known, we can define a "any memory" address
space, which is a superset of the specific address spaces. This will

The two uses I see for memory spaces have completely separate memories (even
different instructions to access them). You can't define an 'any memory'
space; if your memory spaces are already mapped onto one global memory, why
bother to use memory spaces anyway...

[ the two use cases I see are:
- on a small µC, you have access to the RAM and to the program Flash as a ROM,
and sometimes to EEPROM, each with their own addresses (not even all the same
size) and their own instructions for load/store.
- on one embedded µC I use, the I/O ports aren't mapped into the RAM space and
use different load/store instructions than RAM
]

allow a function to be (temporarily) defined to work with any memory,
while still properly propagating information about the different
memories through the program (and preventing mixup, since you can cast
to the "any memory" address space, but not back).

I think that what you want is a basic form of C++ template allowing a
function to take any address space. Like templates, these functions would be
codegen'd as many times as there are different (used) combinations of address
spaces in their parameters (since loads and stores are different for each
address space).
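
A rough sketch of that idea (to be read as pseudocode: whether a given
frontend accepts address-space-qualified types in this position is an
assumption on my part, and the names are made up):

  template <typename MemT>     // MemT carries the address space qualifier
  char do_stuff(MemT *mem) {
    return mem[0];             // load codegen'd for MemT's address space
  }

  __attribute__((address_space(1))) char mem1[100];
  __attribute__((address_space(2))) char mem2[100];

  // Two separate instantiations, one per address space actually used.
  char use_both() { return do_stuff(mem1) + do_stuff(mem2); }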

Regards,

Hi all,

If I read the standard correctly, the properties of these address spaces can
be fully captured by defining the relationship between every pair of address
spaces (disjoint, identical, subset/superset).

I think it would make sense to make these relationships backend/platform
specific, but for clang and the optimization passes to properly work with
address spaces, these relationships should be known. It seems to me that the
proper way to handle this, is to include this information in the TargetData
class (or whatever platform-specific class is appropriate, I'm not really into
this currently).

To have something to actually talk about, I thought I'd create some code. I've
got a patch attached which consists roughly of two parts: it adds the concept
of address space relations to TargetData, and teaches InstructionCombining to
remove casts based on those relations.

The instcombine part is pretty straightforward: if a bitcast changes the
address space of a pointer in a way that is redundant, and all of the users of
the bitcast can be rewritten to use the original pointer as well (currently
this means loads, stores and geps only), the bitcast is removed.

The TargetData part is a lot more tricky. I originally planned to simply add a
virtual method to TargetData and create a specific subclass for our compiler,
but then I discovered that this is not how TargetData is supposed to work.
Currently, TargetData consists only of a TargetDescription, which is basically
just sizes and alignment info for different types.

I solved this by creating a new class, AddrspacesDescriptor, which has a
single virtual method that defines the relationship between address spaces.
This class provides a default implementation and can be subclassed to get a
different one.
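
For readers without the patch attached, a rough reconstruction of the shape
being described (the actual patch almost certainly differs in the details):

  // Reconstruction of the idea, not the actual patch.
  enum AddrspaceRelation { Disjoint, Equivalent, Subset, Superset };

  class AddrspacesDescriptor {
  public:
    virtual ~AddrspacesDescriptor() {}
    // Relation of address space A to address space B. The default treats
    // all spaces as Equivalent (matching what clang currently assumes);
    // targets subclass this to provide their real configuration.
    virtual AddrspaceRelation getRelation(unsigned A, unsigned B) const {
      return Equivalent;
    }
  };

  // The instcombine rule sketched above: a bitcast from address space From
  // to address space To is redundant (and its load/store/gep users can be
  // rewritten to use the original pointer) when From is equivalent to, or a
  // subset of, To.
  bool isRedundantAddrspaceCast(const AddrspacesDescriptor &D,
                                unsigned FromAS, unsigned ToAS) {
    AddrspaceRelation R = D.getRelation(FromAS, ToAS);
    return R == Equivalent || R == Subset;
  }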

A reference to an AddrspacesDescriptor is passed to TargetData's constructor,
with a default. This means that all of the current code uses the default
implementation. I'm not so sure what would currently be the correct way to
make the various tools select other configurations.

Also, I'm quite positive that TargetData is not really the right place to put
something like this, at least not in the way it is currently written: passing
around references to AddrspacesDescriptor is far from ideal. However, I do not
think we should be trying to define these relations between address spaces in
a text-based format similar to the current TargetDescription. That way, we
would either lose a lot of flexibility, or end up with a huge
TargetDescription (with one element for each combination of address
spaces...).

So, any comments on the patch and/or hints on how to do things better? Any
thoughts on how selecting an address space configuration should work in the
different tools?

Gr.

Matthijs

Index: lib/Transforms/Scalar/InstructionCombining.cpp

Hi Matthijs,

I was going to check in a change into clang assuming that, by default, address spaces are separate unless there is some additional information (which, since we don't have any, was always true), so that an implicit cast will always generate an error. I'll hold off on this until we've resolved what we want to do here.

   -- Mon Ping

Hi Matthijs,

Thanks for giving some code so we can discuss this in more concrete detail. In terms of the information we need, I think you have it right. We just need a description of how the different address spaces relate, and I don't see much of an issue with how you implemented it in InstructionCombining.

As you also mentioned, I don't like that we pass a reference to the AddrspacesDescriptor into TargetData and that we have a static default implementation stored in the class. It is unclear to me if this is the right class for it, as it describes structure sizes and alignment, while address spaces seem distinct from that. I feel that this information should be part of the TargetMachine, as the address space description is part of a target machine, but I may be wrong here, so if someone has a different opinion, please chime in.

In terms of text based or not, I don't have a strong opinion. I don't see a problem with a text-based format, as these relationships are pretty simple since there can't be weird overlaps. With only the choice of disjoint, subset, and equivalent, the description shouldn't be too long, and the tool can compute the transitive closure of the description and generate a reasonably efficient table for lookup. I don't see a problem with defining one's own class to store this information either.
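
To illustrate the transitive-closure point (a throwaway sketch with a made-up
fixed number of address spaces, not a proposal for the actual data structure):
explicit "A is a subset of B" facts can be closed once up front, so that later
lookups are a single bit test.

  #include <bitset>
  #include <vector>

  const unsigned NumSpaces = 4;                      // made-up number
  std::vector<std::bitset<NumSpaces>> SubsetOf(NumSpaces);

  void addSubset(unsigned A, unsigned B) { SubsetOf[A].set(B); }

  // Floyd-Warshall style reachability: afterwards, SubsetOf[A].test(B) is
  // true whenever A is a (possibly indirect) subset of B.
  void closeTransitively() {
    for (unsigned K = 0; K < NumSpaces; ++K)
      for (unsigned A = 0; A < NumSpaces; ++A)
        if (SubsetOf[A].test(K))
          SubsetOf[A] |= SubsetOf[K];
  }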

As an aside, I think that the default relationship for two different address spaces should be disjoint. We should not encourage people to define equivalent address spaces as in general, I don't see why someone would want to do that except in rare circumstances.

   -- Mon Ping

Hi Mon Ping,

As you also mentioned, I don't like that we pass a reference to the
AddrspacesDescriptor into TargetData and that we have a static default
implementation stored in the class. It is unclear to me if this is the right
class for it, as it describes structure sizes and alignment, while address
spaces seem distinct from that. I feel that this information should be part
of the TargetMachine, as the address space description is part of a target
machine, but I may be wrong here, so if someone has a different opinion,
please chime in.

From the looks of it, TargetMachine is only used in the code generator, and
the other transformation passes can't access it. Or is that a wrong
observation on my side?

Anyway, since this information is needed pretty much everywhere (i.e., even
clang should have it, to know whether an implicit cast should be generated or
cause an error), TargetMachine might be way too low level. TargetData seems
okay, but might need a bit of a redesign.

Perhaps some way of encoding these relations into the LLVM IR would be needed
as well? Or perhaps we could load the right set of relations based on the
target triple?

In terms of text based or not, I don't have a strong opinion. I don't
see a problem with a text based format as these relationships are
pretty simple since there can't be weird overlaps. With only the
choice of disjoint, subset, and equivalent, the description shouldn't
be too long and the tool can compute the transitive closure of the
description and generate a reasonable efficient table for lookup.

Hmm, you have a good point there. Transitivity should greatly limit the number
of possible options.

I don't see a problem with defining one's own class to store this
information either.

Let's see about this later on then :)

As an aside, I think that the default relationship for two different
address spaces should be disjoint. We should not encourage people to
define equivalent address spaces as in general, I don't see why someone
would want to do that except in rare circumstances.

Yeah, that made more sense to me as well. However, I set it to Equivalent for
now, to be compatible with what clang generates. If we make clang listen to
this code as well, changing the default to Disjoint makes a lot of sense.

So, can anyone shed some light on the TargetData/Machine/Whatever issue?

Gr.

Matthijs

TargetData is a concrete class, part of the middle-end. It describes parts of the target's ABI. It allows target-independent passes to do things like decide whether x[1] and ((char *) x)[4] alias. This is possible without actually linking with the code generator for the relevant target. TargetData can be encoded in the LLVM IR, e.g. target datalayout = "...........".

TargetMachine is an abstract factory class used to gain access to all features of the code generator.

There is not a 1:1 mapping between TargetData and TargetMachine; Mac, Linux, and Windows all have different data layouts, but use the same X86 TargetMachine.

You probably want to somehow extend TargetData to encode the address space descriptions, including pointer sizes and address space relationships.

— Gordon

Hi Matthijs,

As you also mentioned, I don't like that we pass a reference to the
AddrspacesDescriptor into TargetData and that we have a static default
implementation stored in the class. It is unclear to me if this is the right
class for it, as it describes structure sizes and alignment, while address
spaces seem distinct from that. I feel that this information should be part
of the TargetMachine, as the address space description is part of a target
machine, but I may be wrong here, so if someone has a different opinion,
please chime in.

From the looks of it, TargetMachine was only used in the codegenerator, and
the other transformation passes can’t access it. Or is that a wrong
observation on my side?

Anyway, since this information is needed pretty much over the entire field
(ie, even clang should have it to know whether an implicit should be generated
or cause an error), TargetMachine might be way too low level. TargetData seems
okay, but might need a bit of a redesign.

Perhaps some way of coding these relations into the LLVM IR would be needed as
well? Or perhaps we could load the right set of relations based on the target
triple?

At some level, I think we are hitting the design question of how a transformation can get access to target-dependent information. As we get more phases, this might become more of an issue. For example, suppose one wants to block (tile) a loop based on the cache size of the machine; the loop transformation phase will need access to target-dependent information. As you noted, TargetMachine is used by CodeGen as well as by some utilities like llc. Assuming TargetMachine is the right location for this information, I don't see a problem if a phase needs to get access to the object.

In terms of encoding this information in the LLVM IR, I'd rather not unless there is a real need. The target we are compiling for should have this information. If we start encoding the address spaces, do we also need to encode other information like stack layout, etc.?

[deleted text]

As an aside, I think that the default relationship for two different
address spaces should be disjoint. We should not encourage people to
define equivalent address spaces as in general, I don't see why someone
would want to do that except in rare circumstances.

Yeah, that made more sense to me as well. However, I set it to Equivalent for
now, to be compatible with what clang generates. If we make clang listen to
this code as well, changing the default to Disjoint makes a lot of sense.

I was going to change the default in clang to make implicit casting between address spaces illegal. Having this information will supersede that change, as we will have the correct information to do the right thing based on the target.

Best regards,
– Mon Ping

Hi Gordon,

TargetData is a concrete class, part of the middle-end. It describes
parts of the the target's ABI. It allows target-independent passes to
do things like decide whether x[1] and ((char *) x)[4] alias. This is
possible without actually linking with the code generator for the
relevant target. TargetData can be encoded in the LLVM IR, e.g. target
datalayout = "...........".

TargetMachine is an abstract factory class used to gain access to all
features of the code generator.

There is not a 1:1 mapping between TargetData and TargetMachine; Mac,
Linux, and Windows all have different data layouts, but use the same
X86 TargetMachine.

Ok, that makes sense.

You probably want to somehow extend TargetData to encode the address
space descriptions, including pointer sizes and address space
relationships.

Hmm, you have an excellent point there; pointers could indeed easily have
different sizes in different address spaces... That would make TargetData
indeed a logical place to put such info.

I'm not completely sure how a bitcast from one pointer type to another would
work then; it would probably need a zext in addition to the cast?

In this case, finding some text representation for the address space
relations makes sense, so that these relations can be embedded into the LLVM
IR (possibly inside the "target datalayout" entry, or perhaps in a separate
"target addrspaces" entry?).

This does make you wonder what should happen when two modules with different
addrspace configurations are linked together. How is this currently handled
with datalayouts?

Gr.

Matthijs

You probably want to somehow extend TargetData to encode the address
space descriptions, including pointer sizes and address space
relationships.

Hmm, you have an excellent point there, pointer sizes could indeed easily have
different sizes in different address spaces... That would make TargetData
indeed a logical place to put such info.

Yep, I think it makes sense for TargetData to have info about the size/alignment of a pointer in each addr space. That is also easy to encode.
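
For illustration only (this is not what TargetData actually provides), the
kind of per-address-space record being suggested might look like:

  #include <map>

  struct PointerSpec {
    unsigned SizeInBits;      // e.g. 16-bit pointers in a small data bank
    unsigned ABIAlignInBits;
  };

  // Keyed by address space number; the concrete values here are made up.
  std::map<unsigned, PointerSpec> PointerSpecs = {
      {0, {32, 32}},
      {1, {16, 16}},
  };

  unsigned getPointerSizeInBits(unsigned AS) {
    auto It = PointerSpecs.find(AS);
    return It != PointerSpecs.end() ? It->second.SizeInBits
                                    : PointerSpecs[0].SizeInBits;
  }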

I'm not completely sure how a bitcast from one pointer type to another would
work then, it would probably need zext in addition to casting?

In the code generator, it is possible to know a lot more about what happens when bitcasting from one address space to another. This doesn't have to go into TargetData.

In this case, finding some text representation for the address space relations
make sense, so that these relations can be embedded into the LLVM IR (possibly
inside the "target datalayout" entry, or perhaps in a separate "target
addrspaces" entry?

Are you envisioning an LLVM IR pass that mucks around with these? When would that be useful?

This does makes you wonder what should happen when two modules with different
addrspace configurations are linked together. How is this currently handled
with datalayouts?

You can see this in Linker::LinkModules; basically one randomly overrides the other, unless one module is empty, in which case the non-empty one wins.

-Chris

Hi Chris,

>> You probably want to somehow extend TargetData to encode the address
>> space descriptions, including pointer sizes and address space
>> relationships.
> Hmm, you have an excellent point there, pointer sizes could indeed easily
> have different sizes in different address spaces... That would make
> TargetData indeed a logical place to put such info.

Yep, I think it makes sense for TargetData to have info about the size/
alignment of a pointer in each addr space. That is also easy to encode.

Which is an added bonus, but the original subject under discussion was the
relations between each address space (equivalent, disjoint, subset/superset).
Any thoughts on that? Should they also belong in TargetData?

> I'm not completely sure how a bitcast from one pointer type to another
> would work then, it would probably need zext in addition to casting?

In the code generator, it can know a lot more information about what
happens when bitcasting from one address space to another. This
doesn't have to go into TargetData.

Well, if the pointers really are differently sized, you couldn't just bitcast
between them, since bitcast requires values of equal bit length? Or is this
something that would only be exposed when doing inttoptr/ptrtoint and simply
assumed to be handled in the backend for pointer-to-pointer bitcasts?

> In this case, finding some text representation for the address space
> relations make sense, so that these relations can be embedded into the
> LLVM IR (possibly inside the "target datalayout" entry, or perhaps in a
> separate "target addrspaces" entry?

Are you envisioning an LLVM IR pass that mucks around with these?
When would that be useful?

Not a particular pass, but passes in general. Specifically, I would like
instcombining to be able to use this info to remove useless bitcasts. Also,
this information is useful for clang to know when inserting an implicit cast
makes sense and when it would be an error.

You can see this in Linker::LinkModules; basically one randomly overrides
the other, unless one module is empty, in which case the non-empty one
wins.

Shouldn't this give a warning instead of silently picking one? Also, couldn't
this produce an invalid module? Say, for example, that one module has 16-bit
pointers and the other 32-bit. If we use 32 bit as the resulting pointer size,
but the other module contained something like

  %int = ptrtoint i32* %A to i16

this could suddenly cause truncation of the pointer, where it didn't in the
original module. Warning about this seems appropriate? The same would go for
different address space configurations; I expect mixing those could cause
nasty, but probably very subtle, bugs...

Gr.

Matthijs

Yep, I think it makes sense for TargetData to have info about the size/
alignment of a pointer in each addr space. That is also easy to encode.

Which is an added bonus, but the original subject under discussion was the
relations between each address space (equivalent, disjoint, subset/superset).
Any thoughts on that? Should they also belong in TargetData?

Only if absolutely required, see below.

I'm not completely sure how a bitcast from one pointer type to another
would work then, it would probably need zext in addition to casting?

In the code generator, it can know a lot more information about what
happens when bitcasting from one address space to another. This
doesn't have to go into TargetData.

Well, if the pointers really are differently sized, you couldn't just bitcast
between them, since bitcast requires values of equal bitlength? Or is this
something that would only be exposed when doing intoptr/ptrtoint and simply
assumed to be handled in the backend for pointer-to-pointer bitcasts?

I think it should be safe for instcombine to assume there is no truncation of pointers. If you have ptra -> ptrb -> ptra, it is theoretically possible that ptrb pointers are smaller than ptra pointers. However, this is a form of pointer overflow, which llvm treats as undefined already. Further, it isn't clear to me that this is going to be common enough for instcombine to really worry about. Only handling it in the code generator seems fine to me, but maybe you have a different use case for addrspaces in mind.
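
To put the overflow analogy in miniature (plain integers standing in for
pointer values, purely illustrative):

  #include <cstdint>

  // Round-tripping a wide pointer value through a narrower representation
  // is already lossy, analogous to overflow: the high bits are gone.
  uint32_t roundTrip(uint32_t WidePtrValue) {
    uint16_t Narrow = static_cast<uint16_t>(WidePtrValue);  // ptra -> ptrb
    return Narrow;                                          // ptrb -> ptra
  }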

In this case, finding some text representation for the address space
relations make sense, so that these relations can be embedded into the
LLVM IR (possibly inside the "target datalayout" entry, or perhaps in a
separate "target addrspaces" entry?

Are you envisioning an LLVM IR pass that mucks around with these?
When would that be useful?

Not a particular pass, but passes in general. Specifically, I would like
instcombining to be able to use this info to remove useless bitcasts. Also,
this information is useful for clang to know when inserting an implicit cast
makes sense and when it would be an error.

Clang should just reject implicit casts in *any* case when the addr spaces differ, IMO. In any case, clang has its own parameterization of the target, which is richer than TargetData already, so if it really wanted to allow implicit casts between address space subsets, it could do so based on its own info.

You can see this in Linker::LinkModules; basically one randomly overrides
the other, unless one module is empty, in which case the non-empty one
wins.

Shouldn't this give a warning instead of silently picking one?

What would the warning say? There is nothing useful the linker could tell the user, and having a low level tool like the llvm linker library emit warnings (where? to stderr?) is extremely impolite for some clients. If a specific client cares about this, it should check and report in its own way.

-Chris

Hi all,

I've been a tad busy in the last few weeks, but I don't think this issue is
settled yet.

> the relations between each address space (equivalent, disjoint, subset/
> superset).
> Any thoughts on that? Should they also belong in TargetData?
Only if absolutely required, see below.

Is there any other place they would fit in better? What part of the text
below were you referring to exactly? Would it be useful to have a class that
works just like TargetData does, i.e., an immutable pass that can be added to
the passmanager in addition to TargetData? Anyway, let's see what the result
of the discussion below is first, I guess.

> Specifically, I would like instcombining to be able to use this info to
> remove useless bitcasts. Also, this information is useful for clang to
> know when inserting an implicit cast makes sense and when it would be an
> error.
Clang should just reject implicit casts in *any* case when the addr
spaces differ IMO. In any case, clang has it's own parameterization
of the target which is richer than TargetData already, so if it really
wanted to allow implicit casts between address space subsets, it could
do so based on its own info.

I think that adding this info to Clang's targetinfo code would indeed make
sense, though perhaps disallowing implicit casts entirely is also acceptable
(provided that you can still do explicit casts).

However, what I find more important (this discussion is greatly helping me in
finding out what exactly it is that I find important :-p) is that instcombine
(or some other pass, for all I care) can remove bitcasts that are not strictly
needed.

I will focus on address spaces that are nested, since that is the most
interesting and general case. Say we have address spaces SubA, SubB and Super,
where SubA and SubB are subsets of Super (but disjoint from each other).

When is a bitcast between address spaces really needed? I can see two main
reasons.
1) A function gets passed pointers into both SubA and SubB (which then need
    to be bitcasted to pointer-into-Super); a small sketch of this case
    follows below.
2) A pointer variable is assigned pointers into both SubA and SubB. In LLVM
    this would probably mean a phi node that joins pointers into different
    address spaces (which is not possible, hence a bitcast is needed and the
    phi node would be of type pointer-into-Super).
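
To make case 1 concrete, a small sketch (the address space numbers are purely
hypothetical: 1 = SubA, 2 = SubB, 3 = Super, and it assumes a frontend that
accepts the implicit subset-to-superset conversion; today's clang would want
an explicit cast):

  __attribute__((address_space(1))) char a[16];   // lives in SubA
  __attribute__((address_space(2))) char b[16];   // lives in SubB

  // Written once against Super, so it can take pointers into either subset.
  char get(__attribute__((address_space(3))) char *p) { return *p; }

  // Each call casts a SubA/SubB pointer up to Super; after inlining, those
  // casts are exactly the redundant ones discussed here.
  char f(void) { return get(a) + get(b); }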

However, after some transformations (mainly code duplication AFAICS), the
above situations might get removed, making the bitcasts redundant. The patch I
attached a few posts back implements this by removing the bitcast when
possible. This does mean that any users of the bitcast now have one of their
operands changed (the address space gets changed).

Take, for example, a load instruction that directly uses a bitcasted pointer.

When removing the bitcast, the load instruction can be replaced by a new one
that takes a pointer into another (smaller) address space. The result of the
load instruction will still be the same value, but a bitcast was removed (and
the load instruction was possibly improved as well).

Let's look at two use cases I can see for nested address spaces.

1) Different address spaces translate to different accessing instructions at
    code generation time. Instructions that access smaller address spaces
    (SubA, SubB) can execute faster than instructions that access the Super
    address space. Because of this, removing unneeded bitcasts to Super (and
    then replacing, for example, load instructions with their SubA or SubB
    variants) can result in faster code. This is currently not our use case,
    but depending on where our hardware design goes might become so.
2) Having the Super address space as an address space that is usable in the
    input C code, but should be optimized away before it hits the backend.
    Removing those redundant bitcasts allows us to prevent a lot of code
    duplication or ugly macro use in our C code, while still ending up with
    code our backend can compile. This is our current use case.

For both of these use cases, it would be necessary to remove redundant
bitcasts. To be able to do that, the transformation passes (or at least the
one that will do this) need to know the relations between the different
address spaces.

At the very least, redundant bitcasts should be removed. I hacked up a patch
to make instcombine do this, but that only makes sense if removing those
bitcasts allows instcombine to do further optimizations, which in turn allows
more bitcasts to be removed. I have the feeling that propagating address
spaces might be quite orthogonal to the other transformations instcombine
does, so a separate pass for just this might be better.

I think the main question here is, are there any other transformations that
would need (or benefit from) information on address space relations? If not,
then it might be sufficient to pass an "AddressSpacesDescriptor" to just that
one pass.

If there are other transformations, it would be useful to have one central
place to store this information (like TargetData now).

I've only just realized that this is an essential question, so I haven't given
it too much thought yet.

In either case, we should wonder in what way this information is given to the
tools (opt/llc). If we make it dependent on the target architecture, this only
becomes an issue once we support an architecture where this is relevant.
Alternatively, we could have some way of specifying the relations on the
commandline (either directly or by choosing between a number of models). Which
of these is chosen (if any) would not be too relevant for us, since we have
our own compiler driver which can pass anything to the (instcombine or other)
pass in a hardcoded fashion.

Gr.

Matthijs

Hi Chris,

I've put this into a separate reply, since it is really a separate issue.

>> You can see this in Linker::LinkModules, basically one randomly
>> overrides the other unless one modules is empty, in which case the non-
>> empty one wins.
> Shouldn't this give a warning instead of silently picking one?
What would the warning say? There is nothing useful the linker could
tell the user, and having a low level tool like the llvm linker
library emit warnings (where? to stderr?) is extremely impolite for
some clients. If a specific client cares about this, it should check
and report in its own way.

The linker already reports errors through an ErrorMsg output argument on
LinkInModule, though looking at it now, the other methods of llvm::Linker
don't have any way of outputting things. Since this wouldn't really be an
error but a warning, doing this in Linker is probably too much fuss anyway.
Which changes my point slightly: perhaps llvm-link and/or llvm-ld should
check the TargetDatas and give a warning if they don't match?

Gr.

Matthijs

Hi Matthijs,

Specifically, I would like instcombining to be able to use this info to
remove useless bitcasts. Also, this information is useful for clang to
know when inserting an implicit cast makes sense and when it would be an
error.

Clang should just reject implicit casts in any case when the addr
spaces differ IMO. In any case, clang has its own parameterization
of the target which is richer than TargetData already, so if it really
wanted to allow implicit casts between address space subsets, it could
do so based on its own info.

I think that adding this info to Clang’s targetinfo code would indeed make
sense, though perhaps disallowing implicit casts entirely is also acceptable
(provided that you can still do explicit casts).

Like most other type qualifiers, explicit casts are allowed. However, if someone uses one in a context which is illegal, the result is undefined (e.g., casting a pointer from one address space to another and then using it where the address spaces don't overlap). The C99 extension indicates that a cast of a pointer in one address space to another address space is valid only if the object that the pointer points to exists in both address spaces (otherwise, the result is undefined). I think we all agree that the default behavior in clang should be to reject implicit casts between address spaces unless clang has some additional information to know better.

However, what I find more important (this discussion is greatly helping me in
finding out what exactly it is that I find important :-p) is that instcombine
(or some other pass, for all I care) can remove bitcasts that are not strictly
needed.

I will focus on address spaces that are nested, since that is the most
interesting and general case. Say we have address spaces SubA, SubB and Super, and
SubA and SubB are subsets of Super (but disjoint to each other).

When is bitcast between address spaces really needed? I can see two main
reasons.

  1. A function gets passed pointers into both SubA and SubB (which then need
    to be bitcasted to pointer-into-Super).
  2. A pointer variable is assigned pointers into both SubA and SubB. In LLVM
    this would probably mean a phi node that joins pointers into different
    address spaces (which is not possible, hence a bitcast is needed and the
    phi node would be of type pointer-into-super).

However, after some transformations (mainly code duplication AFAICS), the
above situations might get removed, making the bitcasts redundant. The patch I
attached a few posts back implements this by removing the bitcast when
possible. This does mean that any users of the bitcast now have one of their
operands changed (the address space gets changed).

Take for example a load instruction, that directly uses a bitcasted pointer.

When removing the bitcast, the load instruction can be replaced by a new one
that takes a pointer into another (smaller) address space. The result of the
load instruction will still be the same value, but a bitcast was removed (and
the load instruction was possibly improved as well).

Let’s look at two use cases I can see for nested address spaces.

  1. Different address spaces translate to different accessing instructions at
    code generation time. Instructions that access smaller address spaces
    (SubA, SubB) can execute faster than instructions that access the Super
    address space. Because of this, removing unneeded bitcasts to Super (and
    then replacing, for example, load instructions with their SubA or SubB
    variants) can result in faster code. This is currently not our use case,
    but depending on where our hardware design goes might become so.
  2. Having the Super address space as an address space that is usable in the
    input C code, but should be optimized away before it hits the backend.
    Removing those redundant bitcasts allows us to prevent a lot of code
    duplication or ugly macro use in our C code, while still ending up with
    code our backend can compile. This is our current use case.

For both these use cases, it would be needed to remove redundant bitcasts. To
be able to do that, the transformation passes (or at least the one the will do
this) needs to know the relations between the different address spaces.

I would imagine that such a cleanup phase will probably need to track what objects a pointer can point to in some region of code so it can either remove redundant casts or add a cast to refine the information of what a pointer can point to.

At the very least, redundant bitcasts should be removed. It hacked up a patch
to make instcombine do this, but that only makes sense if removing those
bitcasts allows instcombine to do further optimizations, which again allows
bitcasts to be removed. I have the feeling that propagating address spaces
might be quite orthogonal to the other transformations instcombine does, so a
seperate pass for just this might be better.

I think the main question here is, are there any other transformations that
would need (or benefit from) information on address space relations? If not,
then it might be sufficient to pass an “AddressSpacesDescriptor” to just that
one pass.

Alias analysis could use the information to help determine whether two pointers alias. As you indicated, a cleanup phase could use the information. I don't think most other phases care, as what they want to know is which address space a pointer points into, not the relationship between the address spaces.

If there are other transformations, it would be useful to have one central
place to store this information (like TargetData now).

I’ve only just realized that this is an essential question, so I haven’t given
it too much thought yet.

In either case, we should wonder in what way this information is given to the
tools (opt/llc). If we make it dependent on the target architecture, this only
becomes an issue once we support an architecture where this is relevant.
Alternatively, we could have some way of specifying the relations on the
commandline (either directly or by choosing between a number of models). Which
of these is chosen (if any) would not be too relevant for us, since we have
our own compiler driver which can pass anything to the (instcombine or other)
pass in a hardcoded fashion.

IMO, I would think that the address space relationships are based on the target architecture, so it seems most natural to associate them with the target.

Regards,
– Mon Ping

Hi Mon Ping,

I've again attached a patch, which lets LLVM know about the relations between
different address spaces. Instead of cramming this info in with TargetData
(with all kinds of unwanted side effects, especially for the IR), I opted to
create a new pass, TargetAddrspaces, which holds this information. Just like
TargetData, this can be added to the passmanager by the tool running the
passes.

Unlike TargetData, however, I wanted to be able to subclass TargetAddrspaces
to change the behaviour. To make this possible, I made TargetAddrspaces an
analysis group, with a default implementation, AllDisjointAddrspaces (which,
as the name suggests, returns Disjoint for all combinations). Having a default
implementation is useful, so tools are not required to add a TargetAddrspaces
pass to a passmanager.
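
For readers without the patch, roughly the shape this describes (reconstructed
from the description above; pass registration, the analysis group boilerplate
and the InstCombine changes are omitted, and the details are guesses):

  #include "llvm/Pass.h"

  enum AddrspaceRelation { Disjoint, Equivalent, Subset, Superset };

  // The analysis group interface: an immutable pass that only answers
  // queries about how two address spaces relate.
  class TargetAddrspaces : public llvm::ImmutablePass {
  public:
    static char ID;
    TargetAddrspaces() : llvm::ImmutablePass(ID) {}
    virtual AddrspaceRelation getRelation(unsigned A, unsigned B) const = 0;
  };
  char TargetAddrspaces::ID = 0;

  // Default implementation: distinct address spaces are Disjoint (the same
  // space is of course Equivalent to itself).
  class AllDisjointAddrspaces : public TargetAddrspaces {
  public:
    AddrspaceRelation getRelation(unsigned A, unsigned B) const override {
      return A == B ? Equivalent : Disjoint;
    }
  };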

The last part of this patch is an addition to InstCombine to make use of this
information: It removes any bitcasts from a subset to a superset address
space. It gets at the address space information by requiring the
TargetAddrspaces analysis, which will give it the default implementation in
all current tools.

So, this requires a minimal amount of changes, the current code won't know the
difference, and in our custom frontend we provide another implementation of
TargetAddrspaces and things work very well.

There are a number of issues still, though. I'm not 100% sure that using an
Analysis pass for this is correct, but currently only analysis passes can be
grouped (though I have the suspicion that this is mainly a question of naming,
really).

Additionally, the "Writing an LLVM Pass" document states: "There must be
exactly one default implementation available at all times for an Analysis
Group to be used. Only default implementation can derive from ImmutablePass."

I'm not completely sure why there cannot be other implementations that derive
from ImmutablePass. In this case, I want every implementation of
TargetAddrspaces to also derive from ImmutablePass, because it makes sense. In
practice, this works as well: our custom implementation as well as the default
implementation derive from ImmutablePass, and everything compiles and runs
fine. Is this perhaps an outdated statement?

Mon Ping suggests using address space information for alias analysis as well,
which seems to make sense. In effect this is a form of type-based alias
analysis, though different address spaces don't by themselves preclude
pointers from being equal. A problem here is that pointers in disjoint address
spaces would be marked as not aliasing, but when Disjoint is merely the
default relation (rather than a known fact), this is not conservative. This
might require an extra option, "Unknown", which can be used as the default
instead of "Disjoint". For Unknown, any pass can do the conservative thing.
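
Sketched, with "Unknown" added as suggested (names are hypothetical):

  enum AddrspaceRelation { Unknown, Disjoint, Equivalent, Subset, Superset };

  // Hypothetical oracle; Unknown is the safe default when nothing is known.
  AddrspaceRelation getRelation(unsigned A, unsigned B) {
    return A == B ? Equivalent : Unknown;
  }

  // Only a relation *known* to be Disjoint justifies NoAlias; Unknown,
  // Subset, Superset and Equivalent all have to stay conservative.
  bool definitelyNoAlias(unsigned AS1, unsigned AS2) {
    return getRelation(AS1, AS2) == Disjoint;
  }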

Lastly, I'm still not so sure if InstCombine is the right place for this
simplification. This needs some more thought, but currently it is a problem
that instcombine does not process BitCastConstantExprs. I might end up writing
a separate pass for just this.

So again, suggestions welcome.

Gr.

Matthijs

addrspaces.diff (9.81 KB)

Hi Matthijs,

Sorry for not responding earlier. I have a few comments.

Hi Mon Ping,

I’ve again attached a patch, wich lets LLVM know about about the relations
between different address spaces. Instead of cramming this info in with
TargetData (with all kinds of unwanted side effects, especially for the IR), I
opted to create a new pass, TargetAddrspaces, which holds this information.
Just like TargetData, this can be added to the passmanager by the tool running
the passes.

Unlike TargetData, however, I wanted to be able to subclass TargetAddrspaces
to change the behaviour. To make this possible, I made TargetAddrspaces an
analysis group, with a default implementation AllDisjointAddrspaces (which, as
the name suggests, returns Disjoint for all combinations). Having a default
implementations is useful, so tools are not required to add a TargetAddrspaces
pass to a passmanager.

I don't have a problem with having another class, TargetAddrSpace, to store this information. However, I don't think it makes sense for it to be a standalone pass. Address spaces seem to be part of the TargetData, and it seems more natural to ask the TargetData to return the TargetAddrSpace object (similar to struct layout) to describe the relationships between address spaces. BTW, there is a comment in TargetAddrspaces.h that indicates the default is that all address spaces are equivalent; I assume you meant disjoint here.
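
In other words, something like the following shape (purely hypothetical, just to contrast it with the standalone pass; none of these names are real API):

  // Hypothetical: the relation queries live in their own class, but the
  // object is owned and handed out by TargetData, much like struct layout.
  enum AddrspaceRelation { Disjoint, Equivalent, Subset, Superset };

  class TargetAddrSpace {
  public:
    virtual ~TargetAddrSpace() {}
    virtual AddrspaceRelation getRelation(unsigned A, unsigned B) const {
      return A == B ? Equivalent : Disjoint;  // the default suggested above
    }
  };

  class TargetData {
    TargetAddrSpace AddrSpaces;
  public:
    const TargetAddrSpace &getAddrSpaceInfo() const { return AddrSpaces; }
  };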

The last part of this patch is an addition to InstCombine to make use of this
information: It removes any bitcasts from a subset to a superset address
space. It gets at the address space information by requiring the
TargetAddrspaces analysis, which will give it the default implementation in
all current tools.

For the case of a GetElementPtr, we are replacing a bitcast followed by a getelementptr with a getelementptr followed by a bitcast. The assumption is that the latter bitcast will hopefully go away when we iterate through its uses.

So, this requires a minimal amount of changes, the current code won’t know the
difference, and in our custom frontend we provide another implementation of
TargetAddrspaces and things work very well.

There are a number of issues still, though. I’m not 100% sure that using an
Analysis pass for this is correct, but currently only analysis passes can be
grouped (though I have the suspicion that this is mainly a question of naming,
really).

Additionally, the “Writing an LLVM Pass” document states: “There must be
exactly one default implementation available at all times for an Analysis
Group to be used. Only default implementation can derive from ImmutablePass.”

I’m not completely sure why there cannot be other implementations that
derive from ImmutablePass. In this case, I want to have any implementation of
TargetAddrspaces also derive from ImmutablePass, because it makes sense. In
practice, this works as well, our custom implementeation as well as the
default implementation derive from ImmutablePass and everything compiles and
runs fine. Is this perhaps an outdated statement?

Mon Ping suggests using address space information for alias analysis as well,
which seems to make sense. In effect this is a form of type-based alias
analysis, but different address spaces don’t preclude pointers from being
equal. A problem here is that pointers in disjoint address spaces would be
marked as not aliasing, but when the default relation is disjoint this is not
so conservative. This might require an extra option “Unknown”, which can be
used as the default instead of “Disjoint”. For Unknown, any pass can do the
conservative thing.

What I'm suggesting is that alias analysis can be a client of wherever we store the address space information. In the example you gave, alias analysis will examine two memory locations, ask the TargetAddressSpace what the relationship is, and if it is disjoint, it will return no alias. If the address spaces are in a subset relationship, the alias analysis returns maybe, unless it has more information. If a client doesn't tell the compiler the correct address space information, the client shouldn't expect correct answers to come out of the compiler.

Lastly, I’m still not so sure if InstCombine is the right place for this
simplification. This needs some more thought, but currently it is a problem
that instcombine does not process BitCastConstantExprs. I might end up writing
a seperate pass for just this.

I'm not sure either. At some level, what we want is to propagate the most precise address space (or restrict) information to its uses. This means that ideally we would want to be able to handle copies of the value stored in some temporary and track them all the way through to their uses. InstCombine will not handle this case, e.g., where address space 1 is a subset of 2:

  int<1>* ptr1 = ...
  int<2>* ptr2 = ptr1 + 4;
  *ptr2 = ...

InstCombine will not clean this up, though a copy propagation phase could.

– Mon Ping