"generic" address space

Forgot to send it to the entire list.

Hi,

I took a look at what __builtin_shufflevector does. In that case, it enforces more rules than the type system allows after checking the call. In that vein, instead of using this weird type, what I could do in ActOnCallExpr, when checking the arguments of an intrinsic call, is call my own version of CheckSingleAssignmentConstraints, e.g., CheckBuiltinAssignmentConstraints, that will allow mismatches between address spaces to go through. How does that sound?
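
Roughly, the check I have in mind would look like this (a toy sketch with made-up types, not clang's real Sema interfaces; CheckBuiltinAssignmentConstraints itself is the hypothetical new function):

  #include <stdbool.h>
  #include <string.h>

  /* Toy stand-in for a pointer type: what it points at, plus its address space. */
  struct ToyPointerType {
    const char *pointee;   /* e.g. "int" */
    unsigned addr_space;   /* 0 = default address space */
  };

  /* Ordinary assignment checking: pointee type and address space must match. */
  bool check_single_assignment(struct ToyPointerType param,
                               struct ToyPointerType arg) {
    return param.addr_space == arg.addr_space &&
           strcmp(param.pointee, arg.pointee) == 0;
  }

  /* Builtin-argument checking: same rule, except that address space
     mismatches are allowed to go through. */
  bool check_builtin_assignment(struct ToyPointerType param,
                                struct ToyPointerType arg) {
    return strcmp(param.pointee, arg.pointee) == 0;
  }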

– Mon Ping

What we do for __builtin_shufflevector is claim the argument list
is completely variable, then check the actual arguments from
CheckFunctionCall. What I was thinking is that we could do the same
thing for the builtins which can take alternate address spaces, i.e.
__sync_*. (Are there any other builtins for which you're planning
address space overloading?)

That said, your approach could also work, although it doesn't seem as
clean to me.

-Eli

Hi Eli,

On an architecture that supports address spaces, any builtin that touches memory would probably need this support. Today, I think it's mostly the sync functions. For a specific architecture, one is likely to add architecture-specific built-ins to support other operations that touch memory (e.g., specialized load/store instructions), and I would like to use the same mechanism to do the check. I would rather not specify that the argument list is completely variable, because I think that means that for any such built-in we would need to teach clang about that operation and its signature. I would prefer to teach clang that some intrinsics can take a pointer to any address space but to a specific domain type, and get that information from the signature of the intrinsic. I would like to keep clang as clean as possible. The only other possibility I have thought of, other than loosening the parameter-argument constraint or using the weird type, is to generate a different built-in signature for the pointer type based on the address space of the use. I didn't like that choice, as it seemed dirty without much benefit.
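
As a concrete example of the kind of code I would like to accept (written with clang's __attribute__((address_space(N))) extension; the builtin and the address space number are just illustrative), one builtin signature should cover the pointer no matter which address space it points into:

  typedef __attribute__((address_space(1))) int as1_int;

  int add_one(int *p, as1_int *q) {
    int a = __sync_fetch_and_add(p, 1);  /* default address space */
    int b = __sync_fetch_and_add(q, 1);  /* address space 1 */
    return a + b;
  }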

   -- Mon Ping

Ah, I see... I hadn't really considered the arch-specific builtins
part. In that case, adding some generic machinery might be
appropriate. That said, I've never actually seen an architecture with
multiple address spaces, so pointers to any documentation would be
helpful.

I don't think we should be abusing existing machinery to handle this
case; therefore, I think there should be a new kind of AST expression
for address-space-overloaded intrinsics. This makes it explicit that
there is something unusual going on with the types for such intrinsic
calls.

-Eli

Hi Eli,

Whatever we do, it should be clear when examining the AST that we are matching an address space overloaded intrinsic. I'm not sure what new AST expression tree would make sense. I view it as a call expression to an address-space-overloaded intrinsic function. When we check the arguments against the parameters of the signature, the normal rules apply except for parameters that can accept different address spaces. During code generation, the different argument types will generate a slightly different intrinsic function, like @llvm.atomic.load.add.i32.p0i32.
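
To make that concrete (the p0i32 name is the one above; the address-space-1 spelling is my guess at the analogous overload, and the exact names depend on the LLVM version):

  int *p;
  __attribute__((address_space(1))) int *q;

  void f(void) {
    __sync_fetch_and_add(p, 1);  /* -> @llvm.atomic.load.add.i32.p0i32 */
    __sync_fetch_and_add(q, 1);  /* -> @llvm.atomic.load.add.i32.p1i32 */
  }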

It is a bit unclear to me what happens with intrinsics that can take any integer width. Looking briefly, it looks like clang will generate an implicit cast of the argument to int. Do we always generate the 32-bit integer version of these intrinsics?

Various DSPs (e.g., the TI TMS320C30), embedded architectures, and GPUs support different address spaces. The Cell processor has local memory accessible only by the SPUs. Nvidia and AMD GPUs typically have different address spaces for things like local memory or textures. It is difficult to get their ISAs, though AMD has released their R600 ISA (http://www.x.org/docs/AMD/r600isa.pdf). I'll have to look to see what is available publicly.

– Mon Ping

Hi Eli,
Whatever we do, it should be clear when examining the AST that we are
matching an address space overloaded intrinsic. I'm not sure what new AST
expression tree would make sense. I view it as a call expression to an
address-space-overloaded intrinsic function. When we check the arguments
against the parameters of the signature, the normal rules apply except for
parameters that can accept different address spaces.

The issue is that the types of the arguments won't match the parameter
types of the called function, which would be surprising for most AST consumers.

It is a bit unclear to me what happens with intrinsics that can take any
integer width. Looking briefly, it looks like clang will generate an
implicit cast of the argument to int. Do we always generate the 32-bit
integer version of these intrinsics?

Mmmm, it looks like clang's versions of the __sync_* builtins are
broken in that respect. I think some custom logic is needed for the
overloading.
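
For reference, this is the behavior we presumably want (a sketch of the intent only; as noted above, it is not what clang does today):

  short s;
  long long ll;

  void g(void) {
    __sync_fetch_and_add(&s, 1);   /* should use the 16-bit overload */
    __sync_fetch_and_add(&ll, 1);  /* should use the 64-bit overload */
  }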

Various DSPs (e.g., the TI TMS320C30), embedded architectures, and GPUs
support different address spaces. The Cell processor has local
memory accessible only by the SPUs. Nvidia and AMD GPUs typically have
different address spaces for things like local memory or textures. It is
difficult to get their ISAs, though AMD has released their R600 ISA
(http://www.x.org/docs/AMD/r600isa.pdf). I'll have to look to see what is
available publicly.

Hmm, okay... I'm trying to get a feel for what sorts of intrinsics for
alternate address spaces we'll want, and for which ones it's useful to
overload the address space of the pointer arguments. For example, for
the R600 ISA (which I skimmed), I can't think of any intrinsics for
which overloading would be useful; it seems like the address spaces
are implied by the instruction. Do you have any examples where it
would be useful to overload the address space of platform-specific
intrinsics?

-Eli

I don't think we have any of these. However, yes, by default that would happen. Sema could choose to intercept these before the implicit casts are inserted, or remove the implicit casts later.

-Chris

Hi,

For this type of overloaded intrinsic, I think Sema should intercept the arguments before the implicit casts are inserted, as the casts would alter which intrinsic gets called in code generation. Whatever solution we come up with for the address spaces might also apply to this case.

   -- Mon Ping

Hi,

I agree with you that the AST should be as strongly typed as possible and be consistent. (BTW, does clang handle K&R C and formal conversions when one doesn't see the prototype, i.e., the automatic promotion to double and then converting it back to float?) If we want to make it very clear in the AST, we could create a special operator that indicates that an address space mismatch is allowed. I'm not sure what to call it, but it is more of a type adaptor than a conversion. This would make it very clear in the AST what is going on, without resorting to a weird placeholder type. Another possibility is that we could tag the ParmVarDecl to indicate that this parameter can ignore any address space differences between the argument and the parameter.

– Mon Ping

(BTW, does clang handle K&R C and formal conversions when one
doesn't see the prototype, i.e., the automatic promotion to double and then
converting it back to float?)

clang should handle everything related to promotions and functions
without prototypes correctly; please file a bug if you find any
issues.

If we want to make it very clear in the AST,
we could create a special operator that indicates that an address space
mismatch is allowed. I'm not sure what to call it, but it is more of a type
adaptor than a conversion. This would make it very clear in the AST what is
going on, without resorting to a weird placeholder type. Another possibility
is that we could tag the ParmVarDecl to indicate that this parameter can
ignore any address space differences between the argument and the parameter.

All possibilities, I guess... I'm not particularly fond of any of
them. I still think adding a new kind of AST expression in place of
the CallExpr would be better. These builtins aren't really functions at
all, but overloaded function-style operators, and overloading CallExpr's
semantics with non-calls seems messy.

I suppose another possibility (which I think you mentioned in passing)
would be to synthesize decls for the various legal overloads on
demand. The synthesis step is a bit messy, but the abstraction is
otherwise quite clean.

-Eli

By a new expr tree, do you mean that instead of creating a CallExpr, we create another node type, like IntrinsicExpr, that has slightly different rules for processing? During CodeGen, both of these nodes would be processed the same. If that is the case, I don't see much of a problem going this way.

When I first thought about synthesizing the prototype, my concern was that we would spend some extra time creating the prototype and do some redundant work, because after creating the prototype we would check it again when passing it through normal processing. However, we already create the intrinsic on the fly, and when we create a custom prototype based on the call site information, we could skip the regular checks and conversions since we would have already done them. At the abstraction level, it does seem like the cleanest way to go, since the arguments would match correctly.

  -- Mon Ping

By a new expr tree, do you mean that instead of creating a CallExpr, we
create another node type, like IntrinsicExpr, that has slightly different
rules for processing?

Yes, that's the idea.

During CodeGen, both of these nodes would be
processed the same. If that is the case, I don't see much of a problem going
this way.

When I first thought about synthesizing the prototype, my concern
was that we would spend some extra time creating the prototype and do some
redundant work, because after creating the prototype we would check it
again when passing it through normal processing. However, we already
create the intrinsic on the fly, and when we create a custom prototype
based on the call site information, we could skip the regular checks and
conversions since we would have already done them. At the abstraction
level, it does seem like the cleanest way to go, since the arguments would
match correctly.

Okay; I don't have a strong preference between the two approaches.

-Eli