RFC: Support for off-heap definitions in WebAssembly target

Hello list,

This is a request for comments regarding a proposal to support off-heap
definitions on the WebAssembly target. This will allow some global and
local variables to be given addresses that are not in main memory and
not in registers -- instead they are in named locations managed by the
WebAssembly run-time.

Concretely, this proposal is to recognize a new type qualifier to be
used for these definitions, to enforce ACLE-like restrictions on the use
of these types, to use a specific LangAS for these definitions, and to
allow a target to put an alloca in a different LangAS depending on the
type being allocated.

# Meta

For background, I have been working on the WebAssembly target for a year
or so, with a goal of adding support for the "reference types" target
feature, climbing up from MC to IR and now to clang. Cc'ing a couple of
fellow travellers, though the mistakes and boneheadedness in this
proposal are all mine.

An older LLVM RFC is here:
https://groups.google.com/g/llvm-dev/c/H2TXl7Q_3UE. Since then, the
needed pieces have landed in LLVM.

# Motivation

The WebAssembly target usually compiles global and local variable
definitions to memory allocations. Unless optimized away, a global
variable denotes an address in memory, in the same address space shared
by other static data, the heap, and the stack. Similarly, a local
variable starts life as an alloca, and unless lifted to an SSA value by
SROA (as is usually the case), a local variable definition will
eventually lower to an SP-relative address on the stack.

However, the WebAssembly target also supports named definitions that are
not part of main memory. A WebAssembly module can contain named global
variables, and each function can have named local variables. These
variables are typed and are accessed only by immediate index; they can't
be addressed by a pointer.

It would be nice if we could support these definitions in C. For our
target this has two main advantages:

1. It allows us to make definitions that can't be accessed accidentally
    at run-time via forging a pointer, as global.get / global.set /
    local.get / local.set refer to their operand only via an immediate
    index.

2. It would allow us to have global and local variables of types that
    can't be written in main memory. WebAssembly has two kinds of types:
    number types and reference types (see "Types" in the WebAssembly 2.0
    draft specification, Draft 2022-08-23).
    The former can be written to main memory. The latter cannot, as
    their representation is opaque. Reference types are used notably as
    a way to represent values managed by the "host" for the WebAssembly
    program, e.g. JavaScript objects in a web browser.
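
To make the first point concrete, here is a toy model (not how any real engine is built) of an instance with byte-addressable linear memory and a separate table of globals reachable only by index. `ToyInstance`, `store8`, `globalGet`, `globalSet` and `scribble` are names invented for this sketch. The index is a template argument to mirror the fact that global.get / global.set take an immediate, so no run-time address can ever name a global.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only; all names here are hypothetical.
struct ToyInstance {
    std::vector<std::uint8_t> memory;             // linear memory: byte-addressable
    std::array<std::int32_t, 2> globals{{0, 0}};  // off-heap slots: index-addressable only

    explicit ToyInstance(std::size_t bytes) : memory(bytes, 0) {}

    // Models a one-byte store through an arbitrary (possibly forged) address.
    // Returns false where a real engine would trap on an out-of-bounds access.
    bool store8(std::uint32_t addr, std::uint8_t value) {
        if (addr >= memory.size()) return false;
        memory[addr] = value;
        return true;
    }

    // Models global.get / global.set: the index is fixed at compile time,
    // like the immediate in the instruction encoding.
    template <std::size_t Index> std::int32_t globalGet() const {
        static_assert(Index < 2, "global index out of range");
        return globals[Index];
    }
    template <std::size_t Index> void globalSet(std::int32_t value) {
        static_assert(Index < 2, "global index out of range");
        globals[Index] = value;
    }
};

// Hypothetical "attacker": stores 0xFF through every address in [0, limit).
// Returns how many stores landed inside linear memory.
inline std::size_t scribble(ToyInstance &inst, std::uint32_t limit) {
    std::size_t landed = 0;
    for (std::uint32_t addr = 0; addr < limit; ++addr)
        if (inst.store8(addr, 0xFF)) ++landed;
    return landed;
}
```

In this model, setting global 0 and then calling `scribble` over any address range leaves the global untouched, because no address maps to the `globals` table.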

The long-range vision is to allow C/C++ global and local variable
definitions of reference-typed values. As these definitions can't be
addressed by pointers in memory, we'd apply restrictions similar to
those that the ARM C Language Extensions (ACLE) apply to SVE values in
the front-end. We are not yet ready to post an RFC for a full language
design, but for those interested, we do have an early draft design
document ("Reftypes in Clang", a Google Doc).

# Lowering mechanism

At the IR level, these off-heap definitions are allocated in an address
space that is well-known to the WebAssembly target. This address space
is non-integral, so that optimizations cannot treat pointers into it as
integers. Instruction selection lowers definitions in this address space
to the appropriate low-level code. Allocas in this address space are all
static, and are filtered out as part of the frame lowering.

Note that this is a generic mechanism for any type supported by
WebAssembly. For example, you can have an off-heap global of type i32,
with the property that no pointer can alias the value: the only way to
reach it is a direct symbolic reference to the definition.
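
As a rough sketch of the decision described above, the following toy function maps a definition to its final home based only on its address space and on whether it is a global or an alloca. Every name here (`ToyAddrSpace`, `ToyStorage`, `ToyLowering`, `lowerDefinition`, `readInstruction`) is a stand-in invented for illustration and does not correspond to an LLVM identifier. The real lowering happens in instruction selection and is considerably more involved.

```cpp
#include <string>

// Hypothetical stand-ins, not LLVM identifiers.
enum class ToyAddrSpace { Default, WasmVar };
enum class ToyStorage { Global, Local };
enum class ToyLowering { LinearMemoryData, StackSlot, WasmGlobal, WasmLocal };

// Decide where a definition ends up, based only on its address space and
// on whether it is a global variable or an alloca.
inline ToyLowering lowerDefinition(ToyStorage storage, ToyAddrSpace as) {
    if (as == ToyAddrSpace::WasmVar) {
        return storage == ToyStorage::Global ? ToyLowering::WasmGlobal
                                             : ToyLowering::WasmLocal;
    }
    return storage == ToyStorage::Global ? ToyLowering::LinearMemoryData
                                         : ToyLowering::StackSlot;
}

// The WebAssembly instruction family used to read the definition.
inline std::string readInstruction(ToyLowering lowering) {
    switch (lowering) {
    case ToyLowering::WasmGlobal:
        return "global.get";
    case ToyLowering::WasmLocal:
        return "local.get";
    default:
        // Memory-resident definitions are read with a typed load
        // (i32.load, i64.load, ...) from a computed address.
        return "load";
    }
}
```

The point of the sketch is that the address space alone carries the "off-heap" decision from the front-end down to the backend; the value type (i32, a reference type, and so on) plays no part in choosing between memory and a WebAssembly global or local.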

However, the main utility is for reference-typed values,
which can't be written to memory at all. There are some assumptions
currently in clang that any value can be written to memory, which we
would need to relax if we are to add support for this new kind of value.

# Proposal for clang (comments much welcome & appreciated!)

Well, it's cheeky to be here and propose a thing for The Frontend For
The Important Languages. However it is a useful target feature that
would be very valuable to expose, and I see two precedents:

- OpenCL and some GPU targets allow definitions to exist in different
   address spaces.

- SVE values. As instances of these types don't have a size that is
   known at compile time, the ACLE defines some semantic restrictions on
   their use, which clang applies. The restrictions that we need to
   apply are essentially the same, except that additionally we can't
   take the address of an instance of this new kind of definition.

Therefore, to support ongoing experimentation, I propose to implement
basic support for off-heap values in clang, specific to the WebAssembly
target, via the following steps:

1. Add support for QualType-specific definition address spaces (NFC).
    This would apply a change like this:

    - /// Get the AST address space for alloca.
    - virtual LangAS getASTAllocaAddressSpace() const { return LangAS::Default; }
    + /// Get the AST address space for alloca of type \p QT.
    + virtual LangAS getASTAllocaAddressSpace(const CodeGen::CodeGenModule &CGM,
    +                                         QualType QT) const {
    +   return LangAS::Default;
    + }

    For the WebAssembly target, in a next step, we might return a
    different AS. However we also need some target-specific logic
    regarding address space casts and bitcasts, to allow the WebAssembly
    target to leave these definitions in their alloca address space
    rather than always casting to LangAS::Default.

    Similar concerns apply to global definitions. At this stage I am
    only concerned with definitions with static or automatic storage
    duration.

2. Define a new LangAS for WebAssembly off-heap definitions. Extend
    the WebAssembly target with an attribute that can be applied to any
    type, possibly spelled __attribute__((wasm_var)). The WebAssembly
    target would then assign definitions of such a type to the new
    LangAS.

3. Extend ACLE SVE Sema restrictions to values in the wasm_var LangAS.
    Add a restriction to make address-of signal a compile-time error,
    for wasm_var values.

4. Add new builtin types for the primitive WebAssembly reference types
    __externref_t and __funcref_t. These are a kind of "void*" but for
    host-managed values. These builtin types would already carry the
    __attribute__((wasm_var)) qualifier, so that the ACLE-like
    restrictions would apply to their use.
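
The shape of steps 1 and 2 can be modelled outside clang with a small self-contained sketch. None of the classes below are clang's: `ToyLangAS`, `ToyQualType`, `ToyTargetCodeGenInfo`, `ToyWebAssemblyCodeGenInfo` and `keepsAllocaAddressSpace` are hypothetical stand-ins, and the `hasWasmVarAttr` flag merely models a type carrying the proposed attribute.

```cpp
// Hypothetical stand-ins; none of these are clang's real classes.
enum class ToyLangAS { Default, WasmVar };

struct ToyQualType {
    const char *name;
    bool hasWasmVarAttr;  // models a type carrying __attribute__((wasm_var))
};

struct ToyTargetCodeGenInfo {
    virtual ~ToyTargetCodeGenInfo() = default;
    // Generic behaviour: every alloca stays in the default address space,
    // whatever its type, so other targets see no functional change.
    virtual ToyLangAS getASTAllocaAddressSpace(const ToyQualType &) const {
        return ToyLangAS::Default;
    }
};

struct ToyWebAssemblyCodeGenInfo : ToyTargetCodeGenInfo {
    // The target-specific override picks the off-heap address space for
    // types that carry the (modelled) wasm_var attribute.
    ToyLangAS getASTAllocaAddressSpace(const ToyQualType &qt) const override {
        return qt.hasWasmVarAttr ? ToyLangAS::WasmVar : ToyLangAS::Default;
    }
};

// Models the extra target-specific logic from step 1: an off-heap alloca
// must be left in its own address space instead of being cast to Default.
inline bool keepsAllocaAddressSpace(const ToyTargetCodeGenInfo &target,
                                    const ToyQualType &qt) {
    return target.getASTAllocaAddressSpace(qt) == ToyLangAS::WasmVar;
}
```

With this shape, the base-class default returns `Default` for every type, which is what makes step 1 NFC; only the WebAssembly override ever looks at the type.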

At that point, we would have enough to proceed and define the various
builtins that deal in reference types, as well as the other uses.
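
For a feel of the user-facing side, here is a hedged sketch of source code under steps 2 and 3. The attribute spelling is only the one floated above, and `TOY_HAVE_WASM_VAR` is a made-up feature macro. On any existing compiler the macro expands to nothing, so the snippet builds as ordinary C++ and the variable is a plain global. `requestCount` and `recordRequest` are illustrative names.

```cpp
// HYPOTHETICAL: TOY_HAVE_WASM_VAR is an invented feature macro, and
// wasm_var is only the attribute spelling suggested in this RFC.
#if defined(TOY_HAVE_WASM_VAR)
#define WASM_VAR __attribute__((wasm_var))
#else
#define WASM_VAR  // expands to nothing: the variable is an ordinary global
#endif

// Under the proposal this would become a WebAssembly global rather than a
// location in linear memory.
static int WASM_VAR requestCount = 0;

// Direct symbolic reads and writes remain allowed.
int recordRequest() {
    requestCount = requestCount + 1;
    return requestCount;
}

// Under step 3 the following would be rejected at compile time for a
// wasm_var value, since there is no address to take:
//
//   int *p = &requestCount;
```

Step 4 would then make the reference types usable the same way, for example as a local of type `__externref_t`, with the same restrictions applied automatically.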

# Review

These changes are intended to be entirely NFC on targets other than
WebAssembly.

I propose to slightly generalize the code that lowers AST to IR as
regards address spaces for alloca and globals, to allow target-specific
LangAS choice.

I also propose to extend some predicates in Sema/ to apply ACLE
restrictions to wasm_var values, and also add another new restriction
that makes taking the address of a wasm_var value a compile-time error.

Is any of this a no-go? Is there a better way to do this? Feedback is
greatly appreciated. I have some draft code but thought that an
overview might help me identify problems earlier in the process.