Summary and general background
This RFC is intended to build consensus on how WebAssembly tables are represented in C/C++, aiming to expose the underlying platform capabilities (and to be explicit, the scope of this functionality is limited to the WebAssembly target). The proposal in short is to expose tables as arrays of references which will be lowered appropriately by the backend. This intends to expose the basic functionality in a straightforward manner, leaving space for later experimentation. See below for more background on WebAssembly tables and reference types, and options that have been considered.
WebAssembly (abbreviated as Wasm) is a stack-based virtual machine that has support shipping in all the major browser engines. LLVM has a WebAssembly backend as one of its default targets.
Reference types and tables were introduced into the Wasm spec in order to allow more efficient interoperation with the host environment (e.g. garbage-collected JavaScript).
Thank you to Thomas Lively, Paulo Matos, and Andy Wingo for comments on a draft of this RFC or related documents.
Background on WebAssembly reference types
Values in early versions of the WebAssembly specification only had numeric types: either i32, i64, f32, or f64. The reference type specification later added a new kind of value type, for opaque host-managed values.
In contrast to the other value types, reference types are opaque types for which it is impossible to observe their size or their bit pattern. Reference-typed values can't be stored in linear memory (i.e. used with WebAssembly load/store instructions). Instead these values can be stored in tables, which can be thought of as linear memory for reference types.
Currently there are two concrete reference types in WebAssembly: funcref (a reference to a function), and externref (a reference to an object owned by an embedder). LLVM already supports them both. At the IR level, these are modelled as pointers to specific non-default address spaces (10 for externref and 20 for funcref), with appropriate WebAssembly instructions being selected during lowering (MVT::externref/MVT::funcref types). A patch for supporting them in Clang is under review, exposing __externref_t and representing function pointers with the __funcref attribute as funcrefs. These are modelled as sizeless types (much like scalable vectors are), with the additional semantic restriction that you can't take the address of an externref or funcref.
Though there are only two reference types currently, in the future this set will become unbounded. For example the typed function references proposal will allow WebAssembly modules to declare funcref values with a specific type (parameters and results). Other proposals will allow WebAssembly modules to declare host-managed aggregate types (garbage-collected structs and arrays). Any proposed design for how to represent WebAssembly tables in Clang should take these future developments into account.
Background on WebAssembly tables
As it isn't possible to load/store reference types to WebAssembly memory (which corresponds to the default address space in LLVM), a new datatype was introduced that is able to hold them: the table (or arguably just exposed/generalised tables: WebAssembly modules did have the indirect function table, but there was no ability to directly manipulate it or define new tables).
A WebAssembly module has some number of tables, each referred to by an integer index (tableidx) and having a type (currently either externref or funcref, but in future a WebAssembly module will be able to declare a table containing any reference type in the module). Although the tableidx is a plain integer, the instructions to manipulate tables in Wasm take the tableidx as an immediate operand, meaning code generation isn't possible unless the specific tableidx is a known constant. This implies additional semantic restrictions on any representation of tableidx in a source language.
You might, for instance, use a table of externrefs in order to track objects managed by the host environment but referenced from WebAssembly.
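To make that use case concrete, here is a minimal sketch in plain C that models the pattern rather than using any Wasm feature. The names ref_t, slots, handle_alloc, handle_get and handle_free are hypothetical, and a void pointer stands in for an opaque externref. Real reference values could not be kept in a C array in linear memory like this, which is exactly why a table is needed; the point is only that C code holds a plain integer handle while the reference itself stays in the table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an opaque host reference (not a real externref). */
typedef const void *ref_t;

#define TABLE_CAPACITY 8

/* Models a table of references; a null entry marks a free slot. */
static ref_t slots[TABLE_CAPACITY];

/* Store a host reference in the first free slot and return its index,
   or -1 if every slot is in use. The index is an ordinary integer that
   can live in linear memory, unlike the reference itself. */
int handle_alloc(ref_t ref) {
    assert(ref != NULL); /* null is reserved to mean "free slot" */
    for (int i = 0; i < TABLE_CAPACITY; i++) {
        if (slots[i] == NULL) {
            slots[i] = ref;
            return i;
        }
    }
    return -1;
}

/* Look up the reference for a handle; NULL for an invalid or free handle. */
ref_t handle_get(int handle) {
    if (handle < 0 || handle >= TABLE_CAPACITY)
        return NULL;
    return slots[handle];
}

/* Release a slot so the table no longer keeps the host object reachable. */
void handle_free(int handle) {
    if (handle >= 0 && handle < TABLE_CAPACITY)
        slots[handle] = NULL;
}
```

With a real table the slot array would be the Wasm table itself, and the lookups and stores would become table.get and table.set.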
Support for tables is already present in LLVM IR, where they are represented as global arrays of addrspace 10/20 pointers (with the array itself in address space 1, the "wasm_var" address space used for Wasm locals/globals). Appropriate table declarations are created during lowering, and loads/stores to these tables at the IR level are converted to table.get and table.set instructions.
WebAssembly tables in Clang
My current patch under review allows the declaration of arrays of reftypes (e.g. __externref_t table[0];), which lowers to IR that is already recognised as a table type. rjmccall quite rightly suggested an RFC on this approach would be helpful, hence this document.
In summary:
- Tables are declared like __externref_t table[0]; (static, default storage class, or extern)
  - Initialising the table is not currently allowed as the WebAssembly table initialization mechanism (element segments) isn't currently supported in LLVM WebAssembly. This can be added later.
  - The current iteration of the patch conservatively disallows extern due to lack of test coverage, but this is expected to work.
  - It is likely we will later want to allow something along the lines of the import_name attribute to allow importing tables under a certain name.
- Array operations on these arrays will be lowered to table.set and table.get by the backend.
- Builtins are exposed for table.size, table.grow, table.fill, and table.copy.
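For readers less familiar with the underlying instructions, the following is a rough plain-C model of what they do, based on my reading of the Wasm spec. It is not the patch's API: the table_model struct, the function names and signatures, the max_size field, and the use of a void pointer as a stand-in reference are all assumptions made for illustration. Out-of-bounds accesses trap in Wasm; here they are modelled as returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef const void *ref_t; /* hypothetical stand-in for a reference value */

typedef struct {
    ref_t *elems;
    uint32_t size;     /* current number of elements */
    uint32_t max_size; /* hypothetical upper limit on growth */
} table_model;

/* table.size: current number of elements. */
uint32_t table_size(const table_model *t) { return t->size; }

/* table.grow: append delta entries initialised to init. Returns the
   previous size, or UINT32_MAX (i.e. -1) if the table cannot grow. */
uint32_t table_grow(table_model *t, ref_t init, uint32_t delta) {
    assert(t->size <= t->max_size);
    if (delta > t->max_size - t->size)
        return UINT32_MAX;
    uint32_t old = t->size;
    if (delta == 0)
        return old;
    ref_t *p = realloc(t->elems, ((size_t)old + delta) * sizeof *p);
    if (p == NULL)
        return UINT32_MAX;
    for (uint32_t k = 0; k < delta; k++)
        p[old + k] = init;
    t->elems = p;
    t->size = old + delta;
    return old;
}

/* table.get: read element i. */
bool table_get(const table_model *t, uint32_t i, ref_t *out) {
    if (i >= t->size)
        return false;
    *out = t->elems[i];
    return true;
}

/* table.set: write element i. */
bool table_set(table_model *t, uint32_t i, ref_t v) {
    if (i >= t->size)
        return false;
    t->elems[i] = v;
    return true;
}

/* table.fill: set n elements starting at index i to v. */
bool table_fill(table_model *t, uint32_t i, ref_t v, uint32_t n) {
    if (i > t->size || n > t->size - i)
        return false;
    for (uint32_t k = 0; k < n; k++)
        t->elems[i + k] = v;
    return true;
}

/* table.copy: copy n elements from src[s..] to dst[d..]; the two tables
   may be the same one and the ranges may overlap. */
bool table_copy(table_model *dst, uint32_t d,
                const table_model *src, uint32_t s, uint32_t n) {
    if (s > src->size || n > src->size - s)
        return false;
    if (d > dst->size || n > dst->size - d)
        return false;
    if (n > 0)
        memmove(dst->elems + d, src->elems + s, (size_t)n * sizeof *dst->elems);
    return true;
}
```

In the proposed Clang representation the table argument would have to name a table declaration directly, since the tableidx is an immediate operand and not a run-time value as the pointer parameter is in this model.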
Because the integer tableidx must be used as an immediate operand to the Wasm table instructions, a range of semantic restrictions on use of tables is necessary. Tables may only be used as the operand to a table builtin, or when indexed as an array (i.e. table[idx] or table[idx] = foo).
Implementing these semantic restrictions requires:
- Making an exception to the usual restrictions on sizeless types to allow the array representing a table to be declared since the reference type table elements are themselves sizeless types.
- Modifying a number of other parts of Sema to produce appropriate errors if tables are used in a context where they are not supported.
Potential positives of this representation:
- Provides a clear path forward for declaring tables of different types as the set of reference types increases.
- Aligns to the current IR representation, meaning codegen is straightforward.
- Syntax is familiar and ergonomic (though this is largely a matter of taste)
Possible negatives:
- There's a lot of overlap with semantic restrictions on sizeless types, but new diagnostics must be implemented for tables (even if isSizelessType were modified to return true for tables, the current code paths for sizeless type diagnostics may not be hit because of other tests like isArrayType returning true).
The main alternative approach I can think of would be to introduce a new sizeless type for each table type and to use builtins for table.get and table.set. This would be straightforward initially with __externref_t_table and __funcref_t_table, but once the typed function references spec is introduced it would require a way to generate a table type for a given reference type. A few ways this might be achieved:
- Possibly there's a way this could be done with a builtin to create the type (if anyone knows of examples of something similar, that would be helpful)
- Something along the lines of __attribute__(__table_type__(__externref_t)) __table_t
- Following along the lines of Altivec (which introduces the vector keyword) and adding a new keyword for tables (mentioning for completeness, not actually proposing this)
I haven't pushed much in these directions as they didn't seem to have any real advantages over the approach currently being used. I'd welcome feedback to the contrary of course!
Conclusion
In summary:
- References and tables in WebAssembly are supported at the IR level and in the WebAssembly backend (codegen and MC layer), but not currently exposed to C/C++ in Clang.
- This RFC details the approach currently being used to expose tables to C/C++.
- I'd really welcome feedback, suggestions, and questions. Thank you in advance.