Is there a way to tell wasm-ld not to create any space for the stack (that won't be used in a browser)?

Hi,

I’m building a .wasm binary (from C++) to be used within a browser.
For my tests, 1 page (64 KiB) of memory is shared between JS and the wasm module (created JS side with WebAssembly.Memory).
Using clang++ / wasm-ld v 18.0.0.

That works. However, I noticed that wasm-ld always reserves some space for a “stack”, placed either at the beginning of memory (--stack-first) or after the static data; its size is 64 KiB by default, or whatever -z stack-size specifies.

But in a browser (please correct me if I’m wrong) the actual stack (for local variables and function returns) is handled by the browser, so the stack space reserved by wasm-ld is not used.
Of course one can set this size to 16 bytes (the min) which is negligible.

But, to be cleaner, and to help my understanding of the whole thing, is there a way to prevent wasm-ld from injecting a stack?
In Writer::layoutMemory() there does not seem to be an option to do that (to bypass placeStack and the other stack-related lines). Would bypassing them be enough, or would doing this be risky (for a browser-only wasm usage)? Yes, I’m new to WebAssembly …

Thanks

For LLVM (or C-like languages more generally) when building to wasm, there are actually two stacks. One of them is, as you noted, managed by the runtime; it holds locals, return values, and return addresses (the addresses of these are all hidden from user code). But as you also saw, there is a second stack in linear memory (we call it the “linear stack” or shadow stack). It is necessary because some variables have their address taken by user code, and these addresses need to be linear memory indices. There is also alloca and various other ways that this stack is exposed to LLVM. The ABI that LLVM uses for this stack is documented here.

So most programs do need this stack to exist. It is theoretically possible that no function in a program needs the stack, in which case you could get away without it. I’m not sure whether that would cause lld not to allocate it at all, or whether it could be suppressed completely; it’s probably not a use case anyone has encountered yet. If you really do need that, we could probably figure something out.
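To make the idea of a stack living in linear memory more concrete, here is a toy model of it in plain C++. It is only a sketch: ToyMemory, frame_enter and frame_leave are hypothetical names invented for illustration, the 16-byte alignment is an assumption, and real code relies on compiler-generated prologues and epilogues rather than helpers like these.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Toy model only: `ToyMemory`, `frame_enter`, and `frame_leave` are
// hypothetical names, not part of LLVM or any wasm runtime.
struct ToyMemory {
    std::uint8_t bytes[1024] = {};  // stands in for linear memory
    std::uint32_t sp = 1024;        // linear stack pointer, grows downward
};

// Roughly what a function prologue does: move the stack pointer down to
// reserve `size` bytes, keeping it 16-byte aligned (assumed alignment).
std::uint32_t frame_enter(ToyMemory &m, std::uint32_t size) {
    assert(size <= m.sp && "toy linear stack overflow");
    m.sp = (m.sp - size) & ~15u;
    return m.sp;
}

// Roughly what the epilogue does: restore the caller's stack pointer.
void frame_leave(ToyMemory &m, std::uint32_t saved_sp) { m.sp = saved_sp; }

// Models `int x; int *p = &x; *p = 7; return x;` where &x has to be a
// linear-memory index rather than a hidden wasm local.
std::int32_t demo(ToyMemory &m) {
    const std::uint32_t saved = m.sp;
    const std::uint32_t x_addr = frame_enter(m, sizeof(std::int32_t));
    const std::int32_t value = 7;
    std::memcpy(&m.bytes[x_addr], &value, sizeof value);  // *p = 7
    std::int32_t out = 0;
    std::memcpy(&out, &m.bytes[x_addr], sizeof out);      // read x back
    frame_leave(m, saved);
    return out;
}
```

The point of the model is that the address handed to user code is an ordinary index into the same memory that holds static data and the heap, which is why the linker has to set aside a region for it.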

Thank you for a detailed answer.
Could you please clarify

some variables have their address taken by user code

Is it in this case?

void f() {
  int x;
  int *p = &x; // <== address "taken"?
}

since p is directly “visible” to the C++ code? (And if that’s the case, isn’t char s[20]; char c = s[12]; the same, since it’s actually c = *(s+12) and we take/use the address of s to do the calculation?)

it could be suppressed completely, it’s probably not a use case anyone has encountered yet. If you really do need that, we could probably figure something out

Practically, my (old) experience with assembly makes me very cautious when it comes to memory and the stack, especially when their handling is shared between two different systems (browser and wasm). Debugging becomes very hard when the stack sometimes overflows and clobbers data somewhere. On Linux and modern CPUs, accesses are well controlled at the lowest level, so it’s easier to detect such problems.

To better understand how things work at the wasm level, at first I thought about adding some options to lld only for my local version (like a -z stack-location= or --no-stack) since the source is readable and well organized.
But, let’s be reasonable… what are 64k nowadays anyways… The current (total) limit of 4GB is more than enough for my needs.
It seems that at the most, the linear stack shouldn’t need a lot of memory.

Yes, you have it right; x is address-taken because its address will be written to p and can then be observed by other parts of the program. (p itself is not address-taken). WRT the pointer-arithmetic, you are right that those are equivalent, but when you build with optimizations, I would expect that the address will never actually be materialized. (Having said that, I would actually still expect the s array to be allocated on the linear stack, because wasm locals are single values; I’m not sure whether there are any optimizations that would try to pack that array into a single SSA value in the IR, but in any case 20 bytes would not fit into a single wasm local).
Note also that when building without optimizations, every local C variable will be allocated on the linear stack because clang creates all of them as allocas (and they only get converted to SSA values later on during optimizations, if enabled). The other consequence of this is that optimized wasm builds often need less linear stack than an equivalent native program (since most variables aren’t address-taken) but debug/non-optimized builds still need a lot more.
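The cases discussed above can be put side by side in a small sketch. The function names are hypothetical, and the comments describe what I would expect from an optimized build rather than guaranteed behavior; inspecting the generated wasm is the only way to be sure for a given compiler version.

```cpp
#include <cassert>

// No address taken: with optimizations, `acc` and `i` can typically become
// wasm locals, so this function should need no linear-stack space.
int sum_to(int n) {
    int acc = 0;
    for (int i = 1; i <= n; ++i) acc += i;
    return acc;
}

// Hypothetical helper that writes through a pointer.
void store_answer(int *out) { *out = 42; }

// Address taken: `x` needs a linear-memory address for the call, so it is
// expected to live on the linear stack (unless the call is inlined and the
// pointer is optimized away).
int address_taken() {
    int x = 0;
    store_answer(&x);
    return x;
}

// Array indexed at run time: a 20-byte array does not fit in a single wasm
// local, so it is likely to stay on the linear stack even when optimized.
char read_from_array(unsigned index) {
    assert(index < 20);
    char s[20];
    for (unsigned i = 0; i < 20; ++i) s[i] = static_cast<char>('a' + i);
    return s[index];
}
```

In a build without optimizations, all of the local variables above would be expected to land on the linear stack, since clang initially creates each of them as an alloca.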

I agree that stack overflows can be a pain to debug. The --stack-first allocation scheme you mentioned actually exists to try to help mitigate this; when stack overflows do happen, it would typically cause the stack pointer to underflow/wrap and (if the current linear memory size is less than the full 4GB) generate a trap immediately rather than clobbering some unrelated memory region. Note that this only applies to the main thread though; stacks for pthreads are allocated on the heap. If you’re using emscripten (which, if you’re targeting browsers, I would probably recommend), you can adjust the stack size for all threads using the -s STACK_SIZE flag. And if you build with -s ASSERTIONS then the compiler will also insert stack overflow checks into your code for easier debugging.
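The difference between the two layouts on overflow comes down to 32-bit wraparound, which can be checked with a few lines of arithmetic. This is a sketch with made-up region sizes (a 1 KiB stack and 1 KiB of static data in a one-page memory); push_frame and would_trap are hypothetical helpers, not anything provided by wasm-ld or a runtime.

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 65536;  // one wasm page

// 32-bit stack pointer decrement; unsigned arithmetic wraps modulo 2^32.
constexpr std::uint32_t push_frame(std::uint32_t sp, std::uint32_t frame_bytes) {
    return sp - frame_bytes;
}

// True if accessing `size` bytes at `addr` falls outside a memory of
// `pages` pages, which is when the engine would trap.
constexpr bool would_trap(std::uint32_t addr, std::uint32_t size,
                          std::uint32_t pages) {
    return static_cast<std::uint64_t>(addr) + size > pages * kPageSize;
}

// Stack first: the stack occupies [0, 1024) and only 16 bytes remain.
constexpr std::uint32_t sp_stack_first = push_frame(16, 32);
static_assert(sp_stack_first == 0xFFFFFFF0u, "stack pointer wrapped around");
static_assert(would_trap(sp_stack_first, 4, 1), "the access traps immediately");

// Stack after data: data occupies [0, 1024), the stack [1024, 2048).
constexpr std::uint32_t sp_after_data = push_frame(1040, 32);
static_assert(sp_after_data == 1008u, "stack pointer now points into static data");
static_assert(!would_trap(sp_after_data, 4, 1), "no trap, data is silently clobbered");
```

With the stack placed first, the overflowing frame ends up at an address far beyond the end of memory and the first access to it faults; with the stack after the static data, the same overflow stays in bounds and overwrites whatever sits below the stack.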

Thanks again.
Actually I started with emscripten, then dug a bit more into the emcc parameters and finally landed on the lower level clang++ and wasm-ld.
It’s definitely nice to have access to the clib and memory allocation functions.
But in order to reach a smaller footprint, to control the whole thing (and mostly for the fun of it!) I decided to make my own malloc/tree implementations. (I’m “old school” 🙂)