Wasm, start function, and default globals

Apologies if there is a better forum for these questions. Please redirect me if so.

I’ve been using the clang/wasm-ld tools to experiment with some basic examples, and there are a couple of things I’m wrestling with.

  1. How to denote a function as the “start” function (https://webassembly.github.io/spec/core/binary/modules.html#start-section)

  2. How to avoid the defaulted __heap_base global.

I’ve dug around on the samples, mailing lists, and in the wasm-ld code for an attribute or flag to specify the start function, but don’t see anything. Is this just not implemented (or tracked) yet?

For 2), I always see the below global in the compiled output even with the most trivial code that makes no reference to it (e.g. compiling just a simple function that only references values on the implicit operand stack).

(global (;0;) (mut i32) (i32.const 66560))

Per https://dassur.ma/things/c-to-webassembly/, it looks like this is the __heap_base value implicitly provided for modeling the stack/heap usually present in most C/C++ environments, even though it is unused in my compiled output. Is this something that could/should be optimized away, but again just isn’t implemented yet? Is there any way to suppress it when building with the current clang/wasm-ld toolchain?

Thanks,

  • Bill

There is a good amount of expertise on this in the Emscripten community:

You can also try to ask in one of the WebAssembly repos:

I think Luke Imhoff has done some work accessing the start function, but I am not sure if he is on this list. I personally don’t know of a way to write a start function. There is a bit on it in tool conventions:

The __heap_base symbol is there because Wasm does not have the memory layout that C-family languages expect. At higher optimization levels, Clang is usually pretty good at removing its use.
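For context, here is a rough sketch of how code might consume __heap_base, along the lines of the dassur.ma article linked above. The allocator, its name `bump_alloc`, and the native fallback buffer are hypothetical and exist only for illustration; the sketch assumes `align` is a power of two and does no bounds checking against the end of linear memory.

```cpp
#include <cstddef>
#include <cstdint>

#ifdef __wasm__
// Symbol provided by wasm-ld; only its address is meaningful.
extern "C" unsigned char __heap_base;
static unsigned char* heap_start() { return &__heap_base; }
#else
// Native stand-in (assumption) so the sketch can be built off-target.
alignas(16) static unsigned char fake_heap[4096];
static unsigned char* heap_start() { return fake_heap; }
#endif

// Bytes handed out so far, measured from the heap base.
// Constant-initialized, so it needs no constructor call at startup.
static std::uintptr_t bump_offset = 0;

// Hypothetical bump allocator: grows upward from the heap base, never frees.
void* bump_alloc(std::size_t size, std::size_t align = 8) {
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(heap_start());
    const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
    // Round the next free address up to the requested alignment.
    const std::uintptr_t addr = (base + bump_offset + mask) & ~mask;
    bump_offset = addr + size - base;
    return reinterpret_cast<void*>(addr);
}
```

If nothing in the program takes the address of __heap_base, as in a trivial module, there is nothing for the compiler to keep a reference to.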

Best,

Petr

The start function is not meant for general use, but rather for toolchains to initialize the state of the runtime where they can guarantee that no imports will be required.

For getting rid of unused globals and generally optimizing your wasm modules, I highly recommend using Binaryen’s wasm-opt tool. https://github.com/webassembly/binaryen.

[sending this as a new email, as I didn’t get responses to my prior posting via email – I just saw them on the archive]

Thomas Lively wrote via llvm-dev on Wed Oct 16 17:33:54 PDT 2019

The start function is not meant for general use, but rather for toolchains to initialize the state of the runtime where they can guarantee that no imports will be required.

But I am trying to write a simple toolchain, hence trying to use it to initialize my globals (or at least the one global I use for my memory allocator).

Interestingly, if I have a global of a type with a simple constructor (e.g. just setting members to integral values), I can see it doesn’t call the constructor, but just inlines the resulting memory values. For example, if setting the first member to 0xCCCC I see “(data (;0;) (i32.const 1024) "\cc\cc\00\00\etc…” in the WAT. However if my global is a type with a non-trivial constructor, the “data” entry for it is just zeroed out, and there is no startup function set to run any initialization. (Which seems like a bug – or at least a compiler warning should indicate globals will not be initialized as expected).
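To make the two cases concrete, here is a minimal sketch of the kind of code I mean. The type and variable names are hypothetical, and the volatile read is only there to keep the optimizer from folding the second constructor into a constant; exactly which initializers get folded will likely vary with optimization level.

```cpp
#include <cstdint>

// Hypothetical type whose constructor can be evaluated at compile time.
struct Plain {
    std::uint32_t tag;
    std::uint32_t count;
    constexpr Plain() : tag(0xCCCC), count(0) {}
};

// A volatile read cannot be constant-folded, so anything derived from it
// has to be computed at run time.
static volatile std::uint32_t g_seed = 0xCCCC;
std::uint32_t compute_tag() { return g_seed + 1; }

// Hypothetical type whose constructor needs run-time work.
struct Dynamic {
    std::uint32_t tag;
    Dynamic() : tag(compute_tag()) {}
};

// Constant-initialized: the bytes cc cc 00 00 ... can go straight into
// the data segment.
Plain g_plain;

// Dynamically initialized: zero in the binary image until the generated
// constructor function is actually called.
Dynamic g_dynamic;
```

On a native target both globals are set up before main; in my wasm output only the first one has its values present.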

If this is something that SHOULD be added, I’m happy to take a crack at it if someone more experienced in this area can provide a little guidance when needed.

For getting rid of unused globals and generally optimizing your wasm modules, I highly recommend using Binaryen’s wasm-opt tool. https://github.com/webassembly/binaryen.

Yeah, I’m just trying to minimize the toolchain right now. Does Binaryen preserve the “names” section? Clang/LLVM is populating that quite nicely, and some of the stuff I’m experimenting with (e.g. profiling) depends on its presence.

Is the Binaryen or Emscripten or Wabt repo the best place to ask even Clang/LLVM questions when it comes to Wasm? Seems most of the contributors are active on those projects. (As the GitHub UX is better than this mailing list).

Thanks,

  • Bill

Is the Binaryen or Emscripten or Wabt repo the best place to ask even Clang/LLVM questions when it comes to Wasm?

The Emscripten repo is a nice catch-all for asking cross-cutting questions, including wasm-specific Clang/LLVM questions. This list is of course fine for LLVM questions as well.

However if my global is a type with a non-trivial constructor, the “data” entry for it is just zeroed out, and there is no startup function set to run any initialization.

It sounds like you’re looking for __wasm_call_ctors. I was going to link to its documentation, but it turns out there is none. I’ll fix that here: https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md. Anyway, just add that function to the list of functions lld should export and make sure to call it in your runtime. You’re not seeing your constructor functions now because they’re being garbage collected by lld.
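As a rough sketch of one way to wire this up: the most direct route is to pass --export=__wasm_call_ctors to wasm-ld and have the embedder call it once after instantiation. A variation, shown here, is to call it from an exported init function of your own. The name `runtime_init` and the native stub are hypothetical; the sketch assumes the wrapper itself is exported (e.g. with --export=runtime_init) and that the embedder calls it before anything else in the module.

```cpp
#ifdef __wasm__
// Synthesized by wasm-ld; runs the static constructors of the linked objects.
extern "C" void __wasm_call_ctors();
#else
// Native stand-in (assumption): constructors already ran before main,
// so the stub does nothing.
extern "C" void __wasm_call_ctors() {}
#endif

// Constant-initialized flag, so it is valid before any constructor runs.
static bool g_initialized = false;

// Hypothetical one-time init entry point for a custom runtime.
// Returns true if this call ran the constructors, false if it was a repeat.
extern "C" bool runtime_init() {
    if (g_initialized) {
        return false;
    }
    g_initialized = true;
    __wasm_call_ctors();
    return true;
}
```

Because the wrapper references __wasm_call_ctors, the constructor functions stay reachable and should no longer be garbage collected.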

Does Binaryen preserve the “names” section? Clang/LLVM is populating that quite nicely, and some of the stuff I’m experimenting with (e.g. profiling) depends on its presence.

Yes, Binaryen can preserve the names section if you pass it -g.