Where's the optimiser gone? (part 0)

Hi @ll,

compiler-rt implements the Windows-specific stack probe routines in
compiler-rt/lib/builtins/i386/chkstk.S and
compiler-rt/lib/builtins/x86_64/chkstk.S.
See <Developer tools, technical documentation and coding examples | Microsoft Docs>

Their implementation is, however, LESS THAN optimal: they can
yield up to (stacksize / pagesize) superfluous page accesses
(and thus superfluous page faults)!

As implemented, EVERY call of chkstk() touches ALL pages from
the current "top" of stack to its new "top", which might
become the new stack "limit": on access of the "guard page"
Windows handles the stack growth.
Touching pages that were already touched before, i.e. those above
the current "limit" of the stack, is however NOT necessary!
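
To make the difference concrete, here is a minimal sketch in C that
only COUNTS probes; it is a simplified model, not a transcription of
the compiler-rt routines or of the optimised ones. All names
(MODEL_PAGE_SIZE, probes_all_pages, probes_below_limit) are
hypothetical, addresses are plain integers, the stack grows downwards,
a 4096-byte page is assumed, and `limit` stands in for the lowest
address the system has already committed for the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size, for illustration only */

/* Model of the "touch everything" scheme: one probe per page step from
   the old top down to the new top, whether or not a page was touched before. */
static size_t probes_all_pages(size_t size)
{
    size_t probes = 0;
    while (size > MODEL_PAGE_SIZE) {
        size -= MODEL_PAGE_SIZE;
        probes++;
    }
    return probes + 1;          /* final probe at the new top */
}

/* Model of a limit-aware scheme: probe only pages below *limit, then
   lower *limit so that later calls skip the pages touched here. */
static size_t probes_below_limit(uintptr_t top, size_t size, uintptr_t *limit)
{
    assert(size <= top);
    assert(*limit % MODEL_PAGE_SIZE == 0);

    uintptr_t new_top = top - size;
    /* Round the new top down to the start of its page. */
    uintptr_t new_limit = new_top & ~(uintptr_t)(MODEL_PAGE_SIZE - 1);
    size_t probes = 0;

    while (*limit > new_limit) {
        *limit -= MODEL_PAGE_SIZE;  /* touching this page extends the stack */
        probes++;
    }
    return probes;
}
```

With the 938272-byte frame from sample0.c, a stack top of 0x200000 and
an initial limit of 0x1FF000 (both made-up values), the first model
counts 230 probes on every call, while the second counts 229 probes on
the first call and none on the second.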

Properly optimised chkstk() implementations (for ML.EXE
and ML64.EXE respectively), which touch every page only
once, are shown below!

regards
Stefan Kanthak

See <https://godbolt.org/z/1jSn6->

--- sample0.c ---

void foo(int bar) {
    int array[234567];
    array[234566] = bar;
}

_foo: # @foo
    push ebp
    mov ebp, esp
    mov eax, 938272
    call __chkstk
    mov eax, dword ptr [ebp + 8]
    mov ecx, dword ptr [ebp + 8]
    mov dword ptr [ebp - 4], ecx
    mov dword ptr [ebp - 938272], eax # 4-byte Spill
    add esp, 938272
    pop ebp
    ret

int main(int argc) {
    foo (argc);
    foo (argc);
}

--- chkstk.asm (for I386) ---

I'm not sure this continuous stream of emails is the most productive form.
I would think these should all be filed either as bugs on https://bugs.llvm.org,
or as patches on http://reviews.llvm.org.
And in any case, maybe they should be worded slightly differently.

Roman.