Brief list of things discussed in the embedded workshop on Oct 10th.
LTO and linker scripts
Libraries (libc, libc++)
- Stack usage – what recourse do developers have when the stack frame size is larger than a certain size?
- Linker script semantics
- What else can libc do to support embedded use cases?
- Gaps in tooling and diagnostics
- Reverse engineering a call graph helps reconstruct maximum stack usage; the challenge is indirect branches.
- To get a static stack usage estimate, emitting relocs can help:
- The sanitizer team implemented call graph sections
- Callees of each function can be embedded in the call graph section
- With the stack size section and the call graph section, a post-processing tool can extract this data and evaluate stack size; the tool can have levers to make the analysis more or less conservative.
- Filling the stack with a canary value is an alternative way to measure stack usage at runtime.
- Clang – “-Wframe-larger-than” seems to be the only tool available to investigate stack usage
- If you conditionally call alloca, there’s no warning for it
- GCC has a hierarchy of “-Wframe-larger-than” with additional checks that can be added
- No recourse for developers here to understand what’s using the stack. GDB/LLDB can do it but there are no tools to visualize.
- Compiler bugs exacerbate this problem.
- Opt-remarks has visualization through HTML output pointing to source code
- There is no way to attach metadata; it would be nice to have
- Stack slot coloring:
- Excessive stack usage due to individual stack slots not being shared.
- Particularly bad for temporaries
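The post-processing idea from the stack usage discussion (combine per-function frame sizes with call graph edges, with a lever for how conservative to be about indirect calls) can be sketched roughly as follows. This is a minimal illustration, not the tool discussed at the workshop: the function names, frame sizes, and the `conservative` switch are all hypothetical, and the data is hand-written rather than parsed from real stack size or call graph sections.

```python
from typing import Dict, List, Optional, Set

# Hypothetical per-function data, hand-written for illustration. A real tool
# would read frame sizes and call edges from sections in the object files.
FRAME_BYTES: Dict[str, int] = {
    "main": 32, "parse": 96, "dispatch": 16,
    "read_byte": 8, "handler_a": 64, "handler_b": 200,
}
DIRECT_CALLEES: Dict[str, List[str]] = {
    "main": ["parse", "dispatch"],
    "parse": ["read_byte"],
    "dispatch": [],
    "read_byte": [],
    "handler_a": ["read_byte"],
    "handler_b": [],
}
HAS_INDIRECT_CALL: Set[str] = {"dispatch"}          # contains an indirect branch
ADDRESS_TAKEN: Set[str] = {"handler_a", "handler_b"}  # possible indirect targets


def worst_case_stack(func: str, conservative: bool,
                     on_path: Optional[Set[str]] = None) -> Optional[int]:
    """Return the deepest stack use in bytes starting at func, or None if a
    cycle (recursion) means no static bound exists."""
    on_path = set() if on_path is None else on_path
    if func in on_path:
        return None
    callees = list(DIRECT_CALLEES[func])
    # The "lever": in conservative mode an indirect call may reach any
    # address-taken function; otherwise indirect calls are ignored.
    if conservative and func in HAS_INDIRECT_CALL:
        callees += sorted(ADDRESS_TAKEN)
    on_path.add(func)
    deepest = 0
    bounded = True
    for callee in callees:
        depth = worst_case_stack(callee, conservative, on_path)
        if depth is None:
            bounded = False
            break
        deepest = max(deepest, depth)
    on_path.discard(func)
    return FRAME_BYTES[func] + deepest if bounded else None


print(worst_case_stack("main", conservative=False))  # 136
print(worst_case_stack("main", conservative=True))   # 248
```

The gap between the two results shows why indirect branches are the hard part: ignoring them under-reports, while assuming every address-taken function is reachable can over-report considerably.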
- Prototype implementation for global parts of ASAN available from ARM. This will be demoed in one of the future embedded WG meetings.
- ASAN takes up a lot of virtual memory. Any way to get that under control?
- HWASAN, but it needs hardware support
- There are several hidden flags that can be used to tweak resource usage
- ASAN - Turn off “use-after-scope” testing to get some resources back
- Hidden within source files. Developers are generally not encouraged to use them.
- Should we expose what tuning knobs devs can use?
- Tradeoff: “Trap” with less binary code vs. pretty diagnostics
- ASAN – can the granularity be changed from 8:1 to 64:1 to reduce memory use?
- Could ASAN have a trapping mode?
- Kernel does this for UBSAN production mode
- Are there any “use it at your own risk” flags?
- Discourse may be a good place to ask this
- It may be nice to have documentation for these options
- There are a bunch of runtime options as well.
- E.g. Shrink the quarantine
- Not well documented.
- Lots of defaults assume large virtual memory. For baremetal it's much harder.
- Userspace ASAN runtime:
- Things in compiler-rt are not going to work for “any” baremetal system
- How often do embedded targets link compiler-rt?
- Example from profiling
- You can write the interface from a single C file.
- ARM embedded toolchain ships with builtins
- Recently got stable ABI defined for ASAN – especially to write a custom runtime
- Fundamentally requires more memory.
- Maybe cover partial parts of the memory?
- For use-after-free, need a quarantine which has larger cost
- Function attribute “no_sanitize” annotation can be used to exclude symbols from instrumentation
- Sizes of stack frames could be a powerful workflow
- Trap could be really useful on its own without diagnostics
- There are no interceptors in embedded, and no malloc either
- ASAN trap mode
- Need allocators built for ASAN
- Could outline checks
- Is there a documentation page for tradeoffs?
- Set of recommendations for more constrained environments
- General optimizations for size would be nice
- Memory and code size are the blockers. Virtual space in some cases as well.
- ASAN on simulator for ARM32 is what ARM does
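To put rough numbers on the 8:1 versus 64:1 granularity question raised above, the arithmetic can be sketched as below. The shadow address mapping follows the usual ASAN scheme, shadow = (addr >> scale) + offset. The RAM size, the example address, and the shadow offset are hypothetical values chosen for illustration, and the sketch says nothing about whether a 64:1 mode is actually supported or what it would cost in detection precision.

```python
def shadow_bytes(app_bytes: int, scale: int) -> int:
    """Shadow memory needed to cover app_bytes when each shadow byte
    describes 2**scale application bytes (rounded up)."""
    granule = 1 << scale
    return (app_bytes + granule - 1) // granule


def shadow_address(addr: int, scale: int, offset: int) -> int:
    """ASAN-style mapping from an application address to its shadow byte."""
    return (addr >> scale) + offset


RAM_BYTES = 256 * 1024       # hypothetical microcontroller with 256 KiB of RAM
SHADOW_OFFSET = 0x2004_0000  # hypothetical location of the shadow region

for scale in (3, 6):         # 2**3 = 8:1 (the default), 2**6 = 64:1
    print(f"{1 << scale}:1 -> {shadow_bytes(RAM_BYTES, scale)} shadow bytes")

# Example mapping of one hypothetical application address at 8:1.
print(hex(shadow_address(0x2000_1000, 3, SHADOW_OFFSET)))
```

Under these assumptions the shadow region shrinks from 32 KiB to 4 KiB. The cost is that every shadow byte has to describe a 64-byte granule, so allocation alignment and redzone sizes would have to grow to match.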
Do we actually know what the semantics of the linker scripts are?
This came up in the context of LTO.
For selectors “abc.o(text)” – what should happen here?
Is it a bug if LLD doesn’t implement GNU LD specific behavior?
LTO is going to imply function sections – right now we don’t retain the info in LTO compilation.
How do we prevent cross module inlining?
- Common customer request: the existing code is working – how can “-flto” magically work without any other changes?
- Placing things in different sections from different TUs – in non-LTO cases – could also bring out the object selector problem.
Can we make LLD more discoverable?
- We need some sort of granularity.
- In Kernel, we place things that were run once in a “red” mapped section. Handle them differently from the unmapped section for example.
- Every output section is its own indivisible unit – this is pessimistic/conservative probably
- Maybe make this configurable – it may require changing semantics and could break compatibility with GNU LD
- Embedded systems have higher requirements from linker script memory regions than kernel use cases
- Tried to LTO libc into the clang binary, it doesn’t inline anything from libc – callee is freestanding and caller is not – this will prevent inlining
- 32 bit port of uboot to link with LLD with overlays
- NOCROSSREFS – missing. Required for any real use cases
- Orphan placement in linker script is another real problem