[RFC][landed] Implement `getauxval`

What is getauxval?

Detailed documentation is available in the getauxval(3) Linux manual page. It is a routine that retrieves basic information about the current platform from the kernel-provided auxiliary vector (auxv), such as the page size, cache sizes, and hardware capabilities.

Why is it important, anyway?

As stated in the previous section, the function is critical for getting essential low-level information from the system. For example, if you want to test allocations based on the page size, you would probably call sysconf with _SC_PAGESIZE. That call, however, is itself based on getauxval.
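As a small illustration of that relationship, the sketch below queries the page size both ways and uses it to round an allocation size. It assumes a Linux target whose libc ships <sys/auxv.h> (glibc and musl do). The helper names are made up for this example and are not part of LLVM's libc.

```cpp
#include <elf.h>       // AT_PAGESZ
#include <sys/auxv.h>  // getauxval
#include <unistd.h>    // sysconf, _SC_PAGESIZE

// Page size as reported by the kernel through the auxiliary vector.
unsigned long page_size_from_auxv() { return getauxval(AT_PAGESZ); }

// Page size through the POSIX interface.
long page_size_from_sysconf() { return sysconf(_SC_PAGESIZE); }

// Round a byte count up to a whole number of pages, as an
// allocation test that depends on the page size might do.
unsigned long round_up_to_page(unsigned long bytes) {
  unsigned long page = page_size_from_auxv();
  return (bytes + page - 1) / page * page;
}
```

On a typical Linux system the two queries are expected to return the same value, which is why a libc without getauxval has trouble offering a trustworthy sysconf(_SC_PAGESIZE).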

As for other applications, getauxval is important when users want to dispatch their functions based on hardware information.
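A minimal sketch of such dispatching is shown below. The two sum routines and the required_bits parameter are hypothetical stand-ins: real code would test an architecture-specific HWCAP_* bit from the platform headers, and the meaning of each bit in AT_HWCAP differs between architectures.

```cpp
#include <cstddef>
#include <elf.h>       // AT_HWCAP
#include <sys/auxv.h>  // getauxval

using SumFn = long (*)(const int *, std::size_t);

// Portable fallback.
long sum_baseline(const int *data, std::size_t n) {
  long total = 0;
  for (std::size_t i = 0; i < n; ++i) total += data[i];
  return total;
}

// Stand-in for a variant that would rely on an optional CPU feature.
long sum_unrolled(const int *data, std::size_t n) {
  long a = 0, b = 0, c = 0, d = 0;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    a += data[i];
    b += data[i + 1];
    c += data[i + 2];
    d += data[i + 3];
  }
  for (; i < n; ++i) a += data[i];
  return a + b + c + d;
}

// Pick the fast variant only if every required capability bit is set.
// HYPOTHETICAL: required_bits is whatever mask the fast path needs.
SumFn select_sum(unsigned long hwcap, unsigned long required_bits) {
  return (hwcap & required_bits) == required_bits ? sum_unrolled
                                                  : sum_baseline;
}

// Same selection, driven by what the kernel reports for this CPU.
SumFn select_sum_for_this_cpu(unsigned long required_bits) {
  return select_sum(getauxval(AT_HWCAP), required_bits);
}
```

Keeping the selection logic separate from the getauxval call makes it testable with synthetic capability masks.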

How is getauxval implemented, normally?

The Linux ABI on most platforms (including all currently supported platforms) specifies that auxv is placed on the initial process stack after argv and environ (each terminated by a NULL value). Therefore, in other libcs, the startup routine scans the initial stack and stores a pointer to auxv in a global variable. Subsequent calls to getauxval then look up values from auxv.
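The layout described above can be observed from an ordinary program. The sketch below is not how a startup routine is written (it runs long after startup), and it assumes that environ still points at the original environment block on the initial stack, which stops being true once setenv or putenv has reallocated it. The function names are invented for the example.

```cpp
#include <elf.h>       // AT_NULL
#include <link.h>      // ElfW
#include <sys/auxv.h>  // getauxval, used only to cross-check the result

extern "C" char **environ;

// Walk past the NULL that terminates environ; auxv starts right after it.
// ASSUMPTION: environ has not been modified since process start.
const ElfW(auxv_t) *locate_auxv_after_environ() {
  char **p = environ;
  while (*p != nullptr) ++p;
  return reinterpret_cast<const ElfW(auxv_t) *>(p + 1);
}

// Linear search through the vector, which ends with an AT_NULL entry.
bool lookup_in_auxv(const ElfW(auxv_t) *auxv, unsigned long id,
                    unsigned long *value) {
  for (; auxv->a_type != AT_NULL; ++auxv) {
    if (auxv->a_type == id) {
      *value = auxv->a_un.a_val;
      return true;
    }
  }
  return false;
}

// Compare the hand-rolled lookup with the libc's own answer.
bool stack_scan_agrees_with_libc(unsigned long id) {
  unsigned long value = 0;
  return lookup_in_auxv(locate_auxv_after_environ(), id, &value) &&
         value == getauxval(id);
}
```

A startup routine does the same walk, but starts from the stack pointer it receives from the kernel instead of from environ.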

A side note: When should we populate auxv?

We should consider the situation where getauxval is called early in the startup stage. For example, getauxval can be used in ifunc resolvers. Therefore, in the startup routine, auxv should be populated right after TLS initialization and before any other initialization functions are called.

What makes it complicated in LLVM’s libc?

LLVM’s libc supports both overlay mode and fullbuild mode. That is, users can use symbols from LLVM’s libc even when the process was not started by our own startup routines, in which case nothing has recorded auxv for us.

So, what are other ways of getting auxv if we are in overlay mode?

There are several ways, each with its own limitations:

  • Linux 6.4 adds PR_GET_AUXV support to SYS_prctl (https://lwn.net/Articles/928305/). This is only supported on fairly recent kernels, and it requires allocating space for the whole auxv before the syscall.
  • It is also possible to read entries from /proc/self/auxv. However, procfs may not be mounted in some environments.
  • It is also possible to unwind the stack back to the initial process stack. However, this is very error-prone and should not be used in practice.
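The first two options can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: the buffer capacity of 64 entries is an arbitrary choice, the fallback definition of PR_GET_AUXV uses the value from the Linux 6.4 uapi headers for toolchains whose headers predate it, and all function and type names are invented here. Both loaders report failure instead of aborting, since either source may be unavailable.

```cpp
#include <cstddef>
#include <elf.h>        // AT_NULL
#include <fcntl.h>      // open
#include <link.h>       // ElfW
#include <sys/auxv.h>   // getauxval, used only to cross-check
#include <sys/prctl.h>  // prctl
#include <unistd.h>     // read, close

#ifndef PR_GET_AUXV
#define PR_GET_AUXV 0x41555856  // from the Linux 6.4 uapi headers
#endif

constexpr std::size_t kMaxEntries = 64;  // ASSUMPTION: fixed capacity

struct AuxvBuffer {
  ElfW(auxv_t) entries[kMaxEntries];
  std::size_t count = 0;
};

// Option 1: ask the kernel directly. Fails on kernels older than 6.4.
bool load_with_prctl(AuxvBuffer &buf) {
  long n = prctl(PR_GET_AUXV, buf.entries, sizeof(buf.entries), 0UL, 0UL);
  // The return value is the full auxv size; reject it if it did not fit.
  if (n <= 0 || static_cast<std::size_t>(n) > sizeof(buf.entries))
    return false;
  buf.count = static_cast<std::size_t>(n) / sizeof(ElfW(auxv_t));
  return true;
}

// Option 2: read the procfs file. Fails if procfs is not mounted.
bool load_from_procfs(AuxvBuffer &buf) {
  int fd = open("/proc/self/auxv", O_RDONLY | O_CLOEXEC);
  if (fd < 0) return false;
  char *dst = reinterpret_cast<char *>(buf.entries);
  std::size_t used = 0;
  bool ok = true;
  while (used < sizeof(buf.entries)) {
    ssize_t got = read(fd, dst + used, sizeof(buf.entries) - used);
    if (got < 0) ok = false;  // no EINTR retry in this sketch
    if (got <= 0) break;
    used += static_cast<std::size_t>(got);
  }
  close(fd);
  buf.count = used / sizeof(ElfW(auxv_t));
  return ok && buf.count > 0;
}

// Search a loaded copy; stops at AT_NULL or at the end of the buffer.
bool find_entry(const AuxvBuffer &buf, unsigned long id,
                unsigned long *value) {
  for (std::size_t i = 0;
       i < buf.count && buf.entries[i].a_type != AT_NULL; ++i) {
    if (buf.entries[i].a_type == id) {
      *value = buf.entries[i].a_un.a_val;
      return true;
    }
  }
  return false;
}

// Cross-check one id against the host libc's getauxval.
bool agrees_with_libc(const AuxvBuffer &buf, unsigned long id) {
  unsigned long value = 0;
  return find_entry(buf, id, &value) && value == getauxval(id);
}
```

Trying prctl first and procfs second mirrors the order in the list: the syscall needs no filesystem, while the procfs read works on older kernels.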

What is the design, then?

struct AuxEntry {
  long id, value;
};
long getauxval(long id) {
    // Fullbuild mode: the startup routine has already recorded auxv.
    if (app.auxv_ptr is not null) {
        return search(id, app.auxv_ptr);
    }
    // Overlay mode: populate a process-wide cache at most once.
    static AuxEntry *auxv;
    static OnceKey once_key;
    once(&once_key, [] {
        auxv = mmap(..); // mmap with a fixed size; treat failure as null
        if (auxv) {
            first try syscall(SYS_prctl, PR_GET_AUXV, ...);
            then try reading the whole /proc/self/auxv into auxv;
            if both fail, unmap the memory and reset auxv to null;
            otherwise register a cleanup hook with atexit.
        }
    });
    if (auxv) {
        return search(id, auxv);
    } else {
        // No cache available: fall back to a slow path.
        read /proc/self/auxv entry by entry to find id;
        if found, return its value;
    }
    return AUX_NULL; // not found
}

What else needs to be done?

The above function needs to check app.auxv_ptr even in overlay mode. Thus, app must always be defined, including in overlay mode. Another option is to declare it as a weak symbol.

This design looks overall fine, though I’d say you don’t need to worry about overlay mode for this function. Some functions are fullbuild already since they don’t work well as overlay functions (e.g. pthreads functions) and this seems to also fall in that category.

Based on the pull request "[libc] implement sys/getauxval" by SchrodingerZhu (#78493 in llvm/llvm-project), the code became quite complicated in order to support both modes, and atexit will be another problem. I suggest that we simply remove overlay mode support.

As an update, the proposal to implement both fullbuild and overlay support was accepted into libc. The main reason is that this allows unit tests that depend on PAGESIZE or other parameters to run seamlessly.

As an additional complication, this PR requires atexit, which may not be provided by LLVM’s libc. We switched to a weak reference to __cxa_atexit instead. The global cache is only populated if __cxa_atexit != nullptr. __cxa_atexit was chosen instead of atexit because:

  1. For glibc, atexit is in libc_nonshared.a rather than libc.so.
  2. LLVM’s own atexit will only be created after the objcopy pass.
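The weak-symbol check described above could look roughly like the sketch below. It assumes GCC or Clang on an ELF target, where a weak undefined function resolves to a null address. The cache here is a plain malloc-ed buffer standing in for the mmap-ed auxv copy, and g_cache, release_cache, and populate_cache are names invented for this example.

```cpp
#include <cstddef>
#include <cstdlib>  // malloc, free

// Weak reference: the address is null if nothing in the link defines it.
extern "C" __attribute__((weak)) int __cxa_atexit(void (*func)(void *),
                                                  void *arg,
                                                  void *dso_handle) noexcept;

// Stand-in for the global auxv cache (HYPOTHETICAL).
void *g_cache = nullptr;

// Cleanup hook run at process exit.
void release_cache(void *) {
  std::free(g_cache);
  g_cache = nullptr;
}

// Populate the cache only if a cleanup hook can be registered, so the
// memory is never left without an owner at exit.
bool populate_cache(std::size_t bytes) {
  if (&__cxa_atexit == nullptr) return false;  // no registration facility
  void *mem = std::malloc(bytes);
  if (mem == nullptr) return false;
  if (__cxa_atexit(release_cache, nullptr, nullptr) != 0) {
    std::free(mem);
    return false;
  }
  g_cache = mem;
  return true;
}
```

When populate_cache returns false, the caller would fall back to the uncached path, which matches the rule that the global cache is only used when __cxa_atexit is available.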

Since we already have __cxa_atexit and atexit implementations, maybe we can refactor their implementation into a common internal target, and let all three of __cxa_atexit, atexit, and getauxval depend on it?

I think it is a good idea to expose an internal function for registering cleanup routines.