[RFC] Systematic way to introduce and use libc config options

Introduction

The libc project supports a number of config options which are primarily present to give the users knobs to control:

  1. The inclusion of a certain functionality which is otherwise not included by default.

  2. Or, exclusion of a certain other functionality. The user could have various reasons to do so, for example, to reduce code size.

  3. To set limits, sizes, bounds etc. For example, the default thread stack size.

Most config options today are only available as pre-processor switches which enable or disable parts of the libc source code. For example, there exists a macro named LIBC_COPT_PRINTF_DISABLE_FLOAT to disable the code handling printing of floating point numbers in printf and friends.

As the libc project grows, so does the number of config options. However, a systematic approach to introducing config options and making them available to users is missing. For example, many of the options introduced do not have a corresponding CMake variable to control the option value at libc config time. Another consequence of the lack of a systematic approach is that platform maintainers do not have a clean way to specify and codify the config parameters that best suit their platform. Often, they have to resort to adding messy conditionals in a number of CMakeLists.txt files.

A previous RFC by @gchatelet introduced a systematic naming system for the pre-processor macros which control libc config, which has now been documented and rolled out. In this RFC, we build on some of the fundamental ideas proposed in that RFC with a proposal for a systematic procedure to introduce and use libc config options. The core idea of this proposal is the introduction of a uniform config listing system and a common pattern for config application.

The config listing system

The config listing is driven by a set of hierarchical JSON files. At the top of the hierarchy is a JSON file named config.json in the config directory. This JSON file lists the libc options which affect all platforms. Each option is listed along with its value. For example:

{
    "printf": {
        "LIBC_CONF_PRINTF_DISABLE_FLOAT": false
        ...
    }
}

The above config indicates that the option LIBC_CONF_PRINTF_DISABLE_FLOAT has a value of false. A platform, say the baremetal platform, can choose to override this value in its config.json file in the config/baremetal directory with the following contents:

{
    "printf": {
        "LIBC_CONF_PRINTF_DISABLE_FLOAT": true
        ...
    }
}

As you can see, the config for the baremetal platform overrides the common value of false for the LIBC_CONF_PRINTF_DISABLE_FLOAT option with the value true.

Format

Named tags

As can be noted in the above examples, each config.json file contains a top-level dictionary. The keys of this dictionary are the names of grouping-tags. A grouping-tag is simply a named tag that refers to a group of libc options. In the above example, a tag named printf was used to refer to all libc options which affect the behavior of printf and friends.

Tag values

The value corresponding to each grouping tag is also a dictionary called the option-dictionary. The keys of the option-dictionary are the names of the libc options belonging to that grouping tag. For the printf tag in the above example, the option-dictionary is:

{
    "LIBC_CONF_PRINTF_DISABLE_FLOAT": true
    ...
}

The value corresponding to an option key in an option-dictionary is the value to be used for that option when configuring the libc build. Options which are of ON/OFF kind take boolean values true/false. Other types of options can take an integral or string value as suitable for that option. In the above option-dictionary, the option-key LIBC_CONF_PRINTF_DISABLE_FLOAT is of boolean type with value true.

Option name format

The option names, or the keys in an option-dictionary, have the following format:


LIBC_CONF_<UPPER_CASE_TAG_NAME>_<ACTION_INDICATING_THE_INTENDED_SEMANTICS>

The option name used in the above examples, LIBC_CONF_PRINTF_DISABLE_FLOAT to disable printing of floating point numbers, follows this format: it has the prefix LIBC_CONF_, followed by the grouping-tag name in upper case, followed by the action to disable floating point number printing.
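The naming rule above is simple enough to check mechanically. The following is a minimal Python sketch of such a check, not part of the proposal itself. The function name is_valid_option_name is hypothetical, and the rule that the action part consists of upper-case words joined by underscores is an assumption inferred from the single example.

```python
import re


def is_valid_option_name(tag: str, option: str) -> bool:
    """Return True if `option` looks like LIBC_CONF_<TAG>_<ACTION>."""
    prefix = f"LIBC_CONF_{tag.upper()}_"
    if not option.startswith(prefix):
        return False
    action = option[len(prefix):]
    # Assumption: the action is one or more upper-case words joined by "_".
    return re.fullmatch(r"[A-Z0-9]+(_[A-Z0-9]+)*", action) is not None
```

With this sketch, the name LIBC_CONF_PRINTF_DISABLE_FLOAT is accepted under the printf tag, while a name carrying a different tag or an empty action part is rejected.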

Mechanics of config application

Config reading

At libc config time, up to three different config.json files are read in the following order:

  1. config/config.json

  2. config/<platform or OS>/config.json if present.

  3. config/<platform or OS>/<target arch>/config.json if present.

Each successive config.json file overrides the option values set by previously read config.json files. Likewise, a similarly named command line option to the cmake command will override the option values specified in all of these config.json files. That is, users will be able to override the config options from the command line.
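The override order described above can be sketched in a few lines of Python. This is only an illustration of the layering and not the actual CMake implementation. The function names (load_config, merge_configs, apply_cli_overrides, resolve_config) are hypothetical, and the sketch assumes that command line overrides arrive as a plain name-to-value dictionary.

```python
import json
from pathlib import Path


def load_config(path):
    """Read one config.json; a missing file contributes no overrides."""
    path = Path(path)
    return json.loads(path.read_text()) if path.is_file() else {}


def merge_configs(layers):
    """Merge option-dictionaries in order; later layers win per option."""
    merged = {}
    for layer in layers:
        for tag, options in layer.items():
            merged.setdefault(tag, {}).update(options)
    return merged


def apply_cli_overrides(merged, cli_overrides):
    """Apply command line values last, matching options by name."""
    for options in merged.values():
        for name in options:
            if name in cli_overrides:
                options[name] = cli_overrides[name]
    return merged


def resolve_config(config_dir, platform, arch, cli_overrides=None):
    """Read the three files in the documented order, then apply the CLI."""
    root = Path(config_dir)
    layers = [
        load_config(root / "config.json"),
        load_config(root / platform / "config.json"),
        load_config(root / platform / arch / "config.json"),
    ]
    return apply_cli_overrides(merge_configs(layers), cli_overrides or {})
```

Merging per option rather than per grouping-tag means a platform file only needs to list the options it actually changes.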

Config application

A convenience function that produces the actual compile and link options from the option values is to be implemented local to the directory where an option group is relevant. For the printf tag, for example, a convenience function that collects all printf-related compile options should be implemented:

function(get_common_printf_compile_options opt_list)
  # This function should look up the option values for the printf tag
  # and populate opt_list var with compile options corresponding to
  # the option values.
  ...
endfunction()

Option groups which affect linker options can have a similar convenience function to generate common linker options affecting that group. Once generated, the compile and linker options can be used as follows:

add_object_library(
  ...
  COMPILE_OPTIONS
    ${common_printf_compile_options}
    ... # Other compile options affecting this target irrespective of the libc config options
)
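The real convenience function would be written in CMake. The Python sketch below only illustrates the kind of mapping it performs, from option values to compile options. The table PRINTF_OPTION_MACROS is hypothetical, and so is the assumption that a true value of LIBC_CONF_PRINTF_DISABLE_FLOAT translates into defining the existing LIBC_COPT_PRINTF_DISABLE_FLOAT macro on the command line.

```python
# Hypothetical table: option name -> macro to define when the option is true.
PRINTF_OPTION_MACROS = {
    "LIBC_CONF_PRINTF_DISABLE_FLOAT": "LIBC_COPT_PRINTF_DISABLE_FLOAT",
}


def get_common_printf_compile_options(config):
    """Turn the printf option values into a list of compiler flags."""
    options = config.get("printf", {})
    flags = []
    for name, macro in PRINTF_OPTION_MACROS.items():
        # Only boolean ON/OFF options are handled in this sketch.
        if options.get(name) is True:
            flags.append(f"-D{macro}")
    return flags
```

Keeping the mapping in one place per option group is what lets individual targets stay free of per-platform conditionals.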

CMake and Bazel

The JSON format was chosen for the config files because the build systems of interest for the libc project, CMake and Bazel, both support JSON parsing. However, it is not convenient to read JSON files and deduce compile options in Bazel. On the other hand, Bazel supports Python-like dictionaries as first-class data types. Hence, the configs in Bazel will be listed as Python-like dictionaries in .bzl files. Consequently, some amount of duplication of config listings is anticipated between Bazel and the config.json files used by CMake. Specifically, the Bazel listings should mirror the listings in the config.json files.
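Since the two listings are meant to mirror each other, drift between them could be detected with a small script. The following is a minimal Python sketch under stated assumptions: the dictionary name LIBC_PRINTF_CONFIG and the function listing_differences are hypothetical, and the dictionary stands in for what a .bzl file might contain.

```python
import json

# Hypothetical stand-in for a dictionary listed in a .bzl file.
LIBC_PRINTF_CONFIG = {
    "printf": {"LIBC_CONF_PRINTF_DISABLE_FLOAT": False},
}

_MISSING = object()  # Marks an option present in only one listing.


def listing_differences(bazel_listing, config_json_text):
    """Return (tag, option) pairs whose presence or value differs."""
    cmake_listing = json.loads(config_json_text)
    differences = []
    for tag in sorted(set(bazel_listing) | set(cmake_listing)):
        bazel_options = bazel_listing.get(tag, {})
        cmake_options = cmake_listing.get(tag, {})
        for name in sorted(set(bazel_options) | set(cmake_options)):
            if bazel_options.get(name, _MISSING) != cmake_options.get(name, _MISSING):
                differences.append((tag, name))
    return differences
```

An empty result means the two listings agree on every option and value.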

Developer workflow around adding and maintaining libc options

The libc config options usually take effect as one or more preprocessor macros which include or exclude parts of the libc source code. In order not to make options a direct reflection of the source code mechanics, the libc developers should introduce options in the following manner:

  1. The first step is to concretely define the semantics of an option without mixing it up with the source code mechanics.

  2. Once an option is coined and its semantics defined concretely, it should either be added to an existing option group or to a newly created option group.

  3. Likewise, as necessary/required, the appropriate config.json files should be updated with suitable values for the new option.

  4. The necessary code changes should then be made, by introducing appropriate preprocessor macros, so that the option actually takes effect in the source code.

  5. The docs/libc_config.rst document should be updated with information about the newly introduced option. Note that docs/libc_config.rst does not exist today. It will be added as part of the rollout of this proposal.

I like the proposition overall.

Should we enforce that nested configs (i.e., config/**/config.json) are always a subset of the config/config.json?

In this way config/config.json would be the single source of truth for the option name and this would help catch typos when overriding an option.

Agreed. It should be straightforward to implement a check to ensure that an option is actually listed in config/config.json.
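A minimal Python sketch of such a check follows. The function name find_unknown_options is hypothetical, and the sketch assumes both files have already been parsed into dictionaries.

```python
def find_unknown_options(base, override):
    """List (tag, option) pairs in `override` that `base` does not declare."""
    unknown = []
    for tag, options in override.items():
        known = base.get(tag, {})
        for name in options:
            if name not in known:
                unknown.append((tag, name))
    return unknown
```

A misspelled option name in a platform config would show up in the returned list instead of being silently ignored.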

I think we should try to have a mechanically-checked machine-readable indicator of what all the options are. For example, a JSON schema. Perhaps the JSON schema “description” elements could be generated from the .rst file or vice versa. But the gist is that an external build system ingesting the libc source code should have a way to ingest the latest set of options and ensure that its own logic for setting their states (which may or may not use the config.json files) is complete.

Config specified in JSON is already in a machine-readable form, so I think we can generate the .rst file from the config JSON files. For this to happen, the main config/config.json file should include the documentation along with the default value of each option.
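As a rough illustration of that generation step, here is a Python sketch. The entry layout is an assumption: each option is taken to map to a dictionary with "value" and "doc" keys, which is not a format settled in this thread, and the function name render_rst is hypothetical.

```python
import json


def render_rst(config):
    """Render one reStructuredText section per grouping-tag."""
    lines = []
    for tag in sorted(config):
        title = f"{tag} options"
        lines += [title, "-" * len(title), ""]
        for name in sorted(config[tag]):
            entry = config[tag][name]
            # json.dumps keeps the JSON spelling of the default, e.g. false.
            default = json.dumps(entry["value"])
            lines.append(f"- ``{name}``: {entry['doc']} Default: ``{default}``.")
        lines.append("")
    return "\n".join(lines)
```

Grouping the output by tag keeps related options together in the generated document.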

What @roland is bringing up additionally, feel free to correct me if I am wrong, is this:

  1. If downstream uses a build system different from CMake or Bazel, how do we ensure that the downstream build sees errors or warnings if and when the upstream config semantics or names change?

I can extend this with an additional nice-to-have requirement:

  2. It will be ideal if the upstream JSON config can also encode where and how a certain option can be made to take effect.

I am going to discount #2 for now because I think the goal should be to make the config listing system as simple as possible. Whatever simple view we start with will inevitably evolve into something richer and more complex. But, I think we should let that evolution happen organically over a simple system we begin with.

For #1, we can do a few things to ensure that downstream sees test and build failures:

  1. For a boolean option, we should include a negative as well as positive test. This can happen a) by explicitly building and testing for two configs, or b) having enough coverage on the buildbots that both the true and false variations are tested on at least one bot.

  2. If a new config option is introduced, then the source which gets affected by this should require explicit setting rather than just falling back to a default. Hence, if a downstream build system is not updated, then it will see build errors.

  3. If a config option is removed, the downstream build logic expecting it will error out and flag the removal.

Is there anything else we need to consider here?

Looks good to me :+1:

Could we omit the grouping-tags? It’s not clear if we need those and omitting them would simplify parsing. We could always introduce them later if needed.

This also brings up a question about evolution. Is the format considered internal only, or do we want to provide any future compatibility guarantees? If it’s the latter, then perhaps we should include a version field so we could detect when an incompatible version of the format is being used and report an error.

The grouping tag is there for a couple of reasons:

  1. When we generate the .rst file from the config.json file, the grouping tag helps in listing all related options together in a group.
  2. Not very important, but even in the absence of the .rst generation, it enforces some discipline when developers add new options. Comments would normally do that job, but the JSON format does not have a comment syntax.

For now, I view this as an internal-only format. That is, it is consumed only by libc internal tooling.