Preprocessor: Embed Parameter Ordering

The C and C++ Compatibility Study Group, while working on the new standard #embed preprocessor parameter that mirrors the clang::offset(...) and gnu::offset(...) parameters, had someone raise a concern that the order of parameters may be confusing. The concern came from the June 4th, 2025 meeting (anchored link).

Background

Throughout the rest of this text, clang::offset, gnu::offset, and the almost-standard offset parameter will be used interchangeably in prose. They represent the same preprocessor embed parameter, with the same semantics.

Similarly, a resource named <data.bin> is a resource with exactly 10 bytes and is considered as such when put in an #embed statement.

While the following two invocations of #embed are identical and produce exactly the same data:

#embed <data.bin> clang::offset(1) limit(3) /* ONE */
#embed <data.bin> limit(3) clang::offset(1) /* TWO */

some people questioned whether the difference in order might lead users to believe that the two do not produce identical effects (e.g., that offset is always calculated first based on the raw file size, and then limit is applied after, or vice-versa).
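As a rough illustration of why /* ONE */ and /* TWO */ agree, here is a minimal sketch in C. It is a model only, not how Clang or GCC implement #embed; the names resolve_embed, struct param, and struct embed_range are hypothetical. It assumes the whole directive is read before anything is applied, and that offset is then applied before limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two parameters discussed here. */
enum param_kind { PARAM_OFFSET, PARAM_LIMIT };

struct param {
    enum param_kind kind;
    size_t value;
};

/* Which bytes of the resource end up in the expansion. */
struct embed_range {
    size_t first; /* index of the first embedded byte */
    size_t count; /* number of embedded bytes */
};

/* params[] is given in the order the user spelled it. The sketch does not
   diagnose duplicate parameters. */
static struct embed_range resolve_embed(size_t resource_size,
                                        const struct param *params, size_t n)
{
    size_t offset = 0;
    size_t limit = SIZE_MAX; /* no limit unless one is given */

    /* Step 1: read the whole directive; spelling order is not recorded. */
    for (size_t i = 0; i < n; ++i) {
        if (params[i].kind == PARAM_OFFSET)
            offset = params[i].value;
        else
            limit = params[i].value;
    }

    /* Step 2: apply offset first, then limit. */
    struct embed_range r;
    r.first = offset < resource_size ? offset : resource_size;
    size_t remaining = resource_size - r.first;
    r.count = limit < remaining ? limit : remaining;

    assert(r.first + r.count <= resource_size);
    return r;
}
```

For the 10-byte <data.bin>, passing {offset(1), limit(3)} or {limit(3), offset(1)} yields the same range in this model: three bytes starting at index 1.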

The Core Proposal

Following from the background, some people advocated for providing a warning/error if the parameters were written in the “wrong order”. That is, since limit always applies after offset, they wanted the standard to mandate that such parameters always be written in a specific order: /* ONE */ would be fine but /* TWO */ should trigger an error.

It was then pointed out that this can also apply to other parameters based on the standard wording. For example, limit(0) or offset(SIZE_MAX) can make a resource that has data be considered “empty”. In particular, using <data.bin> again:

#embed <data.bin> limit(0) if_empty("meow") /* THREE */
#embed <data.bin> if_empty("meow") limit(0) /* FOUR */

/* FOUR */ should issue a diagnostic since if_empty is being evaluated before limit turns the resource empty, while /* THREE */ would issue no diagnostics. This led to the formulation of the following guidance:

  • offset must appear before limit.
  • limit and/or offset must appear before any of prefix, suffix, or if_empty.

We are asking implementations how they feel about the above 2 rules and implementing them.
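To make the two rules concrete, the following is a minimal sketch of an order check in C. It is an assumption about how such a diagnostic could be structured, not code from any compiler; the names embed_param, order_rank, and order_is_accepted are hypothetical. Each parameter gets a rank (offset, then limit, then everything else), and an ordering is accepted when ranks never decrease from left to right.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tags for the parameters named in the two rules. */
enum embed_param { EP_OFFSET, EP_LIMIT, EP_PREFIX, EP_SUFFIX, EP_IF_EMPTY };

/* Rank 0: offset. Rank 1: limit. Rank 2: prefix, suffix, if_empty. */
static int order_rank(enum embed_param p)
{
    switch (p) {
    case EP_OFFSET: return 0;
    case EP_LIMIT:  return 1;
    default:        return 2;
    }
}

/* Returns true when the spelled order satisfies both proposed rules.
   The relative order of prefix, suffix, and if_empty is left free. */
static bool order_is_accepted(const enum embed_param *params, size_t n)
{
    assert(params != NULL || n == 0);
    for (size_t i = 1; i < n; ++i) {
        if (order_rank(params[i]) < order_rank(params[i - 1]))
            return false; /* a lower-ranked parameter appears too late */
    }
    return true;
}
```

Under this sketch, /* ONE */ and /* THREE */ are accepted while /* TWO */ and /* FOUR */ are rejected. Whether a rejection becomes a warning or an error is a separate choice that the check itself does not make.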

To be extremely clear: offset, clang::offset, and gnu::offset always apply before the standard limit(...) parameter, both in Wording and in All Real Implementations, but neither the wording nor the implementations impose an order in which the parameters must be written.

To be more clear: an imposed order is not how C23 specified it, and not how C++ has standardized it so far. As #embed’s principal author and carrier through the last 7 years, I can say that nobody has really come forward to say this was confusing or harmful, but this may simply be selection bias or simply that nobody has spoken up.

We note that some of this is weird. Again, consider the case of /* FOUR */ before:

#embed <data.bin> if_empty("meow") limit(0) /* FOUR */

If <data.bin> is an empty resource, would that mean the preceding if_empty is fine because limit(0) would not have any effect anyways? In an obvious sense, the diagnostic would apply anyways, but this is one of those things where I personally did not believe anyone would advocate for ordering requirements either way, so now I feel like I have to ask if that’s a quality-of-implementation thing anyone would care about in the first place. This is, again, in the face of the fact that the order of the parameters does not matter on any of the implementations, and that nobody has asked me, either in the run-up to standardization or after, if this should be a thing.
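For reference, the semantics in play can be modelled with a short sketch in C. This is an illustrative model under the assumption stated above (offset applies first, then limit, and if_empty is chosen when nothing is left); effective_size and expands_to_if_empty are hypothetical names, not part of any implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of bytes left after applying offset, then limit. */
static size_t effective_size(size_t resource_size, size_t offset, size_t limit)
{
    size_t remaining = offset < resource_size ? resource_size - offset : 0;
    return limit < remaining ? limit : remaining;
}

/* True when the directive would expand to the if_empty tokens.
   No spelling order is passed in, so it cannot influence the answer. */
static bool expands_to_if_empty(size_t resource_size, size_t offset, size_t limit)
{
    size_t n = effective_size(resource_size, offset, limit);
    assert(n <= resource_size);
    return n == 0;
}
```

In this model a genuinely empty <data.bin> and a 10-byte <data.bin> with limit(0) both select if_empty. The model takes no spelling order as input, so any ordering diagnostic would be about how the directive reads rather than what it produces.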

The Questions

Therefore, we’d like to poll the Clang community:

1. Does anyone think a diagnostic on the order will help prevent confusion with users, even if the semantics never change between invocations regardless of parameter order?

2. If the answer to (1) is yes, do we believe it should be a warning (recommended practice in Standard Speak) or an error (a Constraint Violation/Ill-Formed in Standard Speak)?

Sub-questions such as “an error, but only in pedantic mode” and similar can be golfed and bikeshedded after answering the first two questions.

A formalization of these semantics is going to be presented to WG21 and WG14 at some point. I’m gathering implementer feedback and willingness to change their existing implementations to formulate a new paper: P3731R0: #embed Preprocessor Parameter Order

Thank you for reading,
Björkus

Personally, I don’t think I’d want an error or an on-by-default warning about the order of embed parameters. This is largely because I don’t think there is one obviously correct ordering, and multiple different ones can correspond to reasonable ways of thinking about embedding data from a file.

For example, limit(N) offset(M) means “embed up to N bytes starting from offset M in the file”, while offset(M) limit(N) means “go to offset M in the file and then embed up to N bytes”. These both seem like plausible ways that someone might think about the embedding operation.

Similarly, if_empty(xyz) limit(N) means “If the file is empty, insert xyz. Otherwise embed up to N bytes from the file.” (The examples with limit(0) are silly and might justify a warning, but that’s a separate issue.)

Since there isn’t one obviously correct ordering, trying to impose a required ordering would impose at least some cognitive burden on programmers to remember it. It would also have the potential to break existing code, since the existing implementations and the specification in C23 do not impose such a requirement.

(I’m the maintainer of a small C implementation where I’m currently implementing C23 features, as well as a user of Clang and GCC. As I’ve said above, I don’t see the need for new standard requirements in this area, but if something does get standardized I’m willing to implement it.)

I suppose there is precedent here what with us warning about situations where the order of fields in the member initialiser list of a constructor doesn’t match the order of fields as declared in the class; so if that was deemed confusing enough to warn about it then I can see an argument that we should do the same for this situation.

#embed is a fairly new feature so there probably isn’t much data about this out there, but do you happen to know of any cases where someone actually got confused about this, or are we speaking in the hypothetical here? To be clear, I’m not saying that we shouldn’t make this a warning even if it’s the latter, but rather that it would be a pretty good argument in favour of adding a warning if it’s the former.

I personally feel like a warning would be fine; who knows, maybe some people end up finding the ‘wrong’ order more intuitive.

It is a hypothetical concern that came from SG22. In the years that I was responsible for vending this directly to people, nobody complained about limit before/after prefix/suffix or empty (it was named empty before it was renamed to if_empty). They didn’t complain about other forms of parameters in the earlier renditions of #embed either, when its old syntax was #embed filename limit-param width-param ... (but they did complain that since it was a naked number it was confusing, which is what led to the final design).

I haven’t seen anyone complain about clang::offset and gnu::offset, but they’re SUPER new so who knows. So this would really be based on how implementers feel about it.

Thank you for coming to ask!

My personal preference is for there to be no imposed order and leave diagnostics up to QoI rather than forcing a recommended practice.

The existing set of parameters were designed with no order specified; you have to read the entire embed directive before you can act on it. Introducing an ordering after it’s already released with that mental model is not super helpful. Especially because there’s already ordering issues between limit and if_empty.

And the issues that come from user mistakes in this space seem unlikely to be hit often and no worse than any other kind of “oops you held it slightly weirdly” situations in C. In general, users should already know what they’re embedding and getting something different should be obvious to them.

I’m mildly in favor of making the confusing thing ill-formed, but a good start would be to explore a warning and get usage experience with that

Is that “Clang might be open to trialing this warning with users and seeing how it goes.”?

Or is that “The standard should standardize something and then we’ll go try it out and then report back.”?