RFC: Add a preprocessor to yaml2obj (and other YAML tools)

I am adding -D k=v to yaml2obj, similar to clang -D. This makes it easy
to generate {32-bit,64-bit} x {big-endian,little-endian} tests.

   --- !ELF
   FileHeader:
     Class: ELFCLASS[[BITS]]
     Data: ELFDATA2[[ENCODE]]
     Type: ET_DYN
     Machine: EM_X86_64

# RUN: yaml2obj -D BITS=32 -D ENCODE=LSB %s -o %t.32le
# RUN: yaml2obj -D BITS=32 -D ENCODE=MSB %s -o %t.32be
# RUN: yaml2obj -D BITS=64 -D ENCODE=LSB %s -o %t.64le
# RUN: yaml2obj -D BITS=64 -D ENCODE=MSB %s -o %t.64be

See D73828 ([yaml2obj][test] Simplify some e_machine EI_CLASS EI_DATA tests) for examples of how -D simplifies tests.
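
To make the intended behaviour concrete, here is a minimal Python sketch of the substitution. It is not the D73821 implementation: parse_defines and preprocess are hypothetical names, and the template assumes a Data: ELFDATA2[[ENCODE]] field so that ENCODE has something to expand into.

```python
# Hypothetical helpers for illustration; not the yaml2obj code from D73821.
def parse_defines(args):
    """Turn ["BITS=32", "ENCODE=LSB"] into {"BITS": "32", "ENCODE": "LSB"}."""
    defines = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected k=v, got {arg!r}")
        defines[key] = value
    return defines


def preprocess(text, defines):
    """Replace each [[KEY]] with its value for every KEY in defines."""
    for key, value in defines.items():
        text = text.replace(f"[[{key}]]", value)
    return text


# Assumed template: the Data field is an illustration of where ENCODE is used.
TEMPLATE = """\
--- !ELF
FileHeader:
  Class: ELFCLASS[[BITS]]
  Data: ELFDATA2[[ENCODE]]
  Type: ET_DYN
  Machine: EM_X86_64
"""

# One template yields all four {32,64} x {LSB,MSB} variants.
for bits in ("32", "64"):
    for encode in ("LSB", "MSB"):
        defines = parse_defines([f"BITS={bits}", f"ENCODE={encode}"])
        lines = preprocess(TEMPLATE, defines).splitlines()
        print(lines[2].strip(), lines[3].strip())
```

The loop stands in for the four RUN lines: each iteration is one yaml2obj invocation with a different pair of definitions.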

Do people think it may be useful in other YAML tools? If yes, I'll move
the yaml2obj implementation (D73821, [yaml2obj] Add -D k=v to preprocess the input YAML) to
include/llvm/Support/YAMLTraits.h llvm::yaml::Input so that other YAML
tools can use the feature.

Do people prefer a different syntax? I think [[PATTERN]] is nice because
it is what FileCheck -DFILE=... uses:

   # CHECK: ... [[FILE]]

   FileCheck only preprocesses patterns in CHECK lines.
   D73821 preprocesses both comment lines (which include CHECK lines) and non-comment lines (which include YAML).
   It is not a problem that the YAML preprocessor also processes CHECK lines, because tokens on a comment line will be ignored.
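
A rough Python sketch of that difference, assuming a plain whole-buffer replacement (substitute and TEST_FILE are made-up names for illustration): the substitution touches comment lines as well as YAML lines, and the rewritten comment lines are harmless because a YAML parser skips them.

```python
# Hypothetical whole-buffer substitution; it does not look at line kinds.
def substitute(text, defines):
    for key, value in defines.items():
        text = text.replace(f"[[{key}]]", value)
    return text


# Illustrative test file containing both comment lines and YAML lines.
TEST_FILE = """\
# RUN: yaml2obj -D BITS=64 %s -o %t
# CHECK: Class: ELFCLASS[[BITS]]
--- !ELF
FileHeader:
  Class: ELFCLASS[[BITS]]
"""

processed = substitute(TEST_FILE, {"BITS": "64"})

# Collect every line that the preprocessor rewrote.
changed = [
    (old, new)
    for old, new in zip(TEST_FILE.splitlines(), processed.splitlines())
    if old != new
]
for old, new in changed:
    kind = "comment" if old.lstrip().startswith("#") else "yaml"
    print(f"{kind}: {new}")
```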

If -D UNDEF= is not specified, should [[UNDEF]] in the source be considered an error?
I think it is fine not to treat it as an error because there can be
legitimate use cases of unterminated [[, for example, [[ in a string literal.
YAML parsing is complex. I don't expect the preprocessor to be smart
enough to recognize string literals. (llvm/lib/Support/YAMLParser.cpp does not seem to provide raw strings of
spaces and comments. Hooking a preprocessor into the scanner does not seem to be simple.)
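
As a sketch of the lenient behaviour proposed here (not the actual patch), a single regex pass can expand defined names and leave everything else untouched. The name preprocess_lenient and the set of characters allowed in a macro name are assumptions.

```python
import re

# A macro reference is [[NAME]]; the allowed NAME characters are an assumption.
MACRO = re.compile(r"\[\[([A-Za-z_][A-Za-z0-9_]*)\]\]")


def preprocess_lenient(text, defines):
    """Substitute defined macros; leave all other text exactly as written."""
    def expand(match):
        name = match.group(1)
        # Undefined name: return the original "[[NAME]]" instead of failing.
        return defines.get(name, match.group(0))
    return MACRO.sub(expand, text)


defines = {"BITS": "64"}
print(preprocess_lenient("Class: ELFCLASS[[BITS]]", defines))
print(preprocess_lenient("Name: '[[UNDEF]]'", defines))
print(preprocess_lenient("Content: 'a [[ b'", defines))
```

An unterminated [[ never matches the pattern, so string literals containing it pass through without the preprocessor having to understand YAML quoting.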

Do people know other preprocessing features which may be useful?

As someone who suggested this kind of functionality in an earlier review, I think this will certainly be useful. The syntax makes sense to me, provided unrecognised macros are simply treated as part of the input string. That way we don’t have to worry about complexities like escaping “[[”. In the (unlikely) event that somebody’s YAML needs to include the literal “[[FOO]]”, they should simply not define FOO and pick a different macro name instead, e.g. -DBAR.

The idea itself is indeed good.

Regarding escaping: I think we should have it.

Consider the following example (I took it from D73828).

   --- !ELF
   FileHeader:
     Class: ELFCLASS[[BITS]]
     Type: ET_EXEC
     Machine: EM_386

# RUN: yaml2obj %s --docnum=4 -D BITS=32 -o %t-32bit.o
# RUN: yaml2obj %s --docnum=4 -D BITS=64 -o %t-64bit.o

Without escaping it would be:

   Class: ELFCLASSBITS

which, IMO, does not look as clear as the version with escaping.

I think you may be misinterpreting what I mean by escaping. I mean needing to use something like ‘\’ to allow ‘[[’ to appear in the YAML without having a special meaning. I agree we need some sort of indicator to identify a macro, but I also think the approach of simply ignoring macros that are not defined is fine.

Ah, OK. Sorry for not getting it right.

I also agree with the simplest approach: “ignoring things if not defined is fine” sounds good to me.