[RFC] YAML I/O

I’ve been working on reading and writing yaml-encoded documents for the lld project. Michael Spencer added the YAMLParser.h functionality to llvm/Support to help in parsing yaml documents. That parser greatly helps at the syntax level, but you still need to hand-write a lot of semantic checking and then convert the various node types into something usable.

I’ve developed a layer on top of YAMLParser.h that I’m calling YAMLIO.h (yaml I/O). It unifies parsing and writing yaml documents, handles most semantic checking, and is very easy to use! Basically, you define your yaml document schema as a mix of C++ structs and vectors, and YAMLIO does the rest. Let’s look at a quick example first. Suppose this is your yaml document:

  - name: Tom
    age: 20
  - name: Richard
    age: 27
    speaks-french: true
  - name: Harry
    age: 23

To read or write such yaml data you would define two C++ types: one for the mapping (a struct) and one for the sequence of those mappings (a typedef). In the struct you add a yamlMapping() method which associates mapping keys with field names and the field’s type. (Note: the yamlMapping() method was inspired by the boost serialize() method).

using llvm::yaml::Sequence;
using llvm::yaml::DocumentList;
using llvm::yaml::IO;
using llvm::yaml::Input;
using llvm::yaml::Output;
using llvm::yaml::YamlMap;

struct Person : public YamlMap {
  StringRef name;
  uint8_t   age;
  bool      speaks_french;

  // associates each mapping key with a field
  void yamlMapping(IO &io) {
    requiredKey(io, name, "name");
    requiredKey(io, age, "age");
    optionalKey(io, speaks_french, "speaks-french");
  }
};

typedef Sequence<Person> PersonList;
typedef DocumentList<PersonList> PersonDocumentList;

That’s it. The yamlMapping() method is processed by both the Input and Output to properly handle key-values in a yaml mapping. The Sequence and DocumentList templates are subclasses of std::vector<>.
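To make the "one method, both directions" idea concrete, here is a minimal self-contained sketch. It is not the YAMLIO implementation: the `IO` struct, the `reading` flag, the string-to-string map standing in for a parsed yaml mapping, and the `writePerson`/`readPerson` helpers are all assumptions made up for illustration. Only the shape of `yamlMapping()` and `requiredKey()` follows the example above.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the IO class. A flat string map plays the
// role of one parsed yaml mapping; 'reading' selects the direction.
struct IO {
  std::map<std::string, std::string> &node;
  bool reading;
};

// Move one string field in whichever direction the IO object is going.
inline void requiredKey(IO &io, std::string &field, const std::string &key) {
  if (io.reading) {
    auto it = io.node.find(key);
    if (it == io.node.end())
      throw std::runtime_error("missing required key '" + key + "'");
    field = it->second;
  } else {
    io.node[key] = field;
  }
}

// Same idea for a small integer field, converting to and from text.
inline void requiredKey(IO &io, uint8_t &field, const std::string &key) {
  if (io.reading) {
    auto it = io.node.find(key);
    if (it == io.node.end())
      throw std::runtime_error("missing required key '" + key + "'");
    field = static_cast<uint8_t>(std::stoul(it->second));
  } else {
    io.node[key] = std::to_string(static_cast<unsigned>(field));
  }
}

struct Person {
  std::string name;
  uint8_t age = 0;

  // The single description of the schema, used for reading and writing.
  void yamlMapping(IO &io) {
    requiredKey(io, name, "name");
    requiredKey(io, age, "age");
  }
};

// Writing: run the mapping with reading == false to fill the node.
inline std::map<std::string, std::string> writePerson(Person p) {
  std::map<std::string, std::string> node;
  IO io{node, false};
  p.yamlMapping(io);
  return node;
}

// Reading: run the very same mapping with reading == true.
inline Person readPerson(std::map<std::string, std::string> node) {
  Person p;
  IO io{node, true};
  p.yamlMapping(io);
  return p;
}
```

Because the schema is written once, the reader and the writer cannot drift apart, which is the same property the boost serialize() pattern gives you.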

The data structures are regular structs and vectors. An example of creating them:

// build a person
Person a;
a.name = "Tom";
a.age = 27;
a.speaks_french = false;

// build sequence of persons
PersonList persons;
persons.push_back(a);

To write a yaml document your code looks like:

void dump(PersonList &persons, raw_ostream &out) {
  Output yout(out);
  yout << persons;
}

To read a yaml document your code looks like:

void readYaml(StringRef filePath) {
  Input yin(filePath);
  PersonDocumentList docList;
  yin >> docList;

  // if there was an error parsing, message already printed out
  if (yin.error())
    return;

  for (PersonList &pl : docList) {
    for (Person &person : pl) {
      // process each Person
    }
  }
}

YAMLIO also handles semantic error checking for you. For instance, if your document contained an illegal value for a key, like:

  - name: Richard
    age: 27
    speaks-french: oui

You would get an error like:

YAML:6:18: error: invalid boolean
    speaks-french: oui
                   ^~~~

If the document has a key not in your schema, like:

  - name: Tom
    pets: true
    age: 20

You would get an error like:

YAML:3:18: error: unknown key 'pets'
    pets: true
    ^~~~
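The two kinds of checks shown in these errors (a bad scalar value, and a key the schema does not know) can be sketched in a few lines. This is only an illustration under stated assumptions: `parseBool` and `checkPerson` are hypothetical names, the mapping is modelled as a string map, only "true" and "false" are accepted as booleans, and no line or column tracking is attempted.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: parse a boolean scalar, reporting failure
// instead of guessing. Only "true" and "false" are accepted here.
inline bool parseBool(const std::string &text, bool &out) {
  if (text == "true")  { out = true;  return true; }
  if (text == "false") { out = false; return true; }
  return false;
}

// Check one mapping against the Person schema and collect diagnostics.
inline std::vector<std::string>
checkPerson(const std::map<std::string, std::string> &node) {
  static const std::set<std::string> known = {"name", "age", "speaks-french"};
  std::vector<std::string> errors;

  // Any key that the schema never asked for is an error.
  for (const auto &kv : node)
    if (!known.count(kv.first))
      errors.push_back("unknown key '" + kv.first + "'");

  // A known key must still hold a value of the right type.
  auto it = node.find("speaks-french");
  bool ignored = false;
  if (it != node.end() && !parseBool(it->second, ignored))
    errors.push_back("invalid boolean");

  return errors;
}
```

The point of the sketch is that both checks fall out of the schema itself: the set of known keys and the type of each field are already declared, so client code does not have to repeat them.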

As you see, the model of YAMLIO is that you define intermediate data structures which define your yaml schema. The job of YAMLIO is to convert between those intermediate data structures and yaml documents. YAMLIO most likely won’t be able to convert between your existing native data structures and yaml. You will probably need to define new intermediate data structures (the schema) and then write code to convert between your native data structures and the intermediate ones. But that glue code is super simple, mostly just copying fields and iterating lists. All the yaml-specific work (formatting and semantic checking) is done by YAMLIO.
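As a rough idea of what that glue code looks like, here is a self-contained sketch. The native `Employee` class and the `toSchema`/`fromSchema` helpers are invented for illustration, and the schema struct uses std::string and std::vector in place of StringRef and Sequence so the example compiles on its own.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical native type: whatever the application already uses.
class Employee {
public:
  Employee(std::string name, unsigned age, bool french)
      : name_(std::move(name)), age_(age), french_(french) {}
  const std::string &fullName() const { return name_; }
  unsigned ageInYears() const { return age_; }
  bool speaksFrench() const { return french_; }

private:
  std::string name_;
  unsigned age_;
  bool french_;
};

// Intermediate (schema) type: plain fields that mirror the yaml keys.
struct Person {
  std::string name;
  uint8_t age = 0;
  bool speaks_french = false;
};
typedef std::vector<Person> PersonList;

// Native -> schema: copy fields, iterate the list.
inline PersonList toSchema(const std::vector<Employee> &staff) {
  PersonList persons;
  for (const Employee &e : staff) {
    Person p;
    p.name = e.fullName();
    p.age = static_cast<uint8_t>(e.ageInYears());
    p.speaks_french = e.speaksFrench();
    persons.push_back(p);
  }
  return persons;
}

// Schema -> native: the same thing in reverse.
inline std::vector<Employee> fromSchema(const PersonList &persons) {
  std::vector<Employee> staff;
  for (const Person &p : persons)
    staff.push_back(Employee(p.name, p.age, p.speaks_french));
  return staff;
}
```

Keeping the schema types separate means the native classes can change their internals without changing the yaml format, at the cost of this small amount of copying.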

In the example above the scalar types (strings, integers, booleans) were all built-in types. YAMLIO also has support for enumerations and bit masks. Here is an example of a simple enumeration (color) and a bit mask set (flags). Suppose your data structures already define Colors and Flags:

enum Colors {
  cRed,
  cBlue,
  cGreen
};

#define FlagBig    1
#define FlagLittle 2
#define FlagRound  4
#define FlagPointy 8

And you want the yaml documents to use human readable values for colors and flags, rather than just the integer value used internally. To handle that, you define conversion tables and hand them to YAMLIO. For instance:

using llvm::yaml::IO;
using llvm::yaml::Input;
using llvm::yaml::Output;
using llvm::yaml::YamlMap;
using llvm::yaml::UniqueValue;
using llvm::yaml::BitValue;

static const UniqueValue<Colors> colorConversions = {
  {cRed,   "red"},
  {cBlue,  "blue"},
  {cGreen, "green"},
  {cRed,   NULL}   // default value for optional keys
};

static const BitValue<uint32_t> flagConversions = {
  {FlagBig,    "big"},
  {FlagLittle, "little"},
  {FlagRound,  "round"},
  {FlagPointy, "pointy"},
  {0,          NULL}
};

struct Test : public YamlMap {
  StringRef name;
  Colors    color;
  uint32_t  flags;

  void yamlMapping(IO &io) {
    requiredKey(io, name, "name");
    optionalKey(io, color, "color", colorConversions);
    requiredKey(io, flags, "flags", flagConversions);
  }
};

The above defines a yaml mapping with three keys: name, color, and flags. When writing the color value out, the table colorConversions is used to map the in-memory value to a string. In this case, the color field is marked as optional. That means when reading the yaml document, if there is no “color:” key, the struct’s color field will be filled in with the value from the last entry in the table (the one with the NULL string pointer), in this case red.
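For readers who want to see how a NULL-terminated table can double as the source of the default value, here is a small sketch. The layout of UniqueValue is not shown in this post, so the `ColorEntry` struct, the `colorTable` array, and the two lookup functions are assumptions for illustration only. A null `text` argument stands for "the key was absent from the document".

```cpp
#include <cstring>

enum Colors { cRed, cBlue, cGreen };

// Hypothetical table entry pairing an in-memory value with its name.
struct ColorEntry {
  Colors value;
  const char *name;
};

static const ColorEntry colorTable[] = {
  {cRed,   "red"},
  {cBlue,  "blue"},
  {cGreen, "green"},
  {cRed,   nullptr}  // sentinel: ends the table and carries the default
};

// Writing: map an in-memory value to its name (nullptr if not listed).
inline const char *colorName(Colors c) {
  for (const ColorEntry *e = colorTable; e->name; ++e)
    if (e->value == c)
      return e->name;
  return nullptr;
}

// Reading: map a name to a value. A null 'text' means the key was
// missing, so the value stored in the sentinel entry is used.
inline bool colorFromText(const char *text, Colors &out) {
  const ColorEntry *e = colorTable;
  for (; e->name; ++e) {
    if (text && std::strcmp(e->name, text) == 0) {
      out = e->value;
      return true;
    }
  }
  if (!text) {       // e now points at the sentinel entry
    out = e->value;
    return true;
  }
  return false;      // a name that is not in the table is an error
}
```

One table therefore serves three purposes: output formatting, input validation, and the default for an optional key.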

When writing the flags value out, the table flagConversions is used to convert the bits in the flags field to a sequence of flag names.
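The bit mask conversion can be sketched the same way. Again this is an illustration, not the BitValue implementation: `FlagEntry`, `flagTable`, `flagsToNames`, and `namesToFlags` are hypothetical names, and the flags are declared as constants rather than #defines so the block stands alone.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Same values as the #defines in the post, as typed constants.
const uint32_t FlagBig = 1, FlagLittle = 2, FlagRound = 4, FlagPointy = 8;

// Hypothetical table entry pairing one bit with its name.
struct FlagEntry {
  uint32_t bit;
  const char *name;
};

static const FlagEntry flagTable[] = {
  {FlagBig,    "big"},
  {FlagLittle, "little"},
  {FlagRound,  "round"},
  {FlagPointy, "pointy"},
  {0,          nullptr}  // sentinel
};

// Writing: every set bit becomes one name, in table order.
inline std::vector<std::string> flagsToNames(uint32_t flags) {
  std::vector<std::string> names;
  for (const FlagEntry *e = flagTable; e->name; ++e)
    if (flags & e->bit)
      names.push_back(e->name);
  return names;
}

// Reading: OR together the bit for each name; fail on an unknown name
// and leave 'flags' untouched in that case.
inline bool namesToFlags(const std::vector<std::string> &names,
                         uint32_t &flags) {
  uint32_t result = 0;
  for (const std::string &n : names) {
    const FlagEntry *e = flagTable;
    while (e->name && n != e->name)
      ++e;
    if (!e->name)
      return false;
    result |= e->bit;
  }
  flags = result;
  return true;
}
```

With this table, a flags value of FlagLittle | FlagPointy would be written as the sequence [ little, pointy ].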

A valid yaml document for this schema is:

  - name: Tom
    color: blue
    flags: [ big ]
  - name: Richard
    color: red
    flags: [ little, pointy ]
  - name: Harry
    flags: [ little, round ]

My initial plan was to add YAMLIO to lld and let it mature there, but I got a request to move this down into llvm for another llvm client to use. So, I thought I’d see what the llvm community thought of this support.

To see a larger example, attached is a sample mach-o object file (for hello world) encoded in yaml along with the YAMLIO based schema for reading or writing those documents.

example.yaml (2.92 KB)

ObjectIO.h (5.98 KB)

I really like this! +1 for inclusion in LLVMSupport instead of lld.

I have a project that could definitely make use of this. Right now, I am using YAMLParser directly; it’s not difficult, but this would definitely make it easier.