object construction patterns and unique_ptr

In lld we have a Reader class which is a factory that takes .o file(s) and produces an in memory lld::File object(s). But in doing so, there could be I/O errors (file not readable) or file may be malformed. We are also using C++11 in lld, so we use std::unique_ptr for managing object ownership.

The Reader class currently has an interface that can be simplified down to:

    virtual error_code readFile(StringRef path, std::unique_ptr<lld::File> &result);

But this interface has become awkward to use. There are two "return" values. This method is a unique_ptr "source" but the use of a by-refernce parameter means you have to start with an uninitialized unique_ptr and the readFile() has to std::move() into it. unique_ptr is much nicer to use as a return value.

An other issue is that since llvm::error_code was designed to return a fixed set of strings, there is no way to return a dynamic error message (e.g. "relocation 27 is invalid").

In asking around for clean models on how to do this better, I've heard two interesting ideas:

1) Incorporate the error_code and string message into the File class. That is the factory method always returns a File object. But there is a method on File objects to query if they are valid or not. If they are not, there is a method to get the error_code and/or error string.

2) A variation on this is to have a new subclass of File called FileError. If the Reader cannot construct a real FileFoo object because of some error, it instead constructs a FileError object and returns that. The FileError object has methods to get the error_code and/or error string.

With both these ideas the Reader method is:

    virtual std::unique_ptr<lld::File> readFile(StringRef path);

which is much easier to use.

Any recommendations on a good way to design this?

-Nick

In lld we have a Reader class which is a factory that takes .o file(s) and produces an in memory lld::File object(s). But in doing so, there could be I/O errors (file not readable) or file may be malformed. We are also using C++11 in lld, so we use std::unique_ptr for managing object ownership.

The Reader class currently has an interface that can be simplified down to:

   virtual error_code readFile(StringRef path, std::unique_ptr<lld::File> &result);

But this interface has become awkward to use. There are two "return" values. This method is a unique_ptr "source" but the use of a by-refernce parameter means you have to start with an uninitialized unique_ptr and the readFile() has to std::move() into it. unique_ptr is much nicer to use as a return value.

Not a c++'11 expert here, but does a returning an std::tuple work?

An other issue is that since llvm::error_code was designed to return a fixed set of strings, there is no way to return a dynamic error message (e.g. "relocation 27 is invalid").

I'm pretty sure that the linker can specify a custom error_category with whatever information you want, and then return an error_code put with that. Michael would know for sure, but it is definitely in the design of error_code for it to be extensible.

In asking around for clean models on how to do this better, I've heard two interesting ideas:

1) Incorporate the error_code and string message into the File class. That is the factory method always returns a File object. But there is a method on File objects to query if they are valid or not. If they are not, there is a method to get the error_code and/or error string.

2) A variation on this is to have a new subclass of File called FileError. If the Reader cannot construct a real FileFoo object because of some error, it instead constructs a FileError object and returns that. The FileError object has methods to get the error_code and/or error string.

With both these ideas the Reader method is:

   virtual std::unique_ptr<lld::File> readFile(StringRef path);

which is much easier to use.

Any recommendations on a good way to design this?

Would an API like:

   virtual std::unique_ptr<lld::File> readFile(StringRef path, error_code &ErrorInfo);

make more sense?

-Chris

In lld we have a Reader class which is a factory that takes .o file(s) and produces an in memory lld::File object(s). But in doing so, there could be I/O errors (file not readable) or file may be malformed. We are also using C++11 in lld, so we use std::unique_ptr for managing object ownership.

The Reader class currently has an interface that can be simplified down to:

  virtual error_code readFile(StringRef path, std::unique_ptr<lld::File> &result);

But this interface has become awkward to use. There are two "return" values. This method is a unique_ptr "source" but the use of a by-refernce parameter means you have to start with an uninitialized unique_ptr and the readFile() has to std::move() into it. unique_ptr is much nicer to use as a return value.

Not a c++'11 expert here, but does a returning an std::tuple work?

I talked to Howard about that. It does work, but it is cumbersome. Either:

std::tuple<std::unique_ptr<lld::File>, error_code> result = reader.readFile(path);

Or you use std::tie() but then you still have to declare the two local variables and tie them together. You could define a wrapper class for the tuple, but then you realize it would be simpler to push the error_code into the File and get rid of the wrapper class.

An other issue is that since llvm::error_code was designed to return a fixed set of strings, there is no way to return a dynamic error message (e.g. "relocation 27 is invalid").

I'm pretty sure that the linker can specify a custom error_category with whatever information you want, and then return an error_code put with that. Michael would know for sure, but it is definitely in the design of error_code for it to be extensible.

From what I can tell error_code is about wrapping out OS specific "int" error codes, and how to map those to static strings. I suppose you could abuse it by having a error_category which has a static vector or recently dynamically generated strings and the int "error code" is an index into that vector.

Clang has this great Diagnostics infrastructure for showing where the error occurred in the source file. Does developing something like that make sense for object files whose source is not human readable?

In asking around for clean models on how to do this better, I've heard two interesting ideas:

1) Incorporate the error_code and string message into the File class. That is the factory method always returns a File object. But there is a method on File objects to query if they are valid or not. If they are not, there is a method to get the error_code and/or error string.

2) A variation on this is to have a new subclass of File called FileError. If the Reader cannot construct a real FileFoo object because of some error, it instead constructs a FileError object and returns that. The FileError object has methods to get the error_code and/or error string.

With both these ideas the Reader method is:

  virtual std::unique_ptr<lld::File> readFile(StringRef path);

which is much easier to use.

Any recommendations on a good way to design this?

Would an API like:

  virtual std::unique_ptr<lld::File> readFile(StringRef path, error_code &ErrorInfo);

That is a common pattern, used for instance in the bit code reader. If we can unify the OS level error and the syntax/semantic errors from parsing object files, this pattern might work.

One other thought that leans me towards integrating the error code/msg into the File object is that it works cleaner when you start adding multithreading to the linker. When you have multiple threads going, each reading an object file and one of them returns an error, you need to cleaning shut down the other threads, and return the error. With the error status integrated into the File object, all you do is have each thread append it resulting File object onto a list, then after all threads are joined, run through the list once to see if any say they encountered an error. If the error_code is returned separately from the File, you have to create some data structure to bind together the error_code with which file path could not be parsed.

-Nick

In lld we have a Reader class which is a factory that takes .o file(s) and produces an in memory lld::File object(s). But in doing so, there could be I/O errors (file not readable) or file may be malformed. We are also using C++11 in lld, so we use std::unique_ptr for managing object ownership.

The Reader class currently has an interface that can be simplified down to:

virtual error_code readFile(StringRef path, std::unique_ptr<lld::File> &result);

But this interface has become awkward to use. There are two "return" values. This method is a unique_ptr "source" but the use of a by-refernce parameter means you have to start with an uninitialized unique_ptr and the readFile() has to std::move() into it. unique_ptr is much nicer to use as a return value.

Not a c++'11 expert here, but does a returning an std::tuple work?

I talked to Howard about that. It does work, but it is cumbersome. Either:

std::tuple<std::unique_ptr<lld::File>, error_code> result = reader.readFile(path);

Or you use std::tie() but then you still have to declare the two local variables and tie them together. You could define a wrapper class for the tuple, but then you realize it would be simpler to push the error_code into the File and get rid of the wrapper class.

An other issue is that since llvm::error_code was designed to return a fixed set of strings, there is no way to return a dynamic error message (e.g. "relocation 27 is invalid").

I'm pretty sure that the linker can specify a custom error_category with whatever information you want, and then return an error_code put with that. Michael would know for sure, but it is definitely in the design of error_code for it to be extensible.

>From what I can tell error_code is about wrapping out OS specific "int" error codes, and how to map those to static strings. I suppose you could abuse it by having a error_category which has a static vector or recently dynamically generated strings and the int "error code" is an index into that vector.

Clang has this great Diagnostics infrastructure for showing where the error occurred in the source file. Does developing something like that make sense for object files whose source is not human readable?

The chance that a user will get an invalid object file is incredibly
low, and if they do get one there's generally little they can do about
it. However when this does happen we need to be able to figure out
what's going on. So we do not need clang style diagnostic printing for
object file parsing, but we do still need to get useful diagnostics.

In asking around for clean models on how to do this better, I've heard two interesting ideas:

1) Incorporate the error_code and string message into the File class. That is the factory method always returns a File object. But there is a method on File objects to query if they are valid or not. If they are not, there is a method to get the error_code and/or error string.

I like this with some modifications.

2) A variation on this is to have a new subclass of File called FileError. If the Reader cannot construct a real FileFoo object because of some error, it instead constructs a FileError object and returns that. The FileError object has methods to get the error_code and/or error string.

I dislike this variant because a FileError is not a File.

With both these ideas the Reader method is:

virtual std::unique_ptr<lld::File> readFile(StringRef path);

which is much easier to use.

Any recommendations on a good way to design this?

Would an API like:

virtual std::unique_ptr<lld::File> readFile(StringRef path, error_code &ErrorInfo);

That is a common pattern, used for instance in the bit code reader. If we can unify the OS level error and the syntax/semantic errors from parsing object files, this pattern might work.

One other thought that leans me towards integrating the error code/msg into the File object is that it works cleaner when you start adding multithreading to the linker. When you have multiple threads going, each reading an object file and one of them returns an error, you need to cleaning shut down the other threads, and return the error. With the error status integrated into the File object, all you do is have each thread append it resulting File object onto a list, then after all threads are joined, run through the list once to see if any say they encountered an error. If the error_code is returned separately from the File, you have to create some data structure to bind together the error_code with which file path could not be parsed.

-Nick

I believe that specifically for object file parsing errors a
pair<error_code, vector<string> values> (or something similar) as a
member of File would work. This would be used as:

// In the error_category stuff
class blah_reader_error_category : public llvm::_do_message {
public:
  virtual std::string message(int ev) const {
    ...
    case uknown_reloc:
      return "uknown relocation % at %";
    ...
  }
};

// In the reader.

setError(make_pair(blah_error::unkown_reloc,
vector<string>{relocationumer, loc})); // loc could be calculated by
backtracking, or just the current file address. (we could even pass
the address as a stop address to obj2yaml).

// Someplace else.

if (File->error())
  o << File->getErrorMessage();

Which would do the string substitution.

This lets us not spend any time building up a representation for
diagnostics when none will be printed, and still get decent
diagnostics with well defined enum values when needed.

So:

virtual std::unique_ptr<lld::File> readFile(StringRef path);

class File {
public:
  bool error();
  error_code getError();
  string getErrorMessage();
protected:
  void setError(pair<error_code, vector<string>>);
private:
  pair<error_code, vector<string>> Error;
}

I believe we may also want to tablegen the different errors so it's
trivial to add a new one.

Note that for proper linker errors (undefined reference etc...) I
believe we need something more extensive.

- Michael Spencer

I don't know anything about lld::File. But I was just glancing at this and I started to wonder: What is the added benefit of unique_ptr here? Why not:

   virtual lld::File readFile(StringRef path);

?

I.e. lld::File can be move-only and manage its own resources. The only reason to use unique_ptr<lld::File> is if lld::File is a base class of what the unique_ptr actually points to. Or if you want to have a nullptr state that isn't easily integrated into lld::File itself. But it sounds like neither of these would be the case if lld::File incorporated its own error state.

Howard

But this interface has become awkward to use. There are two "return" values. This method is a unique_ptr "source" but the use of a by-refernce parameter means you have to start with an uninitialized unique_ptr and the readFile() has to std::move() into it. unique_ptr is much nicer to use as a return value.

Not a c++'11 expert here, but does a returning an std::tuple work?

I talked to Howard about that. It does work, but it is cumbersome. Either:

std::tuple<std::unique_ptr<lld::File>, error_code> result = reader.readFile(path);

Or you use std::tie() but then you still have to declare the two local variables and tie them together. You could define a wrapper class for the tuple, but then you realize it would be simpler to push the error_code into the File and get rid of the wrapper class.

Ok, yeah, that is annoying.

An other issue is that since llvm::error_code was designed to return a fixed set of strings, there is no way to return a dynamic error message (e.g. "relocation 27 is invalid").

I'm pretty sure that the linker can specify a custom error_category with whatever information you want, and then return an error_code put with that. Michael would know for sure, but it is definitely in the design of error_code for it to be extensible.

From what I can tell error_code is about wrapping out OS specific "int" error codes, and how to map those to static strings. I suppose you could abuse it by having a error_category which has a static vector or recently dynamically generated strings and the int "error code" is an index into that vector.

I'm pretty sure that error_code is designed to be "simple for errno's" but "general to other errors". Michael?

Clang has this great Diagnostics infrastructure for showing where the error occurred in the source file. Does developing something like that make sense for object files whose source is not human readable?

I don't think this makes sense for the linker. The key aspect of Clang's diagnostics (and llvm::SourceMgr, which is similar but simpler) is that it points back into source code. This is the source of most of the complexity of its diagnostics stuff, and doesn't make a lot of sense for the linker.

That said, one crazy hack that I've been wanting to do for some time in clang would apply equally well to the linker: all the command line options could be plopped into a text buffer, and diagnostics related to command line options could refer to *that* location. Imagine getting an error like this from the linker:

ld foo.o bar.o baz.a -o a.out

command-line:1:3: error: foo.o is not a valid object file
ld foo.o bar.o baz.a -o a.out
   ^~~~~

ld foo.o bar.o baz.a -o a.out

command-line:1:32: error: xyz.o member of baz.a is not a valid object file
ld foo.o bar.o baz.a -o a.out
               ^~~~~

ld foo.o bar.o baz.a -o a.out

command-line:1:46: error: can not open output file a.out: permission error
ld foo.o bar.o baz.a -o a.out
                        ^~~~~
or whatever.

From an API level, this would mean that the object file parsing logic would pass back up an error_code, and then the higher level driver logic would combine the string together with the location of the command line option for printing through llvm::SourceMgr.

One other thought that leans me towards integrating the error code/msg into the File object is that it works cleaner when you start adding multithreading to the linker. When you have multiple threads going, each reading an object file and one of them returns an error, you need to cleaning shut down the other threads, and return the error. With the error status integrated into the File object, all you do is have each thread append it resulting File object onto a list, then after all threads are joined, run through the list once to see if any say they encountered an error. If the error_code is returned separately from the File, you have to create some data structure to bind together the error_code with which file path could not be parsed.

As with the other guys, I'd really prefer to avoid merging the error status *into* the file. In principle, it is the "open file" operation that can fail, not the file itself.

-Chris

snip

Forgive me if I am way off base, but if this is a “C++11” library why can’t exceptions be used to signal the error?

Nathan

Forgive me if I am way off base, but if this is a "C++11" library why can't
exceptions be used to signal the error?

The reasons for exception avoidance in the LLVM codebase hold across
C++98 and C++11 equally (
http://llvm.org/docs/CodingStandards.html#ci_rtti_exceptions )

- David

In lld we have a Reader class which is a factory that takes .o file(s) and produces an in memory lld::File object(s). But in doing so, there could be I/O errors (file not readable) or file may be malformed. We are also using C++11 in lld, so we use std::unique_ptr for managing object ownership.

The Reader class currently has an interface that can be simplified down to:

virtual error_code readFile(StringRef path, std::unique_ptrlld::File &result);

But this interface has become awkward to use. There are two “return” values. This method is a unique_ptr “source” but the use of a by-refernce parameter means you have to start with an uninitialized unique_ptr and the readFile() has to std::move() into it. unique_ptr is much nicer to use as a return value.

Not a c++'11 expert here, but does a returning an std::tuple work?
I talked to Howard about that. It does work, but it is cumbersome. Either:

std::tuple<std::unique_ptrlld::File, error_code> result = reader.readFile(path);

I suggest an error_or class that can be combined with any result type, not just File. It could be implemented with a fully generic variant class, but the interface would likely be better with some specialization to errors.

In asking around for clean models on how to do this better, I've heard two interesting ideas:

1) Incorporate the error_code and string message into the File class. That is the factory method always returns a File object. But there is a method on File objects to query if they are valid or not. If they are not, there is a method to get the error_code and/or error string.

I like this with some modifications.

2) A variation on this is to have a new subclass of File called FileError. If the Reader cannot construct a real FileFoo object because of some error, it instead constructs a FileError object and returns that. The FileError object has methods to get the error_code and/or error string.

My intuition is the opposite here. If you add the error information to the lld::File base class then leads you to write the constructor for FileFoo() which does all the parsing, and if it runs into a problem, just sets the error ivar and returns (your FileCOFF is an example of that). The problem this leads to is half constructed objects. That can trip up a client which might accidentally call methods on the half constructed object other than the error getting methods. It also means the destructor has to be careful when tearing down the half constructed object.

On the other hand making a new subclass of lld::File called lld::FileError leads you to a different style of programming. You have to open and parse the file past the point that any error can occur before you construct the FileFoo object. If anything goes wrong up to the point, you instead construct a FileError object and return that.

I dislike this variant because a FileError is not a File.

I'm not sure what you mean by "not a File". It is a subclass of lld::File. Its path() method returns the file system path that tried to be used.

Think of it as a proxy. The real file could not parsed. It represents that state.

With both these ideas the Reader method is:

  virtual std::unique_ptr<lld::File> readFile(StringRef path);

which is much easier to use.

Any recommendations on a good way to design this?

Would an API like:

  virtual std::unique_ptr<lld::File> readFile(StringRef path, error_code &ErrorInfo);

That is a common pattern, used for instance in the bit code reader. If we can unify the OS level error and the syntax/semantic errors from parsing object files, this pattern might work.

One other thought that leans me towards integrating the error code/msg into the File object is that it works cleaner when you start adding multithreading to the linker. When you have multiple threads going, each reading an object file and one of them returns an error, you need to cleaning shut down the other threads, and return the error. With the error status integrated into the File object, all you do is have each thread append it resulting File object onto a list, then after all threads are joined, run through the list once to see if any say they encountered an error. If the error_code is returned separately from the File, you have to create some data structure to bind together the error_code with which file path could not be parsed.

-Nick

I believe that specifically for object file parsing errors a
pair<error_code, vector<string> values> (or something similar) as a
member of File would work. This would be used as:

// In the error_category stuff
class blah_reader_error_category : public llvm::_do_message {
public:
virtual std::string message(int ev) const {
   ...
   case uknown_reloc:
     return "uknown relocation % at %";
   ...
}
};

// In the reader.

setError(make_pair(blah_error::unkown_reloc,
vector<string>{relocationumer, loc})); // loc could be calculated by
backtracking, or just the current file address. (we could even pass
the address as a stop address to obj2yaml).

// Someplace else.

if (File->error())
o << File->getErrorMessage();

Which would do the string substitution.

What is the advantage of this delayed substitution? The site creating the error needs to convert all values to strings. Your example has two numbers (reloc value and address) which would need to be converted to strings. It would be simpler all around to just generate the complete string right there at the error site. Are you trying to support localization? or error numbers?

Also, are these enum values per file format (i.e. does ELF and mach-o each define their own set)? Or is there one big set of enums for use by all Readers?

This lets us not spend any time building up a representation for
diagnostics when none will be printed, and still get decent
diagnostics with well defined enum values when needed.

What is the rule of thumb of when a enum value is needed (as opposed to success/failure)? My thinking is that an enum only makes sense if the client can do something about a specific enum value. For instance in some future IDE, maybe file_not_found error can prod the IDE to recompile something to regenerate the object file. Another reason for enums is if you want a way to localize error strings to different languages. The enums are an index into a per-localization string table. Given that these error (e.g. unknown relocation) are a bug in the compiler or linker and not a problem the user of the tools can remedy, I'm not sure they need localizing.

-Nick

In asking around for clean models on how to do this better, I've heard two interesting ideas:

1) Incorporate the error_code and string message into the File class. That is the factory method always returns a File object. But there is a method on File objects to query if they are valid or not. If they are not, there is a method to get the error_code and/or error string.

I like this with some modifications.

2) A variation on this is to have a new subclass of File called FileError. If the Reader cannot construct a real FileFoo object because of some error, it instead constructs a FileError object and returns that. The FileError object has methods to get the error_code and/or error string.

My intuition is the opposite here. If you add the error information to the lld::File base class then leads you to write the constructor for FileFoo() which does all the parsing, and if it runs into a problem, just sets the error ivar and returns (your FileCOFF is an example of that). The problem this leads to is half constructed objects. That can trip up a client which might accidentally call methods on the half constructed object other than the error getting methods. It also means the destructor has to be careful when tearing down the half constructed object.

Makes sense. You can't parse atoms as they are requested anyway, so
it's fine to do it all up front.

On the other hand making a new subclass of lld::File called lld::FileError leads you to a different style of programming. You have to open and parse the file past the point that any error can occur before you construct the FileFoo object. If anything goes wrong up to the point, you instead construct a FileError object and return that.

I dislike this variant because a FileError is not a File.

I'm not sure what you mean by "not a File". It is a subclass of lld::File. Its path() method returns the file system path that tried to be used.

Think of it as a proxy. The real file could not parsed. It represents that state.

Yes it is a subclass of File, but it does not really represent one.
You cannot get its atoms, and you have to do a special check of the
run-time type, just like the special check for failed() above.

With both these ideas the Reader method is:

virtual std::unique_ptr<lld::File> readFile(StringRef path);

which is much easier to use.

Any recommendations on a good way to design this?

Would an API like:

virtual std::unique_ptr<lld::File> readFile(StringRef path, error_code &ErrorInfo);

That is a common pattern, used for instance in the bit code reader. If we can unify the OS level error and the syntax/semantic errors from parsing object files, this pattern might work.

One other thought that leans me towards integrating the error code/msg into the File object is that it works cleaner when you start adding multithreading to the linker. When you have multiple threads going, each reading an object file and one of them returns an error, you need to cleaning shut down the other threads, and return the error. With the error status integrated into the File object, all you do is have each thread append it resulting File object onto a list, then after all threads are joined, run through the list once to see if any say they encountered an error. If the error_code is returned separately from the File, you have to create some data structure to bind together the error_code with which file path could not be parsed.

-Nick

I believe that specifically for object file parsing errors a
pair<error_code, vector<string> values> (or something similar) as a
member of File would work. This would be used as:

// In the error_category stuff
class blah_reader_error_category : public llvm::_do_message {
public:
virtual std::string message(int ev) const {
...
case uknown_reloc:
return "uknown relocation % at %";
...
}
};

// In the reader.

setError(make_pair(blah_error::unkown_reloc,
vector<string>{relocationumer, loc})); // loc could be calculated by
backtracking, or just the current file address. (we could even pass
the address as a stop address to obj2yaml).

// Someplace else.

if (File->error())
o << File->getErrorMessage();

Which would do the string substitution.

What is the advantage of this delayed substitution? The site creating the error needs to convert all values to strings. Your example has two numbers (reloc value and address) which would need to be converted to strings. It would be simpler all around to just generate the complete string right there at the error site. Are you trying to support localization? or error numbers?

The point is supporting making decisions based on enum values while
still having dynamic strings. For instance, in the case of a valid
structure but unknown value, the linker could use obj2yaml to dump the
object file in human readable format up to the point where the unknown
value occurred. In the case of the yaml reader we actually have real
diagnostics we can give.

Also, are these enum values per file format (i.e. does ELF and mach-o each define their own set)? Or is there one big set of enums for use by all Readers?

A general set plus per format.

This lets us not spend any time building up a representation for
diagnostics when none will be printed, and still get decent
diagnostics with well defined enum values when needed.

What is the rule of thumb of when a enum value is needed (as opposed to success/failure)? My thinking is that an enum only makes sense if the client can do something about a specific enum value. For instance in some future IDE, maybe file_not_found error can prod the IDE to recompile something to regenerate the object file. Another reason for enums is if you want a way to localize error strings to different languages. The enums are an index into a per-localization string table. Given that these error (e.g. unknown relocation) are a bug in the compiler or linker and not a problem the user of the tools can remedy, I'm not sure they need localizing.

Yes, an additional enum value is only needed when a distinction can be
made by higher level code. Localization is not a goal of this.

Also, by full representation I meant enough to actually go tell the
user how to edit the binary file to fix the parse error. We clearly
shouldn't be doing that. But even in the case of compiler bugs we
should still give good diagnostics for the people who get the bug
reports, and so the user doesn't feel totally lost.

-Nick

I like the error_or<T> idea.

- Michael Spencer

I’m not sure what you mean by “not a File”.
I think he means it in the sense of the Liskov Substitution Principle <http://en.wikipedia.org/wiki/Liskov_substitution_principle>, not as in lld::FileError literally “does not inherit” from lld::File.

–Sean Silva

I also really like the variant design. It has a few important advantages IMO:

  1. No pointers or heap allocations. These make code more bug prone and can needlessly hurt optimizers. (although the latter is hopefully not relevant in this specific case)

  2. No “error” state in the file object. Either you have a fully constructed object or you don’t.

  3. An explicit API to check and instrument whether an error has occured. For example, it is harder to “forget” to check the error, and carry on manipulating the file. The implementation can assert in debug builds to catch this early, etc.

  4. Doesn’t require polymorphic types (but they can be supported if needed). Essentially, you can wrap a specific file type, rather than being forced to use some generic file type.

One interesting possibility is that you can grow the same pattern to different levels of complexity as needed (and thus keep it as simple as possible as long as possible):

error_or – wraps the existing error_code with a nice interface
linker_error_or – wraps both an error_code and some linker specific error handling logic
file_error_or – provides completely custom error information specific to file-loading-errors.

I’m not saying which (if any) of these would be appropriate, just pointing out that this pattern can be followed to greater and lesser degrees of specificity.

>
> >
> >> In lld we have a Reader class which is a factory that takes .o file(s) and produces an in memory lld::File object(s). But in doing so, there could be I/O errors (file not readable) or file may be malformed. We are also using C++11 in lld, so we use std::unique_ptr for managing object ownership.
> >>
> >> The Reader class currently has an interface that can be simplified down to:
> >>
> >> virtual error_code readFile(StringRef path, std::unique_ptr<lld::File> &result);
> >>
> >> But this interface has become awkward to use. There are two "return" values. This method is a unique_ptr "source" but the use of a by-refernce parameter means you have to start with an uninitialized unique_ptr and the readFile() has to std::move() into it. unique_ptr is much nicer to use as a return value.
> >
> > Not a c++'11 expert here, but does a returning an std::tuple work?
> I talked to Howard about that. It does work, but it is cumbersome. Either:
>
> std::tuple<std::unique_ptr<lld::File>, error_code> result = reader.readFile(path);
>

I suggest an error_or<T> class that can be combined with any result type, not just File. It could be implemented with a fully generic variant class, but the interface would likely be better with some specialization to errors.

I too think that error_or<T> is better than std::pair<error_code, T>:
* more compact
* can name fields (e.g. error, object instead of first, second)
* can embed asserts that object field is not accessed if there is an error.

But there are some drawbacks:
* lld uses C++11, so we thought we'd add explicit ownership by using std::unique_ptr. That means we'd need to layer the templates (e.g. error_or<std::unique_ptr<lld::File>>), or forgo using unique_ptr, or cause error_or to always internally use unique_ptr.
* We still have not figured out how to encapsulate errors for object file parsing. The last thread was an enum *and* and a string.

I also really like the variant design. It has a few important advantages IMO:

1) No pointers or heap allocations. These make code more bug prone and can needlessly hurt optimizers. (although the latter is hopefully not relevant in this specific case)

2) No "error" state in the file object. Either you have a fully constructed object or you don't.

I do agree that either you successfully parsed the file and have a full File object or you failed and have some sort of error. That means they are mutually exclusive, so having every File object carry an error_code is a waste and confusing. But, that also make the case for the FileError subclass...

4) Doesn't require polymorphic types (but they can be supported if needed). Essentially, you can wrap a *specific* file type, rather than being forced to use some generic file type.

Yes, the template lets you do that. But in the specific case of the linker, it will always be using the generic type. You can pass the linker paths to object files, shared libraries, or static libraries. The linker passes those paths onto the readFile() method which therefore must return the generic File type.

3) An explicit API to check and instrument whether an error has occured. For example, it is harder to "forget" to check the error, and carry on manipulating the file. The implementation can assert in debug builds to catch this early, etc.

The same assert functionality can be added to the FileError design too. About the only thing you can do with a File is iterator its Atoms. The FileError class can implement its iterators to assert. Also, it is already the case that the linker handles libraries and object files differently, yet they both come back from readFile() as a generic File object. The linker already has to sort all the File objects by kind, so noticing a FileError in the mix not a problem.

I've noticed resistance to the FileError design. It seems natural to me having recently done some work on the Darwin linker to support threaded and pipelined linking. The idea there is that you create an object immediately that will eventually become fully populated with Atoms. But the object is first in an incomplete state until 1) the compiler is done producing the .o file, and 2) the Reader has parsed the .o file into atoms. The linker main thread creates these incomplete objects and then waits for them to transition to the complete state (with atoms) or to an error state (if some parsing problem happened).

Instead of thinking of FileError as a degenerate File with an error_code slapped on. Think of FileError as an object encapsulating the attempt to read and parse the file. There could be a rich interface to query it about what went wrong. The error_code design attempts to capture everything about a failure in an 'int'. The FileError class defines a rich object interface to the error. Perhaps it could print out a range of bytes from the file and highlight which bits were bad. Or perhaps it could dump some context showing all the sections or all the symbols and then why some bits are invalid within that context. In other words, the error reporting is open ended with a FileError class.

-Nick

>
> >
> >> In lld we have a Reader class which is a factory that takes .o file(s) and produces an in memory lld::File object(s). But in doing so, there could be I/O errors (file not readable) or file may be malformed. We are also using C++11 in lld, so we use std::unique_ptr for managing object ownership.
> >>
> >> The Reader class currently has an interface that can be simplified down to:
> >>
> >> virtual error_code readFile(StringRef path, std::unique_ptr<lld::File> &result);
> >>
> >> But this interface has become awkward to use. There are two "return" values. This method is a unique_ptr "source" but the use of a by-refernce parameter means you have to start with an uninitialized unique_ptr and the readFile() has to std::move() into it. unique_ptr is much nicer to use as a return value.
> >
> > Not a c++'11 expert here, but does a returning an std::tuple work?
> I talked to Howard about that. It does work, but it is cumbersome. Either:
>
> std::tuple<std::unique_ptr<lld::File>, error_code> result = reader.readFile(path);
>

I suggest an error_or<T> class that can be combined with any result type, not just File. It could be implemented with a fully generic variant class, but the interface would likely be better with some specialization to errors.

I too think that error_or<T> is better than std::pair<error_code, T>:
* more compact
* can name fields (e.g. error, object instead of first, second)
* can embed asserts that object field is not accessed if there is an error.

But there are some drawbacks:
* lld uses C++11, so we thought we'd add explicit ownership by using std::unique_ptr. That means we'd need to layer the templates (e.g. error_or<std::unique_ptr<lld::File>>), or forgo using unique_ptr, or cause error_or to always internally use unique_ptr.

Could you explain this a little further? It doesn't quite make sense
to me. In my (C++11) experience I'd only use a unique_ptr for a
polymorphic type that needs single ownership. For a non-polymorphic
single ownership without any valid copy semantics I would use a
move-only type rather than std::unique_ptr.

- David

I think he is suggesting that File is always going to be a polymorphic object.

>
> >
> >> In lld we have a Reader class which is a factory that takes .o file(s) and produces an in memory lld::File object(s). But in doing so, there could be I/O errors (file not readable) or file may be malformed. We are also using C++11 in lld, so we use std::unique_ptr for managing object ownership.
> >>
> >> The Reader class currently has an interface that can be simplified down to:
> >>
> >> virtual error_code readFile(StringRef path, std::unique_ptr<lld::File> &result);
> >>
> >> But this interface has become awkward to use. There are two "return" values. This method is a unique_ptr "source" but the use of a by-refernce parameter means you have to start with an uninitialized unique_ptr and the readFile() has to std::move() into it. unique_ptr is much nicer to use as a return value.
> >
> > Not a c++'11 expert here, but does a returning an std::tuple work?
> I talked to Howard about that. It does work, but it is cumbersome. Either:
>
> std::tuple<std::unique_ptr<lld::File>, error_code> result = reader.readFile(path);
>

I suggest an error_or<T> class that can be combined with any result type, not just File. It could be implemented with a fully generic variant class, but the interface would likely be better with some specialization to errors.

I too think that error_or<T> is better than std::pair<error_code, T>:
* more compact
* can name fields (e.g. error, object instead of first, second)
* can embed asserts that object field is not accessed if there is an error.

But there are some drawbacks:
* lld uses C++11, so we thought we'd add explicit ownership by using std::unique_ptr. That means we'd need to layer the templates (e.g. error_or<std::unique_ptr<lld::File>>), or forgo using unique_ptr, or cause error_or to always internally use unique_ptr.

Consider *all* the C++11 features, not just unique_ptr. As Howard
suggested at http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-June/050867.html,
in C++11, I would probably make lld::File a move-only type rather than
asking users to manage its lifetime by wrapping it in unique_ptr.
error_or<lld::File> should work fine in that model.

If lld::File is an abstract base class, then you need the unique_ptr
... but if File is used enough you might win by writing a
type-erasing, move-only wrapper. Talk to Howard if you need help doing
that.

Even if the type-erasing wrapper isn't worth doing, layered templates
aren't the end of the world. They have the benefit that users can see
exactly what they're getting. If you prefer to hide the details of the
type, a typedef may be enough.

* We still have not figured out how to encapsulate errors for object file parsing. The last thread was an enum *and* and a string.

Yes, you may need to iterate on exactly what fields error_or<T>
contains when it's representing an error. I wouldn't assume it can
just be a std::error_code.

Instead of thinking of FileError as a degenerate File with an error_code slapped on. Think of FileError as an object encapsulating the attempt to read and parse the file.

Names are important and worth spending time on. If you want people to
think of a class as a degenerate File with an error_code slapped on,
name it "FileError" (actually, "FileError" just sounds like the
error_code part, without the File). If you want people to think of it
as the result of an attempt to read and parse a file, name it
something related to reading and parsing a file.

HTH,
Jeffrey