Sean, thanks for reminding this, Alp did commit a class derived from raw_svector_ostream conatining an internal SmallString he called small_string_ostream. The commit was reverted after a day due to a disagreement about the commit approval and apparently abandoned.
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html
The class Alp comitted did solve the possible mismatch between the SmallString and the stream by making the SmallString private to the class. This however did not treat the root problem, the duplication of the information about the buffer between SmallString and the stream.
I can make performance measurements but even if the performance difference is neglible (and looking at all the code paths and conditionals of non-inlined functions at raw_ostream.cpp that’s hard to believe), the current SmallString-raw_svector_ostream is simply wrong.
Personally, after the previous “Alp” discussion I found and fixed several such issues in my out-of-tree code only to make new similar new error earlier this year, which I caught only months after, when Rafael committed the pwrite which reminded me the issue. Due ot the information duplication Rafael commit also had a bug and Mehdi reports similar experience. Back then Alp reported similar problems he found in LLVM code which were hopefully fixed otherwise.
In addition to the information duplication, the design is quite confusing to use
- Should one use the stream.str() function or the SmallString itself?
- flush or str?
- How do you properly clear the combination for reuse (that was my mistake, I forgot to resync after OS.clear())?
It’s safe to say this design pattern is very easy to get wrong one way or another, we got burned by it multiple times and it should be replaced.
Alp suggested a derived class containing its own SmallString. That’s a simple and effective approach to avoid the bugs mentioned but does not fix the memory or runtime issues. Small as they may be, why have them at a fundemental data structure?
I was thinking about going further avoiding all duplication with a templated class derived with its own internal buffer storage.
Rafael suggested a more modular approach, a derived adpater class adapter to a simple buffer (or nullptr for fully-dynamic operation) which also won’t duplicate any information the buffer is dumb and has no state.
That solution sounds simple, efficient and safe to use. The implementation would be actually simpler then raw_svector_ostream since all the coordination logic is not required anymore.