Getting clang-format to run in a web browser

Scott Meyers has blogged a few times about his experience publishing technical books to ebook formats, and a number of times the subject of formatting code for e-readers has come up. The quite obvious solution is automatic code formatting and there have been several commenters to whom clang-format immediately suggested itself. I decided it sounded like a fun evening project so tonight that's what I did, and I thought I'd share what it took to get my initial working example.

I started with my existing LLVM build environment, which already has LLVM, compiler-rt, libcxx, lld, clang, and the clang tools including clang-format set up appropriately for building from source. I use CMake/Ninja and have a buildbot set up with OS X and Windows slaves to automate daily builds and test runs. So from there I grabbed the latest release of Emscripten, the C++ to Javascript compiler (which coincidentally uses LLVM as a backend), and followed the instructions to set up the 'portable' install for OS X. After wondering for a bit why Emscripten is so adamant that the python executable be named 'python2', I finished the setup and was able to build a hello-world.cpp program and run it in a browser.

After that I set up a new CMake build directory for the Emscripten build of clang-format to go in. It took a few tries, but the magic incantation to produce a functioning build involved using Emscripten's binaries for the C++ compiler, C compiler, ar, and ranlib. Overriding the default linker was not needed and in fact stops the CMake configure from working. I was also required to set C++11 mode using CMAKE_CXX_FLAGS, and I disabled a warning there as well to cut down on the noise. I chose to configure a release build. The complete CMake invocation I used was:

cmake -DCMAKE_CXX_FLAGS="-std=c++11 -Wno-warn-absolute-paths" -DCMAKE_CXX_COMPILER=<emscripten_binary_path>/emcc -DCMAKE_C_COMPILER=<emscripten_binary_path>/emcc -DCMAKE_AR=<emscripten_binary_path>/emar -DCMAKE_RANLIB=<emscripten_binary_path>/emranlib -DCMAKE_BUILD_TYPE=release -G Ninja <path_to_my_existing_llvm_source_tree>

I didn't bother with this, but adding -DCMAKE_C_FLAGS="-Wno-warn-absolute-paths" (the same warning flag as in CMAKE_CXX_FLAGS above) might also be good, to cut out the last few warnings from the C sources.

Additionally I had to make one change to the CMakeLists.txt file in compiler-rt, where it was complaining about requiring a pointer size of 4 or 8 bytes. I simply commented out the error line in the CMakeLists.txt.

At this point CMake was successfully configuring a build directory, and I was able to kick off a build with 'ninja clang-format'.

The next issue was that LLVM's build process involves producing executables that then actually have to be run as part of the build. This was easy enough to get around by the simple expedient of using the executables from my normal, non-Emscripten build. After a build step requiring an executable would fail, causing the build to stop, I would copy the appropriate executable from my regular build area into the Emscripten build area. I also needed to set execute permissions on the copied executables. After that I would restart the build with another "ninja clang-format" invocation. There were only two restarts required, and the two executables needed were llvm-tblgen and clang-tblgen.

The build then completed, producing a file 'clang-format' containing LLVM bitcode. Emscripten's compiler, emcc, relies on the file extension to determine what kind of file it has been given and what to do with it, so I renamed the file to 'clang-format.o'. emcc also uses the extension of the output file to decide what to produce. If you ask emcc to produce an html file, it will create a web page from a template, and the page is set to automatically load and run the final javascript program.

I found that trying to use stdin in the final program produces an endless series of dialogs asking for input in the web browser (so be sure not to load such an html page in a browser like Safari, which lacks a handy "Prevent this web page from spawning more dialogs" button). To avoid stdin, I used emcc's preload-file feature to put files into a virtual filesystem available to the running javascript program. The final emcc invocation looked like this:

emcc clang-format.o -o blah.html --preload-file main.cpp

emcc produced a few 'unresolved symbol' warnings, but still generated runnable javascript.

In order to get clang-format to actually look at the loaded file I had to modify the generated html file in order to pass command line arguments to clang-format. This involved finding the var 'Module' and adding an 'arguments' parameter. I added it between the preRun and postRun members:

      var Module = {
        preRun: [],
        arguments: ['main.cpp'], // <--- added this line
        postRun: [],

And the final result:

http://i.imgur.com/x3xgpK9.png

All in all it took about 3 hours I think, and the experience getting Emscripten to build the necessary parts of LLVM using CMake was pretty smooth. The resulting javascript file is ~20MB, which seems a bit heavy to include in an ebook, but I think this still indicates that this could be a realistic solution to the problem of publishing code samples in a dynamic format.

- Seth

Hey Nick, this sounds familiar…

Wow. In a completely bizarre cosmic coincidence, at the same time that you
were working on building clang-format with Emscripten, so was I. I didn't
use cmake, I just ran:

em++ -Iinclude -Itools/clang/include \
  -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS \
  -x c -std=gnu99 lib/Support/*.c \
  -x c++ -std=gnu++11 lib/Support/*.cpp tools/clang/lib/Basic/*.cpp \
  tools/clang/lib/Lex/*.cpp tools/clang/lib/Tooling/Core/*.cpp \
  tools/clang/lib/Format/*.cpp \
  -fno-rtti -fno-exceptions \
  -s EXPORTED_FUNCTIONS='["_reformat"]' \
  -O3 --memory-init-file 0 --llvm-lto 1 \
  -o reformat.js

after making a few patches, such as turning off ZLIB, TERMINFO, TERMIOS,
WRITEV and ENABLE_THREADS in include/llvm/Config/. I also turned off both
LLVM_ON_WIN32 and LLVM_ON_UNIX, which required me to implement a few stub
functions. You may notice from the above command that I don't actually
build the clang-format tool, just the library. I wrote a little stub in
tools/clang/lib/Format which looks like this:

#include "llvm/Support/TimeValue.h"
#include "clang/Format/Format.h"

#include <cstring>
#include <string>

namespace llvm {
namespace sys {
  void AddSignalHandler(void (*)(void*), void*) {}
  void RunInterruptHandlers() {}

  TimeValue TimeValue::now() {
    return TimeValue(INT64_MIN, 0);
  }
}
}

extern "C" const char *reformat(const char *input) {
  auto style =
      clang::format::getGoogleStyle(clang::format::FormatStyle::LK_Cpp);
  clang::tooling::Range everything(0, strlen(input));
  clang::tooling::Replacements replacements =
      clang::format::reformat(style, input, {everything});

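  std::string output(input);
  // Apply the replacements back to front so that the offsets of the
  // replacements still to be applied stay valid.
  for (auto I = replacements.rbegin(), E = replacements.rend(); I != E; ++I) {
    output.replace(I->getOffset(), I->getLength(), I->getReplacementText());
  }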

  return strdup(output.c_str());
}

The llvm::sys stuff is the minimal lib/Support code you need to get
lib/Format working once both platforms are disabled.
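
The loop in reformat() walks the replacements in reverse. Here is a minimal
standalone sketch of why that order matters. It uses a hypothetical Edit
struct and applyEdits function of my own as stand-ins for
clang::tooling::Replacement and the loop above (they are not part of clang),
and it assumes the edits do not overlap and are sorted by ascending offset:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for clang::tooling::Replacement: replace `length`
// bytes starting at `offset` (both measured in the ORIGINAL text) by `text`.
struct Edit {
  std::size_t offset;
  std::size_t length;
  std::string text;
};

// Applies non-overlapping edits that are sorted by ascending offset.
// Going from the last edit to the first, each edit only changes text at or
// after its own offset, so the offsets of the edits that are still to be
// applied keep pointing at the right place.
std::string applyEdits(std::string text, const std::vector<Edit> &edits) {
  for (auto it = edits.rbegin(), end = edits.rend(); it != end; ++it) {
    assert(it->offset + it->length <= text.size() && "edit out of range");
    text.replace(it->offset, it->length, it->text);
  }
  return text;
}
```

For example, applying the edits {1, 0, " "}, {2, 0, " "} and {3, 1, ""} to
"a+b ;" gives "a + b;". Applying them front to back without adjusting the
offsets would put the later edits in the wrong place, because each insertion
shifts everything after it.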

Calling it from JS looks like:

var output = Module.ccall('reformat', 'string', ['string'],
    ['void foo(int i,int j) {return i+j+ i ;}\nvoid bar() ;']);

I haven't checked whether it leaks or not. Also, my .js file is 3.8MB, as a
standalone blob with no .mem file.
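
On the leak question: strdup allocates a fresh buffer on every call, and as
far as I can tell ccall with a 'string' return type only copies the bytes
into a JS string and never frees the C pointer. One way to sidestep that,
sketched below under that assumption, is to keep the result in a buffer owned
by the C++ side. keepAlive is a hypothetical helper name, not something from
LLVM or Emscripten:

```cpp
#include <string>
#include <utility>

// Hypothetical helper: stores `text` in a single buffer owned by this
// function and returns a pointer to its contents. The pointer is only valid
// until the next call, so the caller has to copy the string right away.
const char *keepAlive(std::string text) {
  static std::string buffer;
  buffer = std::move(text);
  return buffer.c_str();
}
```

With this, the last line of reformat() would become "return
keepAlive(output);" instead of "return strdup(output.c_str());". The
trade-off is that the function is no longer reentrant, which should not
matter for a single-threaded page but is worth knowing.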

Nick