[RFC] Moving away from epydoc for the LLDB SB API documentation

Hi all,

some of you might have noticed that the Python SB API documentation on the website hasn’t been regenerated in more than a year. I pinged Andrei about fixing the broken generation script. While trying to recreate the doc generation setup on my machine, I noticed that our documentation is generated by epydoc, which is unmaintained and had its last release 12 years ago. It seems we also have some mocking setup for our native _lldb module in place that stopped working with Python 3.

While the setup we currently have will probably work for a bit longer (assuming no one upgrades the web server generating the API docs), I would propose we migrate to a newer documentation generator. That would not only make this whole setup a bit more future-proof, but we could probably also make the API docs more user-friendly while we’re at it.

From what I can see we have at least three alternative generators:

  1. pydoctor (epydoc’s maintained fork) - Example LLDB docs: https://teemperor.de/pub/pydoctor/index.html
    Pros:
  • Doesn’t really change the user-experience compared to epydoc.
    Cons:
  • Doesn’t really change the user-experience compared to epydoc.
  • The website is rather verbose and you need to click around a lot to find anything.
  • Horrible user-experience when viewed on mobile.
  • No search from what I can see.
  • It seems we can’t filter out certain types we don’t care about (like SWIG-generated variables/wrappers etc.)
  • It doesn’t include LLDB’s globals/enum values in the API (even when I manually document them in the source). This seems to be a general Python issue: opinions are split on how (or whether) globals are supposed to be documented.
  • Somehow ignores certain doc strings (I assume it fails to parse them because of the embedded code examples).

  2. sphinx (which is also generating the rest of the LLVM websites) - Example LLDB docs: https://teemperor.de/pub/sphinx/index.html
    Pros:
  • The most flexible alternative, so we potentially could fix all the issues we have if we spend enough time implementing plugins.
  • We already use sphinx for generating the website. We however don’t use its autodoc plugin for actually generating documentation from what I can see.
    Cons:
  • The two plugins I tried for autogenerating our API are hard to modify for our needs (e.g. to implement filters for SWIG generated vars/wrappers).
  • In general sphinx works much better if we hand-write dedicated Python documentation files, but I don’t think we want to do that.
  • LLDB’s global variables are displayed, but for some reason they are assigned the doc string of __int__ (?).

  3. pdoc3 (dedicated Python API generator) - Example LLDB docs: https://teemperor.de/pub/pdoc.html
    Pros:
  • Straightforward to modify pretty much every part of the documentation to our needs (the example is created with a slightly modified config).
  • Dedicated docs for single-module APIs, so we don’t have all the awkward boilerplate text concerned with modules when we only have one ‘lldb’ module in our API.
    Cons:
  • It only shows global variables that are documented. However, SWIG doesn’t seem to support generating documentation for globals (?). We can work around that by having a script assign all our globals/enums a dummy doc string before generating the docs (that’s what I do in the example).
  • Generates a single page with HTML anchors (might also be a good thing as you can now always Ctrl+F for identifiers and it’s much faster to generate than the others).
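
To make the dummy doc string workaround a bit more concrete, here is a minimal sketch of what such a script could look like. It is not the script used for the example page. It assumes the globals appear as plain top-level `NAME = value` assignments in the generated Python source and that pdoc3 picks up a string literal placed directly after an assignment as that variable's doc string. The function name and the placeholder text are made up for illustration.

```python
import ast


def add_dummy_docstrings(source: str) -> str:
    """Insert a placeholder string literal after each undocumented,
    public, top-level assignment in the given Python source text."""
    tree = ast.parse(source)
    lines = source.splitlines()
    body = tree.body
    inserts = []  # (line number to insert after, text to insert)

    for i, node in enumerate(body):
        # Only handle simple assignments of the form `NAME = value`.
        if not (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)):
            continue
        name = node.targets[0].id
        if name.startswith("_"):
            continue  # private names are left alone

        # An assignment counts as documented if the next statement
        # is a bare string literal.
        nxt = body[i + 1] if i + 1 < len(body) else None
        documented = (isinstance(nxt, ast.Expr)
                      and isinstance(nxt.value, ast.Constant)
                      and isinstance(nxt.value.value, str))
        if not documented:
            text = f'"""Placeholder documentation for {name}."""'
            inserts.append((node.end_lineno, text))

    # Apply from the bottom up so earlier line numbers stay valid.
    for lineno, text in sorted(inserts, reverse=True):
        lines.insert(lineno, text)
    return "\n".join(lines) + "\n"


# Usage with a made-up snippet standing in for the generated bindings:
example = "eStateInvalid = 0\neStateRunning = 1\n'''Already documented.'''\n"
patched = add_dummy_docstrings(example)
```

Rewriting the generated source before running the doc generator keeps the hack out of the checked-in bindings; whether that is preferable to patching the SWIG interface files is a separate question.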

I think we can all agree that this topic is great bikeshedding material, so this mail thread shall be the official RFC thread where everyone can voice their opinion about what our Python API docs should look like.

I’ll make the start and say that I think pdoc3 is the way to go. The generated web page feels great to use and it’s straightforward to add all the custom filters we need to get rid of SWIG-generated code. Also the only bug we need to fix here has a simple workaround (assign our defines/enum/etc. dummy strings via some script).
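
For the custom filters, a rough sketch of the idea (not the modified config used for the example page): pdoc3 reads a module-level `__pdoc__` dict, and mapping an identifier to `False` hides it from the output. The name patterns below (`*_swigregister`, `thisown`) are my assumption about what the SWIG-generated noise looks like and would need to be checked against the real bindings; the helper name is made up.

```python
import types

# Assumed patterns for SWIG-generated identifiers; not exhaustive.
ASSUMED_SWIG_SUFFIXES = ("_swigregister",)
ASSUMED_SWIG_MEMBERS = ("thisown",)


def swig_exclusions(module):
    """Build a dict that could be assigned to module.__pdoc__ to hide
    identifiers that look SWIG-generated."""
    exclusions = {}
    for name, obj in vars(module).items():
        if name.endswith(ASSUMED_SWIG_SUFFIXES):
            # Module-level registration helpers.
            exclusions[name] = False
        elif isinstance(obj, type):
            # Class members that SWIG adds to every wrapper class.
            for member in ASSUMED_SWIG_MEMBERS:
                if member in vars(obj):
                    exclusions[f"{name}.{member}"] = False
    return exclusions


# Usage with a stand-in module (the real 'lldb' module is not needed here):
fake_lldb = types.ModuleType("fake_lldb")


class SBTarget:
    thisown = property(lambda self: True)

    def GetProcess(self):
        """A regular API method that should stay documented."""


fake_lldb.SBTarget = SBTarget
fake_lldb.SBTarget_swigregister = lambda cls: None
fake_lldb.__pdoc__ = swig_exclusions(fake_lldb)
```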

Cheers,

- Raphael

Based on looks alone, your Sphinx example feels the most polished to me. And it’d be consistent with the main LLVM docs, which is nice. However, the pdoc3 example feels much more usable (easier to skim through, I love the one-pager-ness of it), so that’s where my vote is going too.

It’d be nice if the pdoc3 output had anchor links on the right side, e.g. if you’re skimming through and find something you want to link, you can do so without having to look it up again on the left. Many doc systems (including the main LLDB docs) have a “¶” symbol that appears next to each header when hovering for this. Also, the UI feels excessively large/bulky; it’d be nice to make it more compact. Both these things seem like minor issues that could be tweaked – if pdoc3 doesn’t already support them, it probably isn’t too hard to send a patch.

Another issue with epydoc is that it currently doesn’t list properties. The checked-in documentation from the old days had them, but I never got epydoc to generate them (and to be fair I never really tried). Instead I looked at alternatives as well. The main issue I found is that it’s easy to trick epydoc (see lldb/docs/CMakeLists.txt) into parsing the bindings without actually needing liblldb to be built; building it is out of the question for the server that renders the docs. All the other alternatives I tried would attempt to do an import lldb, which would obviously fail without the dylib. More important things came up and I never really followed up on this. Maybe it’s easy to hack around that (but please no static bindings), but I think it’s an important thing to consider.
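
One possible way to hack around the import problem, sketched under assumptions rather than tested against the real bindings: register a stub for the native module in `sys.modules` before the generator imports the Python side. The helper name is made up, and the key to stub depends on how the generated code imports the extension (a plain `import _lldb` versus a package-relative import would need different keys).

```python
import importlib
import sys
from unittest.mock import MagicMock


def import_with_stub(module_name, native_name):
    """Import `module_name` after registering a MagicMock in place of
    the native extension `native_name`, so the import does not need
    the compiled library."""
    # setdefault keeps a real module if one is already loaded.
    sys.modules.setdefault(native_name, MagicMock(name=native_name))
    return importlib.import_module(module_name)


# Hypothetical call on the doc server, assuming a plain `import _lldb`
# inside the generated lldb.py:
#   lldb = import_with_stub("lldb", "_lldb")
```

The obvious downside is that every value coming from the stub (enum constants, for example) is a mock object rather than the real value, so anything the generator prints from those values would be wrong.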

I feel very similar to Jordan. I like Sphinx because it’s already used by LLVM and LLDB, but if it requires a separate plugin I’m not sure how much that really matters. The biggest pro for me is that it looks and feels like a lot of existing Python documentation. That said, pdoc3 looks and feels a bit nicer, and it seems to have been around for a while and to be actively developed. I’m pretty indifferent between the two, so as a tie-breaker I’d go with the one that requires the least amount of modification.