[RFC] Make MyST Markdown the LLVM docs format, RIP reST

TL;DR: We should migrate from reStructuredText (reST or rst) to Markedly Structured Text (MyST) and make myst_parser a hard dependency for building LLVM documentation with Sphinx.

If you’ve been on the internet in the last 10 years, you know that Markdown is the ubiquitous, default way to format plain text. reST has a deep, extensible feature set, but simplicity has won out. The most useful docs are the ones that exist and are easy to update. Every IDE under the sun, such as VS Code, IntelliJ, NeoVim, etc., supports rendering Markdown dialects live in some way. reST has served us well, but I believe that now is the time to set a long-term goal to migrate our docs to Markdown.

Since 2018 (D44910), LLVM has used a Markdown dialect called Markedly Structured Text (MyST) for portions of its documentation. Individual subprojects have effectively been free to choose between reST and MyST at their own discretion, and there has been no coherent policy about which is preferred.

Newer projects have tended to prefer Markdown. MLIR is entirely Markdown, Flang is almost entirely Markdown, and LLDB is substantially Markdown. The LLVM and Clang docs are mainly in .rst files.

This ambiguity has already caused churn: the CIR docs were originally written in Markdown, but were converted back to reST because reST was still documented as the primary way to write LLVM docs. The point of this RFC is to declare affirmatively which format we prefer, update the Sphinx quickstart template to that effect, and make a full migration the desired end state.

Proposal

  • MyST Markdown should become the preferred and eventual sole format for LLVM Sphinx documentation.
  • New documentation should use .md unless there is a concrete blocker.
  • Existing .rst files may continue to be edited until they are converted, but we should welcome mechanical conversion PRs.
  • myst_parser should become a hard dependency for Sphinx documentation builds, which appears to affect the man-page builder.
  • The Sphinx quickstart template should recommend Markdown for new docs. I’ll also update it with some migration tips, since I’ve started to dig into that.
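
As a rough illustration of what the hard-dependency bullet means in practice, here is a minimal sketch of a Sphinx conf.py fragment. The extension name `myst_parser` and the `source_suffix` mapping are standard Sphinx/MyST settings, but this is not a copy of LLVM's actual configuration, which this post does not show.

```python
# Hypothetical fragment of a Sphinx conf.py; not LLVM's actual configuration.

# Listing myst_parser unconditionally is what makes it a hard dependency:
# Sphinx refuses to start if the package cannot be imported.
extensions = ["myst_parser"]

# Accept both formats while the migration is in progress.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```

Keeping both suffixes registered lets converted and unconverted files coexist in the same build until the last .rst file is gone.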

I can commit to migrating a few key documents, but I can’t sign up to rewrite all reST documentation. I’m hoping that, in true open source fashion, volunteers will pitch in and help migrate their own docs and help review and approve mechanical conversion PRs. These are the docs I plan to migrate, in this order, one PR at a time:

  • SphinxQuickstartTemplate: This is effectively our policy doc, so it goes first as an obvious demo of how to write new docs.
  • LangRef: The most important doc. To avoid conflicts with pending patches, the edits must not reflow text needlessly.
  • DeveloperPolicy: Also an important doc.
  • CMake: Next most important doc.

I’ve actually already prototyped the migration on GitHub in rnk:llvm-markdown, and I am serving up a copy of the generated documents temporarily here (starting with the quickstart), if you want to do a side-by-side comparison.

I’m supportive of this proposal. As Reid points out, most new documentation in LLDB is using Markdown. I’m happy to volunteer to help convert the rest.

I’m in favor of this.

I don’t think there is anything in reST that we currently use that we cannot also do in Markdown with much greater simplicity. reST has a lot of annoyances like requiring titles to have the correct number of equal signs in the line below and is just generally a lot more difficult to use. Markdown is trivial and reST is not.

The only real concern that I can think of would be churn, but the only real place I would care about that is in LangRef for git blame. But if you’re designing the edits to not reflow text, so that even current patches won’t run into conflicts, I don’t think there will be any major issues. I still run into the HTML to reST conversion commit in LangRef occasionally, and it sounds like that was much more disruptive than this will be.

+1 to that. This needs to be a central requirement of any mechanical translations.

No strong objections here. My main contact with the .rst files is the Command Guide documents for the various tools under llvm/tools. The only feature from these that I care about is the ability to reference other sections and options within the text without having to add explicit links or anchors in the text. For example:

.. option:: --target <format>, -F

 Equivalent to :option:`--input-target` and :option:`--output-target` for the
 specified format. See `SUPPORTED FORMATS`_ for a list of valid ``<format>``
 values.

This results in links created to the --input-target and --output-target options and the SUPPORTED FORMATS subsection.

I have no idea if we’d have equivalent functionality in Markdown automatically (I’m not familiar with it in depth or how it plays with Sphinx).
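
For reference, MyST writes Sphinx roles as {role}`content`, so a mechanical conversion of the inline markup above could be sketched roughly as follows. The helper name `convert_line` and both regular expressions are illustrative assumptions rather than an existing LLVM tool, and the named reference (`SUPPORTED FORMATS`_) is deliberately left untouched because it needs a per-document decision.

```python
import re

# Matches a simple reST role such as :option:`--input-target`.
ROLE = re.compile(r":([A-Za-z][\w:.+-]*):`([^`]+)`")
# Matches an reST inline literal such as ``<format>``.
LITERAL = re.compile(r"``([^`]+)``")


def convert_line(line: str) -> str:
    """Rewrite inline reST markup on one line to MyST syntax.

    Hypothetical helper: it works one line at a time, so it never
    reflows text or changes the number of lines in a file.
    """
    line = ROLE.sub(r"{\1}`\2`", line)   # :option:`x`  ->  {option}`x`
    line = LITERAL.sub(r"`\1`", line)    # ``x``        ->  `x`
    return line


if __name__ == "__main__":
    print(convert_line("Equivalent to :option:`--input-target` for the"))
```

Because the rewrite is line-local, the resulting diff stays proportional to the markup that actually changed.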

Thank you for the RFC! Just to be sure I understand the shape of things: Sphinx will continue to be used to build docs, but the docs will be written in markdown rather than rst? (So, for example, we don’t have to worry about questions like “how to build manpages” because that’s handled by Sphinx? No need to introduce a pandoc dependency?)

I honestly don’t see what this RFC is changing, other than: “New documentation should use .md unless there is a concrete blocker.”

The rest is sort of already the state of things?

Until the existing stuff is converted, we are in exactly our current state: Needing both types of files in tree/generating both types in tree.

IMO, any such proposal to switch us over needs to come with a REALLY good conversion, even if it is just a mechanical one. I believe maintainers would be open to minor changes to our layouts/formats/etc in our documents in exchange, so this is hopefully not a massive lift.

But without that switch, this really isn’t changing anything.

I’m in support of this as well; I very much prefer Markdown, and RST being slightly different has confused me (and other people) on a regular basis. For instance, I’ve seen many instances of someone attempting to use ` instead of `` for code snippets in our release notes.

Yep, 100%.

I created a draft PR for SphinxQuickstartTemplate, and it has some suggested guidelines for how we could handle migrations in the future.

FWIW, this is my primary concern. I continue to be opposed to having two different formats in tree; that’s friction without good justification. So I’d rather this RFC was stronger and required a transition away from .rst files across the project and had a plan to realize that which wasn’t hoping it happens organically. We’re very bad about paying down tech debt like that, particularly around documentation (for example, we’ve never managed to go back and document all the attributes, implementation-defined behaviors, command line options, etc).

I don’t feel like I can oppose this RFC as-is because it at least clarifies what to do for new documentation, but I also don’t think this actually fixes the underlying problem which is that we have multiple documentation formats in tree and I think that’s a maintenance and review burden that will never go away unless we require the transition to happen. If that amount of work is too significant to even consider requiring, then I’d argue the default should be .rst and the .md files should be switched because that’s an achievable end-state. (I don’t care what format we end up with, but if we end up with two formats as a matter of policy, I consider that a failure.)

My personal experience is that both the docs and the infrastructure around them are under-loved, and I share your concerns that this is going to make the problem worse.

LLVM’s initial Markdown support actually used recommonmark, a project that was later replaced by MyST in Sphinx, and it took us years to migrate (at some point it was impossible to build the documentation on some distributions, if I recall).

When doing that migration we accidentally added a hard dependency on myst for all docs, which was then removed because it broke building llvm on some distributions. Will that be an issue again?
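
To make the distinction concrete, a soft dependency of the kind described here could look roughly like the sketch below. The helper `pick_extensions` and its parameters are hypothetical, introduced only for illustration; this is not the code that is or was in LLVM's conf.py.

```python
import importlib.util
from typing import List, Optional


def pick_extensions(building_man_pages: bool,
                    have_myst: Optional[bool] = None) -> List[str]:
    """Decide which Sphinx extensions to load (hypothetical helper).

    Assumption: man pages come only from .rst sources, so that builder
    can proceed without myst_parser, while every other builder cannot.
    """
    if have_myst is None:
        # Probe for the package without importing it.
        have_myst = importlib.util.find_spec("myst_parser") is not None

    extensions: List[str] = []
    if have_myst:
        extensions.append("myst_parser")
    elif not building_man_pages:
        # Fail early with a clear message instead of a parser error later.
        raise RuntimeError("myst_parser is required for this docs build")
    return extensions
```

Under the RFC as proposed, the man-page exemption would go away and the missing package would be an error for every builder.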

I would be much happier with the RFC if there was a migration plan.

Clang has a plugin to automatically rewrite #GHXXXX to a link to a github issue. Can we make that work with myst? I don’t think we’d want to give that up.

Others have explained it well, but I will also say that I would prefer to have a concrete migration plan as opposed to “people will get to it eventually”. It feels like if that is the plan, we will end up supporting vestigial rst docs for a long time.