Doc like an Egyptian: Manage project documentation with Sphinx

At the 14th Annual Southern California Linux Expo (aka SCaLE 14x), Dru Lavigne will discuss common “pitfalls” associated with creating and maintaining documentation, and she’ll talk about the open source tools available. She will also give an overview of Sphinx, an open source documentation generation system originally created for the Python documentation.

In this interview, she explains how Sphinx is different from other open source solutions and what types of projects should consider migrating their documents.

Why have the PC-BSD, FreeNAS, and Lumina documentation projects moved to Sphinx?

When I became the PC-BSD documentation maintainer in 2010, I inherited an existing documentation wiki that contained a lot of user-generated content, most of which was several years out of date. Soon after, I also became responsible for creating the new FreeNAS documentation, so it made sense to create a documentation wiki for this project as well.

Over time, the shortcomings of the wiki approach to maintaining updated and versioned documentation became apparent:

  • While the main purpose of a wiki is to invite users to contribute and provide a low barrier to entry, very few people come to write documentation (however, every spambot on the planet will quickly find your wiki, which creates its own set of maintenance issues).
  • Wikis are designed for separate page infobytes, such as tutorials. They really aren’t designed to provide table of contents navigation or to provide chapter flow, although you can hack your pages to provide navigation elements to match the flow of the document. This becomes more difficult as the size of the document increases – our guides tend to be over 300 pages. It becomes a nightmare when trying to provide versioned copies of each of these pages so that the user finds and reads the correct page for their version of the software.
  • Although wiki translation extensions are available, their configuration is not well documented, their use is slow and clumsy, and translated pages only increase the number of available pages, bringing you back to the problems of the previous point. This is a big problem for projects that have a global audience.
  • Although output-generating wiki extensions are available (for example, to convert your wiki pages to HTML or PDF), how to configure them is not well documented, and they offer very little control over the layout of the generated format. This is a big problem for projects that need to make their documentation available in multiple formats.

We spent a few years hammering various odds and ends into our existing wiki infrastructure to convince it to create what we needed: large, versioned, translated documents available in multiple formats. We also spent a lot of time researching alternatives. During our research, we had these objectives in mind:

  • must support a table of contents structure and be able to produce multiple formats, preferably through integration with a source build framework;
  • must integrate seamlessly into a translation infrastructure;
  • should provide a low barrier to entry for document writers and translators.

In our research, we found that the barrier to entry tended to be inversely proportional to the quality and number of output formats available.

Sphinx provided a good middle ground: its syntax is almost as easy to learn as a wiki syntax, it integrates into existing source repositories as well as build and translation frameworks, and it offers decent control over output layout, although this varies by format.
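To make those objectives a little more concrete, here is a minimal sketch of a Sphinx conf.py that touches each of them: a theme for HTML layout and settings for translation catalogs. The project name, author, and directory names are placeholders assumed for illustration, not the actual PC-BSD, FreeNAS, or Lumina settings.

```python
# conf.py -- minimal Sphinx configuration sketch.
# All values are illustrative placeholders, not taken from a real project.

# Project metadata shown in the generated output.
project = "Example Handbook"
author = "Example Docs Team"
release = "1.0"

# A basic build needs no extensions; add them here as the project grows.
extensions = []

# "alabaster" is a theme that ships with Sphinx.
html_theme = "alabaster"

# Translation settings: the language to build and the directory where
# message catalogs are looked up (the path is an assumption).
language = "en"
locale_dirs = ["locale/"]

# Generate one message catalog per source document instead of one per
# top-level directory, which keeps translation units small.
gettext_compact = False
```

Because the configuration file is ordinary Python, it can live in the same repository as the documentation source and be reviewed like any other change.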

What are some of the big lessons of migrating to Sphinx?

As an experiment, I first migrated the existing FreeNAS documentation. Since at this point we were maintaining both the wiki and an OpenOffice master document (to generate HTML and PDF), I found a script that converts .odt to .rst (the format used by Sphinx). Having never used either .rst or Python before, I spent time learning how to create an HTML version and experimenting with various themes and extensions. I then spent about a month cleaning up the migrated .rst files, learning as I went how the different tags worked, and the best way to lay out our documentation tree. As with any migration script, not everything migrated cleanly, which gave me the opportunity to understand how tables are formatted and which tags controlled which layout.

After this first migration, I had a good understanding of the tags used by our documentation, useful extensions, and the theme we liked. I then used this knowledge to migrate the PC-BSD documentation. This time I used a different migration script, which did its markup a bit differently. This gave me the opportunity to discover tags that I hadn’t seen before and decide which ones I liked the most in order to unify the two documentation projects. The second migration took less than a week. When we needed a Lumina documentation project, I created it directly in Sphinx, and it took less than an hour to set up the documentation tree, build infrastructure, themes, and extensions so I could start writing docs from scratch.

After going through this process, I would recommend the following:

  1. If you plan to migrate an existing documentation set, find a migration script for your current format and give yourself time to play around with tags, themes, and extensions.
  2. Write a README file with instructions for documentation writers and users who want to create their own formats from your documentation source. This should include any software that needs to be installed and a list of tags used by your documentation project – you’ll know what they are at the end of your migration.
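As one illustration of the build instructions such a README might contain, the sketch below wraps the sphinx-build command so that several formats can be produced from one source tree. The directory names "source" and "build", the choice of builders, and the helper names build_command and build_all are assumptions made for this example; it also assumes Sphinx is installed and sphinx-build is on the PATH.

```python
import subprocess
from pathlib import Path

# Builders to run; "html", "epub", and "latex" are standard Sphinx builders.
BUILDERS = ("html", "epub", "latex")


def build_command(builder, source_dir="source", build_dir="build"):
    """Return the sphinx-build command line for one output format."""
    out_dir = Path(build_dir) / builder
    return ["sphinx-build", "-b", builder, str(source_dir), str(out_dir)]


def build_all(source_dir="source", build_dir="build", run=subprocess.run):
    """Build every format in BUILDERS, stopping at the first failure."""
    for builder in BUILDERS:
        run(build_command(builder, source_dir, build_dir), check=True)
```

Calling build_all() from the top of the documentation tree would write each format to its own subdirectory of "build". The latex builder only produces .tex sources, so generating a PDF still requires a separate LaTeX run.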

How is Sphinx different from other open source solutions?

Although Sphinx is easy to learn, it has its quirks. For example, it does not support stacked tags. This means that you can’t make a phrase both bold and italic using markup alone; to achieve this, you need a CSS workaround. And, while Sphinx has extensive documentation, much of it assumes you already know what you’re doing. When you don’t, it can be hard to find an example that does what you’re trying to accomplish.
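One possible shape for such a CSS workaround is a custom reStructuredText role tied to a CSS class. The conf.py fragment below is a sketch under that assumption, not necessarily the approach used in the projects discussed here; the role name "bolditalic" and the file name "custom.css" are made up for illustration.

```python
# conf.py fragment -- sketch of a CSS workaround for combined bold and italic.
# The role name and stylesheet name are hypothetical.

# rst_prolog is prepended to every source file, so the custom role defined
# here is available in all documents.
rst_prolog = """
.. role:: bolditalic
   :class: bolditalic
"""

# Directory (relative to conf.py) that holds static files such as stylesheets.
html_static_path = ["_static"]

# Extra stylesheet loaded by the HTML builder. The file would contain a rule
# such as:
#   .bolditalic { font-weight: bold; font-style: italic; }
html_css_files = ["custom.css"]
```

In a source file, the role would then be used as :bolditalic:`important phrase`. Since the styling lives in CSS, it applies to HTML output only; other formats would need their own handling.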

Sphinx is well suited for projects with an existing repository (for example, on GitHub), a build infrastructure, and contributors who are comfortable using text editors and committing to the repository (or creating, for example, git pull requests).

For projects that want to control the look of their documentation beyond built-in or available themes, access to a CSS guru is helpful.

It’s probably overkill for projects with a small set of documentation that doesn’t need version control, translations, or multiple published formats.

Which project stands out as having outstanding documentation? And which ones would benefit from a documentation overhaul?

Having been responsible for documentation for many years, I hesitate to cite good and bad examples of documentation. Documentation, for any project, is difficult and time-consuming. Software is a moving target and software users vary in their skills and therefore have very different documentation needs. In this regard, no documentation is ever complete – or truly up-to-date – and that’s just the nature of the documentation set. The best we can do is try to make it easy and compelling for contributors to keep the documentation current, useful, and in the languages and formats required by the software’s user base.

I have a vision: 2016 will be the Year of Open Source Haiku. What is your documentation advice, given via haiku?

All new shiny
docs, shimmering like tops

Sam D. Gomez