Markup languages

There are plenty of markup languages around, and sometimes it is tricky to pick one for the task at hand. I am going to put together a few observations here.

Since I am interested in exporting documents into HTML, Atom feeds, and info files, possibly printing them, and reading them as plain text, those aspects will be mentioned explicitly. I assume that the quality of export or conversion into formats a language was not designed for is rather low.

Languages

(La)TeX

It is an advanced markup language, or more of a typesetting system, and it is nice in many respects, so I will only list its drawbacks.

Cons:

Use cases: it is useful for complex documents, involving mathematical formulæ (and possibly diagrams), or for anything that could use templates, but could be excessive in other cases. "LaTeX is the de facto standard for the communication and publication of scientific documents."

HTML, other SGML- and XML-based ones

XHTML can be nicely generated out of XML with XSLT (which is handy for templating), as can Atom feeds, though XHTML is retired, as is the XML syntax for HTML LS: it is back to rather vaguely specified and changing markup. Then there is MathML for mathematical formulae, but it is too verbose for manual composition or reading. Use of data models such as DITA and DocBook brings additional pros and cons, so they are summarized separately.
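To make the feed generation mentioned above a little more concrete, here is a minimal sketch that builds a tiny Atom feed from structured data. It uses Python's standard xml.etree.ElementTree module rather than XSLT, so it only illustrates the general idea of deriving a feed from a data source, not an XSLT workflow. The build_feed and atom helpers, the example.org URLs, the author name, and the dates are all made up for illustration, and the result is not checked against the Atom specification.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialize Atom elements in the default namespace (no prefix).
ET.register_namespace("", ATOM_NS)


def atom(tag):
    """Return a namespace-qualified Atom tag name (hypothetical helper)."""
    return f"{{{ATOM_NS}}}{tag}"


def build_feed(title, feed_id, author, updated, entries):
    """Build an Atom <feed> element from plain Python data (hypothetical helper)."""
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("updated")).text = updated
    author_element = ET.SubElement(feed, atom("author"))
    ET.SubElement(author_element, atom("name")).text = author
    for entry in entries:
        node = ET.SubElement(feed, atom("entry"))
        ET.SubElement(node, atom("title")).text = entry["title"]
        ET.SubElement(node, atom("id")).text = entry["id"]
        ET.SubElement(node, atom("updated")).text = entry["updated"]
        # Assumption: the entry identifier doubles as its URL.
        ET.SubElement(node, atom("link"), href=entry["id"])
    return feed


# Made-up sample data.
notes = [
    {
        "title": "Markup languages",
        "id": "https://example.org/notes/markup.xhtml",
        "updated": "2024-01-01T00:00:00Z",
    }
]
feed = build_feed(
    "Example notes",
    "https://example.org/notes/",
    "Example Author",
    "2024-01-01T00:00:00Z",
    notes,
)
feed_xml = ET.tostring(feed, encoding="unicode")
print(feed_xml)
```

With XSLT, the same mapping would be expressed as templates matching the source XML elements, which keeps the transformation declarative and separate from the data.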

Pros:

Cons:

DocBook, DITA, etc

Pros:

Cons:

Org-mode

Pros:

Cons:

Use cases: all kinds of notes, static websites, probably basic info files.

Texinfo

Pros:

Cons:

Use cases: its primary purpose is to create technical manuals, and it seems to be good at that, so it suits anything manual-like.

CommonMark

A specified Markdown flavor.

Use cases: HTML export is its primary target.

reStructuredText and Sphinx

Probably I should not mix those together, but that is what I am doing.

Pros:

Cons:

Use cases: manuals, documentation, READMEs (quite comfortable to read as plain text).

Others

roff-based languages (nroff and troff, as implemented in groff) are used for man pages and are quite usable, though neither handy nor feature-rich. As with TeX, there is no specification in sight (not counting tutorials and user references).

PostScript is surprisingly readable and somewhat nice for a language that is usually used as a compilation target for other languages (perhaps LaTeX most of the time). It has a decent, freely available specification: PostScript Language Reference Manual.

For lower-level typesetting, one may also be interested in the unofficial DVI specification.

SILE looks like a nice typesetting system. It supports both XML and TeX-style syntaxes, apparently uses MathML for mathematical formulae (compiling TeX-style formulae into it), allows scripting in Lua, and is relatively well documented (The SILE Book).

Typst is also a typesetting system, but it is less well documented. Its Rust package seemed broken at the time when I wanted to try it, and it seems to be commercial.

For use of macros with most markup languages, m4 can be handy.

Essentials

Most elements (headings, lists, references, etc.) can be handled by plain text, but there are at least a couple of commonly needed things other than regular text that need explicit support: embedded images (illustrations) and mathematical formulae.

Mathematical formulae are usually delegated to either LaTeX or MathML (possibly embedded as images, but those still have to be created), though groff's (troff's) eqn(1) can produce those as well. There is Content MathML, which focuses on semantics, and which is compared to Lisp (except for MathML being extremely verbose). When writing down notes with formulae and solving textbook problems, I tend to mix plain text with chunks of Emacs Lisp code, so that I can easily execute them in Emacs right away, and they remain relatively legible compared to MathML or even LaTeX. SymPy formulae and functions are not too bad for legibility, either. Other times I employ Unicode characters for the sake of compactness, though programming languages usually support those as well (e.g., Agda even uses them extensively). A Lisp language may also provide simplicity and uniformity, at the cost of being a little more verbose than infix and mixfix operators with varying precedences.
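To illustrate the comparison between Content MathML and Lisp, here is a small sketch that renders one expression tree both as an S-expression and as Content MathML, using only Python's standard library. The nested-tuple representation, the OPERATORS table, and the to_content_mathml and to_sexp helpers are made up for this example; the enclosing math element and its namespace are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from prefix operators to Content MathML element names.
OPERATORS = {"+": "plus", "-": "minus", "*": "times", "/": "divide", "^": "power"}


def to_content_mathml(expr):
    """Convert a nested tuple such as ("+", "x", 1) into a Content MathML element."""
    if isinstance(expr, tuple):
        operator, *operands = expr
        apply_element = ET.Element("apply")
        # The operator comes first, as in a Lisp form.
        ET.SubElement(apply_element, OPERATORS[operator])
        for operand in operands:
            apply_element.append(to_content_mathml(operand))
        return apply_element
    # Numbers become <cn>, identifiers become <ci>.
    leaf = ET.Element("cn" if isinstance(expr, (int, float)) else "ci")
    leaf.text = str(expr)
    return leaf


def to_sexp(expr):
    """Render the same tree as a Lisp-style S-expression."""
    if isinstance(expr, tuple):
        return "(" + " ".join(to_sexp(part) for part in expr) + ")"
    return str(expr)


# x^2 + 2*x, written as a prefix tree.
expression = ("+", ("^", "x", 2), ("*", 2, "x"))
sexp = to_sexp(expression)
mathml = ET.tostring(to_content_mathml(expression), encoding="unicode")
print(sexp)
print(mathml)
```

Both outputs have the same prefix structure, with the operator as the first child of each application, but the MathML rendering is several times longer than the S-expression, which is the verbosity mentioned above.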

Apart from formulae and images, URL references are commonly needed and awkward to handle, but they can still be embedded into plain text without harming legibility much. Likewise with tables, though in many cases they can be simply avoided.

Conclusion

As usual, it is about preferences, priorities, and tasks. It may be tempting to pick a single language for everything, but as with programming languages, there is no single tool ("golden hammer") that would be most suitable for all tasks.

See also

Wikipedia: Comparison of document markup languages, List of document markup languages.