Lightweight web browsers

While huge web browsers such as Firefox can handle web applications, they also tend to consume a lot of resources even on a modern computer (and to be simply unusable on older hardware), to keep the same bugs for decades, and to regularly acquire new ones. Using simple web browsers for simple HTML documents is one solution, and here are my notes on such browsers, as well as on the technologies used to build them.

Early browsers

HTML was pretty simple initially (and perhaps by itself, without JS and CSS, it's still manageable), and so were web browsers; HTML history covers some of it. The efforts to bloat it started quite early, though: ViolaWWW supported scripting and styling in the early 90s, and ECMAScript started creeping into web browsers in 1995, along with Java.

Textual browsers

These can be relatively simple, since they offload much of the work (i.e., text rendering) to terminal emulators and/or Emacs. But there are often drawbacks too, such as poor or no support for embedded images (inherent in mostly-textual environments) and for HTML forms (particularly in XML-based documents, such as XHTML and HTML5; often due to poor ad hoc parsing). In a world of non-bloated technologies that would be fine for most browsing, but illustrations and structured input are still useful sometimes. Perhaps the best support for those that I've seen is in eww; other common textual browsers are lynx and w3m, and I wrote pancake, a pandoc-based one.

Graphical browsers

links is an interesting project, and there is links project documentation. Many bits in it seem to be quite ad hoc, including parsing (which also fails on XHTML/HTML5 forms, and has special cases for a few websites), and some optimisations look like overkill, sacrificing convenience and flexibility (e.g., using hardcoded glyphs). Still, it appears to work faster over the Internet (with 10-20 ms network latency) than major browsers do even with local files. It can use one of a few drivers (terminal, X, SVGAlib, DirectFB, SDL, etc.) to work in different environments.

NetSurf

NetSurf is another interesting project. Much of its functionality is split into reusable libraries. Like links, it uses a framebuffer abstraction (LibNSFB) and a faulty hand-written parser (Hubbub), and has Unicode-related warts. Unlike links, it uses native GUI widgets for menus and such when built for GTK, and it relies on libcurl for fetching. It has a bunch of additional issues (partially broken TLS, poor CSS support with no apparent way to disable or override it, poor word wrapping, slowness on large documents, etc.), but it is still fun to poke at, and its sources, which are not in particularly bad shape, are fun to read too.

Other graphical browsers

There aren't many others that I know of: there was dillo, but its domain name expired, and its SourceForge mirror is broken. Then there are browsers that reuse huge bloated engines, and standalone HTML renderers such as tkHTML (though that one is discontinued; it was used for the Html Viewer 3 web browser). KHTML and GtkHTML seem to be quite large and unmaintained (not counting WebKit and Blink, which descend from KHTML).

Technologies

While HTTP can be handled by libcurl or libsoup, and proper streaming parsing of both XML-based and SGML-based documents can be achieved with libxml2 (or some other parser), the most challenging part seems to be the GUI.
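
To illustrate what streaming parsing gives a browser, here is a minimal sketch in Python (chosen only because it needs no extra dependencies). It swaps libxml2 for Python's standard html.parser module, whose feed() method accepts arbitrary chunks and reports tags and text as soon as they are complete; the EventCollector class and the chunk boundaries are hypothetical.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records parse events as soon as they are emitted."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


# Chunks as they might arrive from the network; the boundaries
# fall in the middle of a tag and in the middle of a word.
chunks = ['<p>Hello, <a hr', 'ef="/x">wor', 'ld</a>!</p>']

parser = EventCollector()
for chunk in chunks:
    seen = len(parser.events)
    parser.feed(chunk)  # complete constructs are reported right away
    print("new events:", parser.events[seen:])
parser.close()  # flush anything still buffered at the end of input

# Text may arrive in pieces, so accumulate it instead of assuming whole nodes.
text = "".join(e[1] for e in parser.events if e[0] == "text")
print(text)  # Hello, world!
```

A renderer could consume such events to build its UI incrementally, instead of waiting for the whole document. In C, libxml2's push interface (htmlCreatePushParserCtxt() and htmlParseChunk()) plays the same role.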

GUI toolkits such as GTK+ 3 can handle text rendering (with Pango in the case of GTK), inputs, and image rendering, but not quite in the way a web browser needs: one can, for instance, just put labels and other standard widgets into a window, but then text can't be selected across them as one would normally expect. The expected inline rendering of HTML "phrasing content" elements is also tricky with standard widgets.
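
The trouble with phrasing content is that fragments of differently styled elements share lines and have to be broken as a single flow, which separate widgets don't do. A minimal sketch of greedy line breaking over styled runs, assuming every character has the same width (a real implementation would measure text with Pango or similar); the flow_inline() function and its run format are hypothetical.

```python
import re


def flow_inline(runs, width):
    """Greedy line breaking over styled inline runs (monospace assumption).

    runs: list of (text, style) pairs, e.g. plain text, <em>, or <a> content.
    Returns lines, each a list of (fragment, style) pairs.
    """
    # Split into words. A word may consist of fragments with different
    # styles ("foo<b>bar</b>" is one word); whitespace separates words.
    words, current = [], []
    for text, style in runs:
        for piece in re.split(r"(\s+)", text):
            if not piece:
                continue
            if piece.isspace():
                if current:
                    words.append(current)
                    current = []
            else:
                current.append((piece, style))
    if current:
        words.append(current)

    lines, line, used = [], [], 0
    for word in words:
        w = sum(len(frag) for frag, _ in word)
        # Start a new line if the word plus a separating space won't fit;
        # a word wider than the whole line simply overflows it.
        if line and used + 1 + w > width:
            lines.append(line)
            line, used = [], 0
        if line:
            line.append((" ", None))  # separating space, left unstyled here
            used += 1
        line.extend(word)
        used += w
    if line:
        lines.append(line)
    return lines


runs = [("Some ", None), ("emphasised", "em"), (" text with a ", None),
        ("link", "a"), (".", None)]
for line in flow_inline(runs, 16):
    print(repr("".join(frag for frag, _ in line)))
# prints 'Some emphasised', then 'text with a', then 'link.'
```

Note that "link." stays one word even though its fragments come from different elements, which is exactly what per-element widgets would get wrong.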

Then there's Xlib, with limited and dated support for text rendering (and it is less portable than GTK); if one is going to render text separately anyway, it gets pretty close to using a framebuffer abstraction (on top of X, SDL, the Linux framebuffer, etc.), as graphical web browsers tend to do.
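
A framebuffer abstraction boils down to a small drawing interface with one implementation per environment. A minimal Python sketch, assuming a rectangle-filling primitive is enough to show the idea; the Framebuffer and MemoryFramebuffer classes and draw_underline() are hypothetical, not the actual API of LibNSFB or of links' drivers, and only an in-memory backend is shown.

```python
from abc import ABC, abstractmethod


class Framebuffer(ABC):
    """Hypothetical minimal interface a browser's renderer could draw onto.

    Real backends (X11, SDL, the Linux framebuffer) would implement the same
    methods; only an in-memory one is shown here.
    """

    def __init__(self, width, height):
        self.width, self.height = width, height

    @abstractmethod
    def fill_rect(self, x, y, w, h, color):
        """Fill a rectangle with an (r, g, b) color, clipped to the buffer."""

    @abstractmethod
    def flush(self):
        """Make pending drawing visible (a no-op for in-memory buffers)."""


class MemoryFramebuffer(Framebuffer):
    def __init__(self, width, height, background=(255, 255, 255)):
        super().__init__(width, height)
        self.pixels = [[background] * width for _ in range(height)]

    def fill_rect(self, x, y, w, h, color):
        # Clip to the buffer so that callers needn't care about the edges.
        for row in range(max(y, 0), min(y + h, self.height)):
            for col in range(max(x, 0), min(x + w, self.width)):
                self.pixels[row][col] = color

    def flush(self):
        pass


def draw_underline(fb, x, y, length, color=(0, 0, 238)):
    """Renderer code only sees the Framebuffer interface (e.g., link underlines)."""
    fb.fill_rect(x, y, length, 1, color)
    fb.flush()


fb = MemoryFramebuffer(8, 4)
draw_underline(fb, 2, 3, 10)  # clipped at the right edge of the buffer
print(fb.pixels[3])
```

The renderer stays the same whichever backend is plugged in, which is what lets a browser run on X, SDL, or a bare framebuffer.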

The libraries relevant to text rendering are HarfBuzz (text shaping, needed for complex scripts), Fontconfig (font selection), FreeType (glyph rendering), and Pango (mostly bookkeeping and ease of use on top of those). Unicode itself can be quite a pain to deal with without such libraries. "Text Rendering Hates You" is a nice summary of the challenges.
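
A few of those Unicode pains, shown with Python's standard unicodedata module rather than with the libraries above: the same visible text can have different code point sequences, and cutting text by code points can detach combining marks. The truncate_keeping_marks() helper is hypothetical and only approximates grapheme clusters (proper segmentation follows Unicode's UAX #29).

```python
import unicodedata

# "é" can be one code point (U+00E9) or two: "e" + U+0301 COMBINING ACUTE ACCENT.
composed = "\u00e9"
decomposed = "e\u0301"

print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(len(composed), len(decomposed))                        # 1 2

# Cutting by code points can strip the accent off a letter, as a naive
# line breaker or truncation routine would.
word = "caf" + decomposed
print(word[:4])  # cafe


def truncate_keeping_marks(s, n):
    """Keep about n code points without separating combining marks from their base.

    An approximation: it only checks the canonical combining class, while
    proper grapheme cluster segmentation (UAX #29) covers more cases.
    """
    end = min(n, len(s))
    while end < len(s) and unicodedata.combining(s[end]):
        end += 1
    return s[:end]


print(truncate_keeping_marks(word, 4) == word)  # True

# Display width differs from code point counts too: CJK ideographs are
# usually "wide" (two terminal cells), per the East Asian Width property.
print(unicodedata.east_asian_width("漢"), unicodedata.east_asian_width("a"))  # W Na
```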

To explore it better (and possibly to get a nice lightweight browser), I started writing WWWLite. At 2 KLOC, with the help of GTK+, libsoup, libxml, and Pango, its prototype can handle text with selection, clickable links with focusing, inline elements such as images, and a few other things, with incremental parsing and UI building. At about 3 KLOC, tables (with rowspans and colspans) and forms are handled too, a few things are refined, and tabs and history navigation are supported. Apparently it is possible to get a usable graphical web browser in a few KLOC. I'm planning to describe the implementation and related issues in more detail in its documentation.
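
Rowspans and colspans mostly complicate assigning cells to grid positions, since a cell spanning several rows occupies slots in later rows. A minimal sketch of that step, loosely following the idea of HTML's table model (each cell takes the leftmost slot not already covered); the place_cells() function and its input format are hypothetical, not WWWLite's code.

```python
def place_cells(rows):
    """Assign grid positions to table cells, honouring rowspan and colspan.

    rows: list of rows; each row is a list of (label, rowspan, colspan).
    Returns ({label: (row, col, rowspan, colspan)}, column_count).
    Each cell takes the leftmost slot in its row that isn't already
    covered by a cell spanning down from an earlier row.
    """
    occupied = set()  # (row, col) slots already covered by some cell
    placement = {}
    ncols = 0
    for r, row in enumerate(rows):
        c = 0
        for label, rowspan, colspan in row:
            while (r, c) in occupied:  # skip slots covered from above
                c += 1
            placement[label] = (r, c, rowspan, colspan)
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied.add((r + dr, c + dc))
            c += colspan
            ncols = max(ncols, c)
    return placement, ncols


# A table like:
# +---+-------+
# |   |   B   |
# | A +---+---+
# |   | C | D |
# +---+---+---+
rows = [[("A", 2, 1), ("B", 1, 2)],
        [("C", 1, 1), ("D", 1, 1)]]
placement, ncols = place_cells(rows)
print(ncols)           # 3
print(placement["C"])  # (1, 1, 1, 1): slot (1, 0) is covered by A
```

Once cells have grid positions, column widths and row heights can be computed over the grid, with spanning cells distributing their size across the columns or rows they cover.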

Complexity sources

Much of the complexity (everything around text rendering and processing) comes from writing systems, hence on a large scale it's accidental complexity. HTML (and typographic) elements themselves seem to augment the language similarly to punctuation, and perhaps wouldn't be as useful with a constructed language akin to Lojban (though parsing, highlighting, and special rendering of certain constructs would still help humans skim and scan documents). Even such a language with a simplified writing system likely wouldn't be an optimal way to convey information from a two-dimensional surface to a human, and it's far from viable. See also: formal human languages.