Everyday programming in Haskell

This is a collection of personal notes on day-to-day programming in Haskell. It is one of those clichéd "X in production" articles, which I would normally avoid, but Haskell has been my primary language at work since about 2015, and Haskell-related articles, blogs, and communities tend to focus on entry-level material and on the related theory, so writing down these more practical notes may be useful.

Why Haskell

In my case, reliability was a concern: I was rather tired of hunting down both actual bugs in programs written in C, Perl, Python, and PHP, and bugs that were reported as noticed somewhere in the whole system (with many unreliable external components). Given sufficient time and fixed requirements, it is viable to write reliable software in virtually any common language (assuming that the compiler and libraries are not too buggy), but usually there is not enough time, and the requirements keep changing. A nice architecture helps to avoid breaking the system when features are added, and to keep it simple enough to maintain and quickly refactor without breaking; a nice type system and simple semantics are also useful for that.

There are dependently-typed languages suitable for verification, which I had poked at as a hobby for a couple of years before switching to Haskell, but unfortunately they are not nearly as mature. Then there are languages with more arbitrary typing and semantics (mostly imperative ones), which may not help that much, and languages that are even less mainstream than Haskell, with fewer readily available libraries (which are needed when unexpected new features have to be added quickly). Speaking of libraries, I often have to implement uncommon network protocols and other things involving parsing, and Haskell parsing libraries are among the best parsing tools I am aware of. So Haskell seems like a sensible compromise, being pretty good in every category I care about for these programs.
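
To illustrate what parsing tends to look like, here is a minimal sketch using Text.ParserCombinators.ReadP (which ships with base), parsing a made-up "key=value" line; real protocols would likely call for attoparsec or megaparsec, but the shape is similar.

    -- A minimal sketch: parsing a made-up "key=value" line with ReadP.
    import Text.ParserCombinators.ReadP

    keyValue :: ReadP (String, String)
    keyValue = do
      key <- munch1 (/= '=')
      _ <- char '='
      value <- munch (/= '\n')
      return (key, value)

    parseLine :: String -> Maybe (String, String)
    parseLine s = case readP_to_S (keyValue <* eof) s of
      [(kv, "")] -> Just kv
      _          -> Nothing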

Maintainability

Haskell code is relatively easy to refactor and maintain in general, and hard to break by accident. But it is also hard to edit without familiarity with Haskell, and since Haskell is relatively uncommon, it may be challenging to find a Haskell programmer; that is a major obstacle for adoption (which in turn keeps it relatively uncommon).

In part to mitigate that, and in part to get a decent system regardless of the language, I find it useful to follow the Unix philosophy by making individual well-defined components close to what one could reasonably expect to find in standard system repositories if it were already implemented (and more commonly needed): separate programs that do their job without any particular system in mind, using standard IPC mechanisms (standard text streams, Unix sockets). That way, in the worst case it would still be viable to rewrite individual components in another language without touching the rest, as well as to interact with those components from other languages. That is opposed to a common in-house or enterprise software practice, where the custom programs are special snowflakes that do not follow standards and conventions, and are possibly just coupled into a single monolith.
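
As a sketch of the shape such components often take, a plain filter over standard text streams (the per-line processing here is only a placeholder):

    -- A filter-style component: read lines from stdin, transform them,
    -- write the results to stdout.
    main :: IO ()
    main = interact (unlines . map processLine . lines)
      where
        processLine :: String -> String
        processLine = id  -- placeholder for the actual per-line processing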

General good practices also apply: comprehensive documentation, clean and simple code, and minimal dependencies would make it less of a headache to maintain both for oneself and for the potential future maintainers.

Code simplicity

I think it is not controversial to say that Haskell-specific "code simplicity" means that there are no complicated type tricks, not much abstract algebra, no Template Haskell, no GHC Generics, no DSLs, -Wall is used, and maybe just a few common language extensions. It is not just that a novice programmer may not be familiar with them, but also that after not touching those for a while, it may be challenging to debug or edit non-trivial uses of them on your own. As the Brian Kernighan quote goes, "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"

Safe Haskell also checks for some of that (and is nice in general), but unfortunately not all the common libraries are "safe" in that sense.
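
A plain module header in that spirit might look like this (the module name and contents are just an example):

    {-# LANGUAGE Safe #-}
    {-# OPTIONS_GHC -Wall #-}

    -- Safe Haskell is enforced, warnings are enabled, and no exotic
    -- extensions are needed.
    module Example.Simple (greeting) where

    greeting :: String
    greeting = "hello"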

Minimal dependencies

Haskell makes it easy to abstract things, as well as to grab and compose different pieces of code. That is nice, but it leads to huge dependency hierarchies; dependencies come with bugs and bloat, and have to be maintained. By default, Cabal and GHC also link Haskell libraries statically, so this can easily lead to a huge codebase that does not get updated with the rest of the system.

Sometimes I use in-place FFI, in both hobby and work projects: a polished C library is often more complete and reliable than a Haskell reimplementation (or even bindings), has fewer dependencies, and if you only need a few functions, it is not much additional code. Given all the text conversions, imports, and kinds of errors one has to deal with while using Haskell libraries, it may even require less code to use a C library.
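
A minimal sketch of such an in-place binding, importing a single libc function directly instead of pulling in a wrapper library:

    {-# LANGUAGE ForeignFunctionInterface #-}
    -- Binding cbrt(3) from libm directly; no extra Haskell dependencies.
    import Foreign.C.Types (CDouble (..))

    foreign import ccall unsafe "math.h cbrt"
      c_cbrt :: CDouble -> CDouble

    main :: IO ()
    main = print (c_cbrt 27)  -- prints 3.0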

Preferring lower-level Haskell libraries may be similarly useful: higher-level ones tend to introduce bugs, restrict what one can do, and bring in more dependencies (the same as in any other language). Another obvious trick is to implement small functions yourself even if they are available in libraries, which is also a more common practice in C. It is good to reuse code, but perhaps not to the extreme where a program is just a lot of libraries stitched together.
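
For instance, a small helper such as splitting a string on a separator character (roughly what the split package provides) is easy enough to write in place:

    -- Splitting on a separator character, instead of depending on a
    -- library that provides it.
    splitOn :: Char -> String -> [String]
    splitOn sep s = case break (== sep) s of
      (chunk, [])       -> [chunk]
      (chunk, _ : rest) -> chunk : splitOn sep rest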

Tools and infrastructure

By default, Cabal pulls dependencies from Hackage and links them statically, possibly into a sandbox, while Stack also pulls GHC itself and uses Stackage. I find that awkward, but perhaps useful for fighting dependency hell while using cutting-edge software.

But GHC supports shared Haskell libraries, a program can be built with cabal install --enable-executable-dynamic to use those, and Debian repositories include many Haskell libraries (as well as regular ghc and cabal-install). So one can use the system package manager and repositories to both install and update everything, or a combination of that and Hackage.

I package software into Debian archives with dpkg-deb(1), listing dependencies from the system repositories in the control file. Perhaps Cabal is unnecessary in such a setting, but it is still handy as a backup and for building on different systems.

Emacs haskell-mode is nice and sufficient for active programming with a REPL, though there are other packages with additional features. Haddock (a documentation tool) is not bad, but unfortunately the generated documentation is not very readable in lightweight browsers, without CSS (and possibly JS). Profiling and debugging are not as nice as with C, but usable. Testing libraries are handy (though I seldom use them). The "State of the Haskell ecosystem" page is a nice summary of tools and libraries.

String types

There are CString, String, Data.ByteString (lazy or strict, Char8 or Word8), Data.Text, and awkward but common conversions between them, because different libraries use different types. That is yet another reason to avoid dependencies.

Data.ByteString (strict, Word8) is, of the commonly used types, the closest to CString, which is what C and Unix (i.e., the outside world) use, so I think it makes sense to view it as the default; String is in the base library and can be used for Unicode text manipulation; Data.Text is there for efficient Unicode string storage and manipulation.
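
The conversions themselves are short, but they accumulate across a program; a sketch of the usual ones, assuming UTF-8 at the Text/ByteString boundary:

    import qualified Data.ByteString as BS
    import qualified Data.Text as T
    import qualified Data.Text.Encoding as TE

    stringToText :: String -> T.Text
    stringToText = T.pack

    textToString :: T.Text -> String
    textToString = T.unpack

    textToBytes :: T.Text -> BS.ByteString
    textToBytes = TE.encodeUtf8

    bytesToText :: BS.ByteString -> T.Text
    bytesToText = TE.decodeUtf8  -- throws on invalid UTF-8; decodeUtf8' is the total version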

Error handling

The situation is outlined in the "control flow" note: there are multiple ways to handle errors, and different libraries use different ones. The "outside world" usually uses return codes, but in Haskell unchecked built-in exceptions always win and can happen unexpectedly. And there are asynchronous exceptions, coming from outside (e.g., from other threads). So one has to handle exceptions anyway, and might as well throw them too. I find it unnecessarily messy, but there it is.
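
One common pattern is to confine exceptions at a boundary and hand callers an ordinary value; a small sketch (the function itself is just an example):

    import Control.Exception (IOException, try)
    import qualified Data.ByteString as BS

    -- Turn a synchronous IOException into a return value, so callers
    -- can treat the failure like a return code.
    readConfig :: FilePath -> IO (Either IOException BS.ByteString)
    readConfig path = try (BS.readFile path)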

GHC RTS, concurrency, FFI, and POSIX

GHC alone has single-threaded and multi-threaded runtime systems, "safe" and "unsafe" foreign calls (as well as other modifiers one may need to have in mind, such as "interruptible"), bound and unbound threads. It is potentially nice, but quite a bit more complicated (and in some cases has more overhead) than just calling functions from C, even concurrently.

The most straightforward combination (that is, the one resembling system threads and not requiring explicit interaction with GHC's event manager) is perhaps the multi-threaded RTS with "safe" calls (particularly for blocking functions).

The CApiFFI extension, without usage of tools like hsc2hs or c2hs, seems to be a relatively straightforward and modern (non-deprecated) way to define foreign functions and values.
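
A sketch of such declarations, with a "safe" import for a blocking libc call and an "unsafe" one for a cheap call (the particular functions are just examples):

    {-# LANGUAGE CApiFFI #-}
    import Foreign.C.Types (CInt (..), CUInt (..))

    -- "safe" for a call that may block, so it does not stall a whole
    -- capability under the threaded RTS.
    foreign import capi safe "unistd.h sleep"
      c_sleep :: CUInt -> IO CUInt

    -- "unsafe" for a cheap call that returns immediately.
    foreign import capi unsafe "stdlib.h abs"
      c_abs :: CInt -> CInt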

As of 2019, there are no complete and consistent bindings to POSIX functions, but there are attempts to make such bindings. Though apparently the prevailing view is that it is better to use threads, throw exceptions between them (and hope they will arrive), and communicate over channels and MVars, rather than simply polling file descriptors. I think the latter is simpler (and less error-prone) in many cases.
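
The two styles, sketched side by side (stdin is used only to have a concrete file descriptor at hand):

    import Control.Concurrent (forkIO, threadWaitRead)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import System.Posix.Types (Fd (..))

    -- Thread-and-MVar style: a worker produces a value, the caller blocks on it.
    withWorker :: IO ()
    withWorker = do
      box <- newEmptyMVar
      _ <- forkIO (putMVar box "result from a worker thread")
      takeMVar box >>= putStrLn

    -- Descriptor style: block until fd 0 (stdin) is readable, then read a line.
    waitOnStdin :: IO ()
    waitOnStdin = do
      threadWaitRead (Fd 0)
      getLine >>= putStrLn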

Memory leaks

Unlike in lower-level languages such as C, where a programmer introduces memory leaks, Haskell (sometimes with the help of GHC or libraries) does that automatically and implicitly, and in much more inventive ways than simply a missing free(3) call: usually through some combination of laziness, garbage collection peculiarities, foreign pointer finalizers, or otherwise faulty library bindings. All the added moving parts also make debugging more interesting.
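
The classic laziness-induced leak is an accumulator that is never forced:

    import Data.List (foldl')

    leaky :: Int
    leaky = foldl (+) 0 [1 .. 10000000]   -- builds a chain of thunks
                                          -- (optimizations may sometimes rescue it)

    fine :: Int
    fine = foldl' (+) 0 [1 .. 10000000]   -- forces the accumulator, constant space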

Just as in C, though leaks do happen, they are mostly avoidable or fixable, with enough time and effort. And as with other higher-level languages, those higher-level features come with their downsides.

Epilogue

There are warts, awkwardness, and imperfections, but that applies to virtually any non-trivial technology. The language semantics are nice relative to other somewhat common languages, GHC is a good compiler, and decent integration with the system (POSIX, GNU/Linux distributions) is achievable.

I guess Rust may be another fine option for projects with similar requirements these days, though it has its own warts as well, and the libraries are not always polished. And other C alternatives may compete with it.

As a side note, I am increasingly suspicious that language properties do not matter that much for correctness, similarly to how they rarely matter for performance. That is, just as in most contexts it is best to focus on algorithmic optimization (or at least on avoiding unnecessary slow I/O) before rewriting everything in assembly, picking the fastest instructions, and trying to exploit CPU caches and speculative execution, so for correctness it seems more important to have polished and maintained libraries than to have a nice language in which you have to hack things together as you go, with abandoned, experimental, or otherwise poorly working libraries. As an example, I noticed that PHP's FTP implementation appears to be more complete and reliable than those available for Haskell and Rust.