Software packaging and deployment

This is an opinionated overview of the available software packaging and deployment options, including those for commercial software and binary releases.

The problem

Proper and usable software installation may include placement of executable files, data files, documentation such as man pages and info manuals, configuration files with appropriate permissions, init system configurations (e.g., init scripts, service files) for daemons, freedesktop.org configurations for desktop applications. As well as creation of system users, a way to uninstall the software, proper handling of configuration files (not just overriding user configuration). And preferably it should be manageable using standard system tools, since it is a pain to use multiple package managers to maintain a system.

There are nuances and incompatibilities among major GNU/Linux distributions (not to mention other UNIX-like systems, or different OS families), so it's not a straightforward task. For instance, it used to be a pain to write init files for different distributions because of the available software; now there is systemd on most of the major ones. Generally, automated and reliable integration of different software components is tricky.

Solutions

"Proper" flow

In the least hacky out of common scenarios, upstream developers release the software following the standards and conventions, which is packaged for different systems by maintainers, and necessary adjustments are introduced.

But this does not work for commercial software, or for binary releases of smaller projects. Neither does it guarantee that installation and maintenance would not be a pain (cf. most of the issue trackers), though it does not have to be.

Sometimes this approach works poorly with FLOSS projects as well: notably, Rust packages tend to have many dependencies, with some of those usually aiming nightly compiler builds, and it is generally assumed that it is installed circumventing a system package manager. Many non-Rust projects also consider themselves special, and suggest custom installation options: AppImage installers, other container images with whole operating systems included, manual building (sometimes with odd build tools) and installation, just some odd scripts, possibly with "curl ... | sh". This leads into the territory of the "ad hoc mess" and "masked mess" sections below.

System-independent build systems

Build systems such as GNU autotools, or the language-specific ones such as Cabal for Haskell programs, can be used for packaging and installation on their own. Autotools even try to deal with system incompatibilities, but still don't cover all the tasks (such as user creation or portable service installation), dependency resolution and automatic installation are partial at best (for language-specific package managers), and of course the software installed that way isn't manageable with a package manager native to a given system.

It is even less suitable for binary software distributions: build systems are mostly for building, as the name suggests.

As examples, autotools-generated and similar Makefile-based tarballs use make install, cabal install can install Haskell programs globally.

For Python programs something like PIPX_HOME=/opt/pipx/ PIPX_BIN_DIR=/usr/local/bin/ pipx install --system-site-packages . can be used: it is still a language-specific and system-independent system, though installing partially into non-standard paths and using virtual environments, having a notably worse system integration, more similar to those from the next section. Although cabal, while pulling dependencies from Hackage, by default similarly mostly ignores the underlying system with its package manager, and bundles those dependencies together (just in a statically linked executable file, rather than a venv).

Ad hoc mess

Custom shell scripts or Makefile, curl | sh installation, various other custom installers, manual code copying, lengthy and awkward installation instructions, DVCS-based deployment (with private keys and passwords occasionally being in a repository and/or hardcoded), and virtual machine images seem to be used rather often for in-house or "enterprise" commercial software. It is a mess and a nightmare to maintain, usually matching the software it is used for, but perhaps worth mentioning as a bad example.

For quick and dirty packaging though, tar and a few shell commands can work fine: tar czf the files, then unpack with tar --same-owner -C / -xvf, while using shell commands for that unpacking, adding users, installing dependencies, enabling services.

Masked mess

There are projects that do roughly what is described in the previous section, but with dedicated websites full of marketing slogans and making those solutions not so custom by getting more people to use the same kind of a solution. For instance, Flatpak and AppImage (primarily for desktop applications), Docker. Their issues are not very different from those with the ad hoc approaches (i.e., poor system integration), though they introduce a possibility of at least non-standard package management, and may patch some of the issues that arise.

Containerization with system images in particular I find similar to people taking screenshots of texts, web pages, PDFs, or other documents instead of copying the relevant text or saving those, and perhaps then pasting them into a graphics-capable word processor to save on a disk. Or even taking a picture of a screen with a camera. That is, capture the needed state with familiar and generic tools, even if it's inefficient and/or somewhat lossy; later those are also readable with generic tools, and it does the job in most cases.

Compared to a nice setup, such containerization introduces unnecessary abstraction layers and bloat, but compared to a messy one, it keeps the mess contained. Which makes it particularly desirable in commercial software development, perhaps.

Upstream packaging

This is the one I like the most so far: write a program as an upstream developer (following FHS and other standards and conventions, using portable libraries when available), then package it as a maintainer (following Debian New Maintainers' Guide), then deploy and configure it as an administrator. I used to cut some corners for packaging, using cabal copy --destdir=deb, but as of 2022, with Debian 11 and Cabal 3, perhaps a more proper and straightforward approach is something like the following (just for a regular binary package; can be tweaked to provide source/profiling/doc packages as well):

sudo apt install devscripts build-essential lintian haskell-devscripts
cabal-debian -m 'name <name@example.com>' --disable-profiling --disable-haddock
# comment out DEB_SETUP_BIN_NAME in debian/rules
debuild -i -b

Or, a less neat way, to build with Hackage packages, and then package the executable:

mkdir -p deb/usr/bin/
cabal install foo --prefix=/usr --install-method=copy --installdir=deb/usr/bin/ --overwrite-policy=always
dpkg-deb --build deb/ foo_1.2.3.deb

There is a "getting started" guide on packaging Haskell projects for Debian as well.

For Python packaging on Debian, there is dh-python (with some documentation in the dh_python3(1) man page): one can take an existing package as an example (e.g., apt-get source xkcdpass), compose debian/{rules,control,changelog}, setup.py (apparently no support for pyproject.toml yet), and produce a package with dpkg-buildpackage -us -uc. I think the packages I had to install before that are dh-virtualenv dh-python debhelper python3-docutils.

For other guides, see Packaging - Debian Wiki, Arch package guidelines.

An upside of such an approach is that software properly integrates into the system, so the regular practices can be applied. Sometimes it also makes you to adjust the software to make it easier to package and maintain.

A downside is that properly maintaining a custom repository (with timely key rollover) is a responsibility, and the documentation seems to mostly aim FLOSS project inclusion into the primary repositories. Hence all the third-party repositories that break updates, apparently. I use standalone packages (without repositories) on Debian instead, but it leaves open the problem of distribution and updates via a system package manager.

Tips and tricks

Here are some tips and tricks for writing software and packaging it in such a way that it would be relatively painless to deploy and maintain (apart from the regular "follow the standards and conventions").

Database management

PostgreSQL authentication

For applications that use PostgreSQL, it is handy to default to an empty (but configurable, of course) connection string: it will just work with a local database and peer authentication, simplifying the necessary deployment steps. While providing a way to specify the connection string (and not, say, just credentials) keeps it very flexible.

Database initialisation

To prepare a database (create tables, define roles and security policies, stored procedures and aggregations, views, insert initial data, etc), a handy approach is to add an "init" mode into the application, which would simply read SQL files from a data directory (which should be packaged and installed there) and execute those in a single transaction, potentially prompting a user for an initial application administrator password. It seems straightforward, yet rarely gets done any nicely. This can also be combined with schema updates by prefixing SQL file names with version numbers.