Online service provision

Here are some notes on providing online services such as email, XMPP, shell accounts, DVCS, and general file/rsync hosting. The focus is on properly configured software and decent public services. I don't have much experience running public ones, but the notes are mostly about technologies rather than practices, and primarily target GNU/Linux systems. For private services, see the notes on private server setup and simpler server setup.

Service providers are usually obliged to assist governments with surveillance and/or censorship, and possibly to follow additional laws on handling user information. That is not necessarily bad, but some cases are worse than others. Getting servers confiscated, engaging in mass surveillance and/or censorship, or setting up backdoors to enforce laws that don't make sense would be less desirable than occasionally helping with actual criminal investigations once a warrant targeting individual users is provided. So the applicable laws should be looked into. Apparently the corresponding Russian law is such that it's better to keep services as far away from it as possible. Estonia provides "e-residency", which may help with providing services under its laws.

Perhaps, though, even under an oppressive government it is possible (and acceptable) to provide a service to those whom that government does not affect (i.e., pretty much anyone outside). This would mean following all the regulations while clearly presenting what they are. The common practices are different: not mentioning the regulations at all; being security- and privacy-oriented but under the radar, in a grey area, and/or (partially) blocked; or boasting security and privacy while in fact following regulations that contradict them. A slightly sarcastic presentation may also be quite fun, picturing benevolent supervisors who provide a useful service by filtering "extremist content" and suchlike. In place of strict privacy laws, the feature list could name some of the local nonsense, seen from the point of view of a hypothetical happy citizen. To keep it light, one would have to pick something that sounds silly or peculiar, yet not particularly bigoted. Maybe it could also present uncertainty and instability as excitement. However, as of 2021 in Russia, many foreign mail servers were being blocked, so the delivery failure rate would seemingly be unacceptable for a mail service. Then came Russia's "special operation" in 2022, at which point having anything to do with Russia became an edgy choice in much of the world. Money transfers became inconvenient and limited, too; it is better to be in a sane jurisdiction, after all.

A relevant discussion (though probably there are plenty more around): "Ask HN: What is the best jurisdiction for internationally distributed teams?".

A user agreement should be prepared carefully, yet remain readable.

Payment processors tend to be an issue as well. Some of their problems are simply inherited from bank cards, and most of the others come from trying to mitigate those with fraud detection. The available options (e.g., PayPal) are bad, but they sometimes work, more or less.

Service abuse is what raises some of the legal issues, and even when it doesn't, it is highly undesirable. Apparently it can be mitigated by requiring a small payment to confirm an account. That is straightforward with regular billing, but viable with donations as well (e.g., as sdf.org does).

From the perspective of someone reporting network abuse, though, it already seems pretty good if an abuse-reporting email address exists, is checked, and leads to action at least once abuse is reported. Probably those who don't care much just go ahead and run services without dealing with abuse. Those who care too much don't even try to run such services. A balance is needed.

SSH

SSH is one of the most widespread protocols with good authentication and good software implementations. It is useful both for regular shell accounts and for accounts restricted to specific functionality (email and DVCS, for instance), and it is needed for pubnix-style systems.

System user restrictions

Systems shared among strangers need better isolation and restrictions than regular file permissions provide. Some ways to set up such restrictions can be seen in the "security" task of hashbang/shell-server. Here is the list I have collected:

PAM limits and namespaces
OpenSSH's sshd(8) can use pam(8), and does so by default on Debian (see UsePAM in sshd_config(5)). This includes session management modules such as pam_limits(8), which sets ulimits and nice values (see limits.conf(5)). It also includes pam_namespace(8), which sets up polyinstantiated directories (polydirs) such as per-user tmp directories (see namespace.conf(5)). These are user-space mechanisms and not necessarily reliable: a "PAM escape" via certain programs is possible, so those programs should be restricted too.
File systems
Some (pseudo-)filesystem mount options are useful for access restrictions: hidepid=2 for proc(5), newinstance for devpts (documented in mount(8)), etc. Mounting /tmp/ in memory and avoiding swap can help both performance and security. Encrypting disk partitions with LUKS/dm-crypt would also reduce the risk of user data being compromised, though that applies to computing in general.
cgroups
With systemd, cgroup limits can be set with systemctl(1) (see systemd.resource-control(5)), which makes it possible to limit resource usage for PAM sessions. I wonder why hashbang.sh only seems to set them for non-interactive sessions.
nftables (or iptables)
nftables can match on the socket UID/GID out of the box (meta skuid, meta skgid). For iptables, iptables-extensions(8) includes the owner extension, which matches locally generated outbound packets by the owning user or group. This seems useful for limiting users' network capabilities without limiting system services.
sshd
The SSH daemon itself should be configured to disable undesirable SSH functionality, such as TCP forwarding, though such functionality can also be restricted per key.
Other hardening
There are hardening guidelines around, which tend to include both restrictions and additional security measures; for instance, hashbang/hardening and the "Hardening" page on the Debian Wiki.
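The following Python sketch shows roughly how some of the restrictions above might be expressed. It renders configuration fragments for a hypothetical group of shell users:

- pam_limits entries (limits.conf(5));
- a systemd drop-in for user slices (systemd.resource-control(5));
- an sshd_config(5) Match block;
- an nftables output rule matching on the owning socket's GID.

The group name, GID, limit values, and drop-in path are assumptions made for the example, as is the choice to block outbound SMTP. Check the fragments against the respective manual pages and the installed versions before using them.

```python
"""Render example restriction fragments for a hypothetical "users" group.

All names and numbers are placeholders; check each fragment against
limits.conf(5), systemd.resource-control(5), sshd_config(5) and nft(8).
"""


def limits_conf(group: str, nproc: int, nofile: int) -> str:
    # pam_limits format: <domain> <type> <item> <value>; "@" denotes a group.
    return "\n".join([
        f"@{group} hard nproc {nproc}",
        f"@{group} hard nofile {nofile}",
    ])


def user_slice_dropin(memory_max: str, tasks_max: int, cpu_quota: str) -> str:
    # Assumed path: /etc/systemd/system/user-.slice.d/50-limits.conf, which
    # reasonably recent systemd versions apply to every user-UID.slice.
    return "\n".join([
        "[Slice]",
        f"MemoryMax={memory_max}",
        f"TasksMax={tasks_max}",
        f"CPUQuota={cpu_quota}",
    ])


def sshd_match_block(group: str) -> str:
    # Disable forwarding and tunnelling for members of the group only.
    keywords = ("AllowTcpForwarding", "AllowStreamLocalForwarding",
                "AllowAgentForwarding", "X11Forwarding", "PermitTunnel")
    return "\n".join([f"Match Group {group}"] + [f"    {k} no" for k in keywords])


def nft_output_table(gid: int, blocked_ports: list) -> str:
    # "meta skgid" matches locally generated packets by the owning socket's
    # GID, so system services running under other groups are unaffected.
    ports = ", ".join(str(p) for p in blocked_ports)
    return "\n".join([
        "table inet userlimits {",
        "    chain output {",
        "        type filter hook output priority 0; policy accept;",
        f"        meta skgid {gid} tcp dport {{ {ports} }} reject",
        "    }",
        "}",
    ])


if __name__ == "__main__":
    # Example values only; tune them to the actual hardware and user base.
    print(limits_conf("users", nproc=200, nofile=1024))
    print(user_slice_dropin(memory_max="2G", tasks_max=512, cpu_quota="100%"))
    print(sshd_match_block("users"))
    print(nft_output_table(gid=1001, blocked_ports=[25]))
```

Blocking outbound port 25 for shell users is one way to keep them from sending spam directly, while system mail services keep working. The Match block can be combined with per-key options in authorized_keys.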

Specific functionality

For more restricted services, there may be no need for shell access, or for system users at all, but other uses of SSH may still be desired. There are SSH server libraries for that, e.g., libssh or a Haskell ssh library. The client-side libssh2 may be better avoided, given its rather bad track record and regularly found vulnerabilities. Though years after writing this, I used libssh2 for an SFTP client and ran into memory leaks with versions 1.7.0 and 1.9.0. I then switched to libssh and ran into an infinite loop in sftp_open with version 0.7.3, though apparently not with 0.9.8. So no feature-complete SSH library seems to have a particularly good track record. Alternatively, with OpenSSH, many per-key restrictions, including command restrictions, can be defined in authorized_keys files or encoded into certificates. See the AUTHORIZED_KEYS FILE FORMAT section of sshd(8), and ssh-keygen(1) for certificates. Forced commands may be too restrictive for some programs (where the arguments should be dynamic), but wrappers can be used for those. The command originally requested by the client is available to the forced command in the SSH_ORIGINAL_COMMAND environment variable.
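A per-key restriction in an authorized_keys file might look like the line produced by the Python sketch below. It combines three options:

- restrict (OpenSSH 7.2 and later), which disables port, agent and X11 forwarding as well as PTY allocation;
- a forced command;
- an optional from= source restriction.

The helper name, the rrsync path and the key material are hypothetical placeholders.

```python
from typing import Optional


def authorized_keys_line(key_type: str, key_b64: str, command: str,
                         comment: str = "",
                         from_hosts: Optional[str] = None) -> str:
    """Build an authorized_keys line with a forced command (hypothetical helper)."""
    if any(c in command for c in "\r\n"):
        raise ValueError("the forced command must be a single line")
    # Inside option values sshd only unescapes \", so escape double quotes.
    options = ["restrict", 'command="' + command.replace('"', '\\"') + '"']
    if from_hosts:
        # Comma-separated patterns (or CIDR ranges) of permitted client addresses.
        options.append(f'from="{from_hosts}"')
    fields = [",".join(options), key_type, key_b64]
    if comment:
        fields.append(comment)
    return " ".join(fields)


line = authorized_keys_line(
    "ssh-ed25519", "AAAA...placeholder...", "/usr/bin/rrsync -ro /srv/files",
    comment="alice@example", from_hosts="192.0.2.0/24")
print(line)
# restrict,command="/usr/bin/rrsync -ro /srv/files",from="192.0.2.0/24" ssh-ed25519 AAAA...placeholder... alice@example
```

Starting from restrict and re-enabling only what is needed tends to be safer than listing individual no-* options.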

Gitea, for instance, forces execution of its own command via a command option in ~git/.ssh/authorized_keys for each added user key. It disallows everything but command execution (as used by git), checks that the requested commands are git ones, and checks repository access privileges using its own rules. rsync, in turn, provides the rrsync script, also to be set via command, which only allows rsync to be used and restricts it to a certain directory. rssh similarly restricts the commands available over SSH, mostly to file transfer ones.
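The wrapper approach can be sketched in Python as a forced command that does three things: it inspects SSH_ORIGINAL_COMMAND, allows only git's pack commands, and keeps repository paths inside a root directory. This hypothetical sketch illustrates the idea and is not Gitea's actual implementation. The repository root is an assumption. A real deployment would also need per-user access checks, for example keyed on an identifier passed as an argument of the forced command.

```python
import os
import posixpath
import shlex
import sys

REPO_ROOT = "/srv/git"  # hypothetical location of all repositories
ALLOWED = {"git-upload-pack", "git-receive-pack", "git-upload-archive"}


def resolve_git_command(original: str, root: str = REPO_ROOT) -> list:
    """Translate the client's requested command into a safe argv, or raise."""
    try:
        argv = shlex.split(original)  # git sends e.g. git-upload-pack 'repo.git'
    except ValueError as exc:
        raise ValueError(f"unparsable command: {exc}") from None
    if len(argv) != 2 or argv[0] not in ALLOWED:
        raise ValueError("only git pack commands with a single repository are allowed")
    # Treat the requested path as relative to the root and normalise "..".
    path = posixpath.normpath(posixpath.join(root, argv[1].lstrip("/")))
    if not path.startswith(root + "/"):
        raise ValueError("repository path escapes the root directory")
    return [argv[0], path]


def main() -> None:
    # Entry point when the script is installed as the forced command.
    try:
        argv = resolve_git_command(os.environ.get("SSH_ORIGINAL_COMMAND", ""))
    except ValueError as exc:
        sys.exit(f"denied: {exc}")
    os.execvp(argv[0], argv)
```

Suppose the script ends with a call to main() and is set as the forced command (e.g., command="/usr/local/bin/git-only", a hypothetical path). A request such as git-upload-pack 'alice/project.git' would then run git-upload-pack /srv/git/alice/project.git, and anything else would be refused.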

VPN

A VPN (IPsec, WireGuard, etc.) usually provides both encryption and authentication, which makes it convenient for running simple protocols on top of it (unencrypted, perhaps with host-based authentication). It may also be convenient for connections between users.
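Host-based authentication on top of a VPN can be as simple as checking the peer address of a connection, as in the Python sketch below. It assumes a hypothetical tunnel subnet of 10.8.0.0/24. It also assumes that addresses from that subnet can only arrive through the tunnel. WireGuard, for instance, only accepts packets from a peer whose source addresses fall within that peer's AllowedIPs. Traffic arriving on other interfaces would still need to be filtered, or the service bound to the tunnel address only.

```python
import ipaddress
import socketserver

VPN_NET = ipaddress.ip_network("10.8.0.0/24")  # hypothetical tunnel subnet


def allowed_peer(address: str, network=VPN_NET) -> bool:
    """Return True if the peer address belongs to the VPN subnet."""
    ip = ipaddress.ip_address(address)
    # Dual-stack sockets report IPv4 peers as ::ffff:a.b.c.d; unwrap those.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return ip in network


class VPNOnlyHandler(socketserver.StreamRequestHandler):
    """A trivial line-based service that only talks to VPN peers."""

    def handle(self) -> None:
        host = self.client_address[0]
        if not allowed_peer(host):
            return  # close the connection without a response
        self.wfile.write(f"hello, {host}\n".encode())


# To run it, bound to the tunnel address (assumed here to be 10.8.0.1):
# socketserver.TCPServer(("10.8.0.1", 7000), VPNOnlyHandler).serve_forever()
```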

Common authentication methods

It may be nice to reuse PAM authentication for everything (possibly via SASL), especially if shell access is provided. Unfortunately, PAM is mostly aimed at plaintext (password-based) authentication.

SASL is nice for uniform authentication across services. Usually it is not tied to system users, and can be used with LDAP (and so can PAM). See the "user authentication" note for more on the topic.
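When SASL is backed by PAM (e.g., via saslauthd), generally only mechanisms that transmit the password itself, such as PLAIN, can be verified. The Python sketch below builds a PLAIN message as defined in RFC 4616 and base64-encodes it, the way SMTP AUTH and IMAP AUTHENTICATE carry it. The credentials are made up. Base64 is merely an encoding, so such exchanges should only happen over TLS.

```python
import base64


def sasl_plain_response(authcid: str, password: str, authzid: str = "") -> str:
    """Encode a SASL PLAIN message: [authzid] NUL authcid NUL passwd (RFC 4616)."""
    for part in (authzid, authcid, password):
        if "\0" in part:
            raise ValueError("NUL characters are not allowed in PLAIN fields")
    message = f"{authzid}\0{authcid}\0{password}".encode("utf-8")
    # Base64 is only a transport encoding; anyone who sees it learns the password.
    return base64.b64encode(message).decode("ascii")


# The SMTP command a client would send (made-up credentials):
print("AUTH PLAIN " + sasl_plain_response("alice", "correct horse"))
```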

User directory

LDAP is a common option for detaching users from the underlying operating system (that is, avoiding system users), possibly with a user directory shared across multiple servers.
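When services look users up in an LDAP directory, the login name typed by a user ends up inside a search filter. It should therefore be escaped as described in RFC 4515 to avoid filter injection. Below is a minimal Python sketch. The posixAccount object class and uid attribute are typical for directories holding POSIX accounts (RFC 2307 schema). The exact filter is an assumption and depends on the directory layout.

```python
# Characters that RFC 4515 requires to be escaped in assertion values.
_SPECIAL = {"\\", "*", "(", ")", "\0"}


def escape_filter_value(value: str) -> str:
    """Escape a string for use as an LDAP filter assertion value (RFC 4515)."""
    # Each special character becomes a backslash followed by two hex digits.
    return "".join("\\%02x" % ord(ch) if ch in _SPECIAL else ch for ch in value)


def user_filter(uid: str) -> str:
    # Hypothetical filter for POSIX accounts; adjust to the actual schema.
    return f"(&(objectClass=posixAccount)(uid={escape_filter_value(uid)}))"


print(user_filter("alice"))     # (&(objectClass=posixAccount)(uid=alice))
print(user_filter("*)(uid=*"))  # (&(objectClass=posixAccount)(uid=\2a\29\28uid=\2a))
```

Escaping character by character avoids double-escaping the backslashes that the escaping itself introduces.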

Backups

Which methods are applicable depends on the kinds of data stored. Common ones include rsync, database replication and other built-in or specialized backup/synchronization methods, mirroring with RAID (RAID 1 in particular), and DRBD (see the section on high availability).
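With rsync, one common way to keep several backup snapshots cheaply is --link-dest. Files unchanged since the previous snapshot are hard-linked to it instead of being copied again. The Python sketch below mainly builds the command line. The date-named snapshot layout, the paths, and the take_snapshot helper are assumptions made for the example.

```python
import datetime
import os
import posixpath
import subprocess


def _is_date(name: str) -> bool:
    try:
        datetime.date.fromisoformat(name)
        return True
    except ValueError:
        return False


def snapshot_command(source: str, dest_root: str, existing: list,
                     today: datetime.date) -> list:
    """Build an rsync command for a new dated snapshot under dest_root."""
    name = today.isoformat()  # ISO dates sort chronologically as strings
    argv = ["rsync", "-a"]
    earlier = sorted(s for s in existing if _is_date(s) and s < name)
    if earlier:
        # Hard-link unchanged files to the most recent earlier snapshot.
        argv.append("--link-dest=" + posixpath.join(dest_root, earlier[-1]))
    # The trailing slash on the source copies its contents, not the directory.
    argv += [source.rstrip("/") + "/", posixpath.join(dest_root, name) + "/"]
    return argv


def take_snapshot(source: str, dest_root: str) -> None:
    # Hypothetical helper: actually runs rsync for today's snapshot.
    argv = snapshot_command(source, dest_root, os.listdir(dest_root),
                            datetime.date.today())
    subprocess.run(argv, check=True)
```

Unlike RAID or DRBD mirroring, such snapshots keep earlier states around, so an accidental deletion or corruption does not immediately propagate to every copy.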

Scaling

A decent service shouldn't lock users in, so horizontal scaling should be as easy as setting up identical systems and relying on federated protocols for interoperation. Configuration management systems such as Ansible are useful for that. High availability (see the next section), in turn, usually involves redundancy, which in some cases can readily provide scaling as well.

High availability

There are nice tools around for highly available (HA) clusters: pgpool-II (for PostgreSQL), DRBD with GFS2/OCFS2 (for a replicated cluster filesystem), and Pacemaker. Pacemaker handles general resource management and failover, including services and automated setup of load balancing via IP multicast. All of these are available from Debian repositories, and seem to be maintained and fairly widely used.

Capacity management and DoS protection

It is rather hard to be certain that a complex system will function properly under unexpected loads. Stress testing should be performed, and other iptables extensions can be useful here, such as hashlimit for setting per-IP rate limits.
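Per-source rate limiting of the kind hashlimit provides can be illustrated with a token bucket kept per key, such as a source IP address. The Python sketch below only shows the idea inside an application, e.g., for limiting login attempts. It is not how hashlimit is implemented, and the rate and burst values are arbitrary.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class PerKeyTokenBucket:
    """Allow on average `rate` events per second per key, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        # key -> (available tokens, time of last update)
        self._buckets: Dict[Hashable, Tuple[float, float]] = {}

    def allow(self, key: Hashable) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(key, (self.burst, now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self._buckets[key] = (tokens, now)
        return allowed

    def expire(self) -> None:
        """Forget keys whose buckets have refilled completely (bounds memory)."""
        now = self.clock()
        self._buckets = {
            k: (t, last) for k, (t, last) in self._buckets.items()
            if t + (now - last) * self.rate < self.burst
        }


limiter = PerKeyTokenBucket(rate=0.5, burst=5)  # e.g. per-IP login attempts
```

hashlimit similarly garbage-collects idle entries (see --hashlimit-htable-expire in iptables-extensions(8)).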

Monitoring (with munin, Zabbix, or something along those lines) should be helpful for capacity planning.
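Munin plugins are simple executables. When called with the argument config they describe the graph; otherwise they print field.value lines. Below is a minimal Python sketch of such a plugin, reporting disk usage of the filesystem holding a directory for capacity planning. The monitored path and the graph wording are assumptions.

```python
import shutil
import sys

MONITORED_PATH = "/home"  # hypothetical: the filesystem holding user data


def munin_lines(args: list, path: str = MONITORED_PATH) -> list:
    """Return the lines a Munin plugin would print for the given arguments."""
    if args[:1] == ["config"]:
        return [
            f"graph_title Disk usage of {path}",
            "graph_args --base 1024 --lower-limit 0",
            "graph_vlabel bytes",
            "graph_category disk",
            "used.label used",
            "total.label total",
        ]
    usage = shutil.disk_usage(path)
    return [f"used.value {usage.used}", f"total.value {usage.total}"]


def main() -> None:
    print("\n".join(munin_lines(sys.argv[1:])))
```

For actual use, the script would end with a call to main() and be made executable. It would then typically be linked into /etc/munin/plugins/ on Debian.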