Software project complexity
The same problems can be solved very differently, and apparently the
difference in time and effort spent can sometimes be measured in
orders of magnitude, especially if maintenance of such solutions, and
interaction with them, is taken into account. Programmers usually
agree that simplicity is important, but disagree on how to define it.
There are various software complexity metrics, though they cover
little, and possibly the wrong things. There are plenty of articles
which either focus on a particular view of complexity, or just praise
simplicity in general without getting into details. Here I'd rather
try to outline different aspects of software project complexity, and
different views on them. Many of these seem to be similar, related,
and/or interdependent.
- "Simple" vs. "easy"
The confusion between the two is both widely known and widespread.
- Hardware abstraction
Having no complicated compilers or interpreters between source code and
executed machine code can be seen as a kind of simplicity, while having
to deal with low-level details is often seen as unnecessary complexity.
- Mathematical simplicity
A simple model of computation, possibly along with an elegant type
system, abstractions, and/or nice applied models, can be seen as
simple. It can also be seen as an arbitrary and unnecessary obstacle,
adding strange cruft to the codebase, and/or not being
straightforward.
- Correctness
Complexity of maintenance (and debugging time in particular) depends
on correctness, which is often expected to follow from software
simplicity, though the definition of "simplicity" varies. Views on
other methods of achieving correctness (memory- and type-safety,
precise specifications and verification, testing, code reviews, etc.)
also vary.
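For instance, even minimal assertion-based tests are cheap to write
and document the expected behaviour. A sketch in C, with a
hypothetical parse_port function standing in for actual project code:

    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical project function: parses a TCP port number,
       returning -1 on invalid input. */
    static int parse_port(const char *s)
    {
        char *end;
        long n = strtol(s, &end, 10);
        if (*s == '\0' || *end != '\0' || n < 1 || n > 65535)
            return -1;
        return (int)n;
    }

    int main(void)
    {
        /* Each check documents (and enforces) a requirement. */
        assert(parse_port("8080") == 8080);
        assert(parse_port("0") == -1);
        assert(parse_port("65536") == -1);
        assert(parse_port("http") == -1);
        return 0;
    }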
- Data models
Complexity of data models is a topic I covered in a separate note. Some
see it as an important contributor to project/software complexity, some
ignore it completely.
- Mutable state
Just as a single program can, a software system may have unnecessary
mutable state (e.g., a cache, preaggregation, a processing pipeline)
as a consequence of premature optimization or simply bad design; such
state is harder to debug, and lets more errors creep in.
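A tiny illustration in C (with made-up data): a preaggregated total
that has to be kept in sync by hand, next to a stateless alternative
that simply recomputes it.

    #include <stdio.h>

    #define N 3

    static int items[N] = {1, 2, 3};
    static int cached_total = 6;  /* must be updated on every change */

    /* Stateless alternative: recompute on demand. */
    static int total(void)
    {
        int sum = 0;
        for (int i = 0; i < N; i++)
            sum += items[i];
        return sum;
    }

    int main(void)
    {
        items[0] = 10;  /* the cached value silently goes stale */
        printf("cached: %d, actual: %d\n", cached_total, total());
        return 0;
    }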
- Code quality
General code readability, nice and conventional formatting and naming,
comments and annotations, documentation, absence of anti-patterns,
-Wall (or similar), linting, and various other language-specific
niceties can be seen as a kind of simplicity (as in "absence of a
mess"). Though not universally: some rebel against conventions, claim
that their code is self-documenting, and so on.
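For instance, compiling the following contrived snippet with
cc -Wall (GCC or Clang) produces a format-string warning, catching a
mistake that is otherwise easy to miss:

    #include <stdio.h>

    int main(void)
    {
        long count = 42;
        /* %d expects an int, but a long is passed: -Wall flags the
           mismatch, which could otherwise surface as garbage output
           on some platforms. */
        printf("count: %d\n", count);
        return 0;
    }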
- Code complexity
Number of lines of code, cyclomatic complexity and friends, lack of
abstractions or presence of unnecessary ones, etc. It's a large
aspect, and partially covered by the others listed here; it gets a
separate entry mostly to highlight that it's just one part of overall
project complexity.
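A contrived sketch in C of hypothetical validation logic: both
versions have the same cyclomatic complexity (the same branches), but
the flattened one is easier to follow, which such metrics alone don't
capture.

    #include <stdbool.h>

    /* Deeply nested version: three branches, hidden in nesting. */
    static bool valid_nested(int x)
    {
        if (x >= 0) {
            if (x <= 100) {
                if (x % 2 == 0)
                    return true;
            }
        }
        return false;
    }

    /* Guard clauses: the same branches, but each condition can be
       read (and tested) on its own. */
    static bool valid_flat(int x)
    {
        if (x < 0)
            return false;
        if (x > 100)
            return false;
        if (x % 2 != 0)
            return false;
        return true;
    }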
- Architecture
Also partially covered by other aspects, but perhaps worth listing
separately: a poorly designed system can introduce a lot of accidental
complexity. Though views on what a good architecture looks like vary.
- Build system
An unconventional (and uncommon) build system can complicate both
development by anyone other than the original author, and building by
users, often introducing additional dependencies. Though I guess such
systems usually get chosen in the first place because they seem to
simplify the task (make it easier, and/or make the configuration
simpler).
- Dependencies
While they help to reduce effort duplication and code complexity, it
can be quite a burden to pull and build dozens or hundreds of those
even for a small project: even though the process should be
automated, an issue in any direct or indirect dependency can break
the build, build times increase, and dependency tree inspection gets
harder. Additionally, vulnerabilities in some of the components
become more likely, and minor fixes (or changes in general) may be
virtually impossible to introduce even into FLOSS projects without
forking them.
- Standard facilities
Logging, scheduling, access control, authentication, packaging and
dependency management, dynamic linking (and library updates), IPC,
and other standard tools and facilities are usually available; using
them makes a whole system easier to manage, and can simplify software
considerably. But more portable, independent, or self-contained
software may sometimes be preferred, and over-reliance on such
facilities can indeed complicate a project (e.g., turning it into a
pile of shell scripts).
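As a sketch of leaning on such a facility, logging via POSIX
syslog(3) instead of a custom logging scheme (the program name is
made up):

    #include <syslog.h>

    int main(void)
    {
        /* Messages go to the system log, where standard tooling
           handles storage, rotation, and filtering. */
        openlog("exampleprog", LOG_PID, LOG_DAEMON);
        syslog(LOG_INFO, "service started");
        syslog(LOG_WARNING, "low disk space: %d%% used", 95);
        closelog();
        return 0;
    }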
- Other standards and conventions
Virtually anything custom complicates interaction, composability,
and/or usage/maintenance, and needs to be implemented (hence
complicates the code). An argument for custom solutions is that they
can fit the task better, while avoiding the complexity of generic
solutions (and possibly dependencies). Some just leave things
non-compliant or broken as long as they work in the specific cases
where they are needed.
- Component availability
It simplifies both development and usage/maintenance if the
compiler/interpreter and all the dependencies are commonly installed
on target systems, or can be installed from standard repositories
with low overhead. I don't think such availability is ever seen as a
complication, but it can be seen as unimportant or non-critical.
- Component complexity and quality
Sophisticated tools (compilers, libraries, etc.) are prone to subtle
and unexpected errors (both bugs and usage errors), which are hard to
debug or fix. The uncertainty this adds can be seen as complexity,
and some avoid such tools for that reason.
- Development process
That's another large and controversial aspect: issue tracking and
project management software, various methodologies, communication
software/protocols, CI/CD systems, and other technologies are
supposed to simplify the process, but may seem (or be) annoying,
inefficient, time-consuming, and/or unnecessary as well.
- Requirements
Plenty of complexity can come from potentially changing requirements
that don't fit into clean and simple models, or from underspecified
corners of requirements. But some of those can be discussed and
reconsidered, especially if they are solutions rather than problems.
- Licensing
Software licenses are usually just a minor annoyance, though it gets
worse when working on proprietary projects (particularly if there are
also attempts to enforce them by only distributing binaries, or by
providing otherwise unnecessary SaaS), and worse yet with proprietary
tools/dependencies. Uniform and/or conventional licensing of all the
components (including language-wide traditions), on the other hand,
helps to not worry about that. I don't think it's controversial as a
contributor to project complexity, though it can be neglected.
- Suitable tools
The law of the instrument is widely known, and it's often said that
tools suitable for a task should be used. Suitable tools can greatly
simplify solutions, but it's hard to identify them, and merely being
aware of this bias doesn't prevent one from counting their favourite
tools (or, even worse, whatever is used at some large company, was
praised in a recent blog post, or is just commonly used) as suitable
for everything.
- Serviceability in general
It may be alluring to stick to technologies one understands well, to
be ready to maintain or even implement from scratch a compiler and
all the dependencies, along with the operating system, and possibly
some of the hardware, with the additional benefit of it being easier
to debug issues at any level. But overdoing that is likely to be
limiting: even without touching the hardware, maintaining all of the
used software alone (in addition to doing actual work on top of it)
is extremely impractical, and usually infeasible. On the other hand,
giving up on serviceability completely and using proprietary
software, with no idea of what is going on at lower levels and no
control over the system, is not a great approach either. Decoupled
and documented components may help to reduce dependence on particular
technologies, and sticking to FLOSS usually helps to retain some
level of control: e.g., at least critical bugs can be fixed until a
relatively quick migration to alternatives is done. It is similar to
building a computer with easily replaceable parts, as opposed to
using a locked-down one (soldered components, proprietary form
factors, etc.) on one extreme, or starting out by mining copper ore
with primitive tools on the other (or at least keeping the system
simple enough to quickly port to such a computer): a balanced
approach is both practical and easy to achieve. Yet there are people
preferring the extremes, too: heavy use of proprietary technologies
on one side, while the other side usually does not actually start
with copper, but "not invented here" and similar tendencies are
fairly common.
Even though views on those aspects vary, it seems useful at least to
pay attention to them, to be aware of them. Many kinds of complexity
seem to stay unknown or ignored until one runs into a mess made of
them.
Yet another classification of complexity is simply into 3 groups:
hardware, maths, and accidental. I think it's not hard to fit the
aspects listed above into these 3 groups, but they may not be very
useful as a checklist (though still more useful than focusing on just
one or two of the aspects).