Versioning in the world of software services — Polar Squad

Challenges of versioning today

Applying well-known software versioning strategies in many of today’s software creates a few challenges.

Specifically, they are not helpful to the consumers of the software, and they are cumbersome to integrate into their development workflows.

Traditionally, software applications are run on someone else’s computer. New software releases are made available for users or administrators to download and install on their computers. In this case, a traditional versioning pattern such as semantic versioning (along with release notes) can help the software consumer to determine how the new release compares to what they are running at the moment, and make a more informed decision on when to upgrade the software. This applies to both executables and libraries.

Moreover, a traditional versioning pattern allows parallel updates to be performed for different versions of the software, which benefits those who wish to avoid risky changes. For example, there can be two parallel major versions that both still receive security patches, so that users of both old and new major versions can keep on using their preferred software versions safely.

However, a lot of the software produced today follows a service model, which is built and consumed in a very different way compared to traditional software. This includes web services and in some cases modern desktop and mobile applications. For example, Google Docs, Spotify, Netflix, and Instagram follow this model.

The service software is usually run by the same people that build it or at least people very close to those who build it (e.g. an operations team in the same organization). This means that there is a direct line of communication between the two parties, which helps in bridging compatibility issue gaps.

Instead of arbitrary user environments, the service software is often run in a handful (or even just one) of homogenous compute platforms, and most likely they run the exact same version of the software. This limits the amount of work required to support different versions of the software or figure out different upgrade paths.

The running software is constantly updated when rapid deployment practices such as CI/CD and feature flags are used. The software might even get multiple updates per day. For the developers, manually bumping the version number slows down deployment as the developer has to make up a version number before the release goes out. Practices such as conventional commit can help by making the versioning numbering automatable, but it still requires effort and discipline to support it. The use of feature flags blurs the line between versions anyway as features can be switched on and off without new releases.

The consumers who depend on the APIs of the software care about the API compatibility but not the software version itself. As long as the API is not broken or the consumer can migrate to a new version of the API, it does not matter what version of the software they are consuming. This often means that the application has to support multiple versions of the API simultaneously to provide consumers with a graceful migration path. This makes it also difficult for developers to use software versioning as a meaningful way to express compatibility.

The consumers of service software do not have any control over which version of the software they run. Updates to the software are installed whether they like it or not which means that the software consumer can no longer make any informed decisions on whether to upgrade or not. Not that it is a huge loss anyway since the consumers of service software most likely do not even care about the version they interact with: How many times have you been interested in which version of YouTube is currently running? There are, of course, exceptions to this such as the regulated fields mentioned earlier.

Qualities of versioning for services

Now that we have identified the issues in applying traditional versioning to services, let’s examine the situation from the ground up: Why do we need versioning? What if we do not have it at all? If we do not have versioning, then we lose the following qualities.

  1. Ability to distinguish instances of software from one another.

  2. Ability to trace a running application to the source code that produced it.

  3. Ability to tell which software release comes after another.

  4. Ability to tell how big of an update I am going to have when upgrading from one version to another.

There may be more qualities to consider, but these are the ones that I often think of as features of versioning.

Without the first quality, there is no convenient way to know if all the instances of my software are in sync or if there are instances that still need to be updated. This is very likely since deployments can (and eventually will) fail and leave some instances of the software still running the old version. New changes to the software may also appear during the upgrade, which further complicates the situation. Since there is no way to know which instances need to be updated, there is no way to converge to a single version among all the instances.

The second quality helps tie issues in the software to the revision of the software’s source code. Without it, it is impossible to understand where a bug can be coming from and understand what changes have been done afterward. Moreover, reproducing the bug in a testing environment becomes impossible as well, which limits the ability to show that the issue is fixed in a later version. Technically, the link to the software revision could be embedded to the software release some way (e.g. exposed in a special API endpoint or as plain text in a special file), but being able to make the connection just from the software version is more convenient because it requires less introspection on the release.

The third quality helps understand whether the next version of the software to deploy is an upgrade or a downgrade, but it is not necessary. The mindset I use for services is to always have the latest working version running. If the latest version does not work, I roll back and keep running the last version that did (downgrade), and fix the issue in the next version (upgrade). While knowing when an upgrade or a downgrade happens is useful, I do not need the versioning system to keep track of it for me.

The idea behind the fourth quality is to be able to tell whether the next version will patch something under the hood or produce breaking changes. Ideally, there would never be any changes that break compatibility with the last version, because it can cause disruptions to the consumers using the service. However, as services evolve, we often need a way to push consumers to new, incompatible APIs as well. To achieve this, we can build the service in such a way that it supports multiple versions of the API simultaneously from the same software, thus providing consumers a graceful path to migrate to the newer APIs. Once there are no users for the old APIs (confirmed using telemetry data), the API support can be dropped without disruption. The API versions can be decoupled from the software version itself, so we do not need to use the software version to track compatibility changes.

In summary, the first two qualities are important for good operations and maintenance practices, while the rest of them do not provide much value. Therefore, we can build our versioning on the first two.

Versioning to fit the service development workflow

In addition to the qualities mentioned above, it is beneficial to find a versioning solution that fits the development workflow used for building and deploying the service software. If a versioning strategy makes releasing software complicated or too hard to implement and maintain, then it also negatively affects the ability to deliver software as well.

A typical workflow for developing and delivering software for me works like this:

  1. I push software code changes to a feature branch

  2. The continuous integration (CI) system pulls the changes, and tests and checks the changes

  3. A colleague reviews the changes (if necessary)

  4. I merge the changes to the main branch

  5. The CI system creates a new release

  6. The continuous delivery (CD) system either automatically deploys the release or has a one-click button to deploy it

  7. If feature flags are used, additional features might be toggled using it

This flow is repeated multiple times a week or even a day depending on the amount of development activity, which is why there is heavy reliance on automating the flow as far as possible. Introducing additional manual steps may bring too much overhead. For this reason, the version of the software should be decided automatically as well.

Additionally, it would be nice to have the automation around versioning to be as easy to understand, use, and maintain. It preferably would not require a lot of thinking on how it works, debugging, or fixing.