Sam's Versioning: Samver

Stop Abusing Semver

I like semver. I work as a systems folk, and I end up neck deep in other peoples' dependencies more often than I would like. For libraries, semver is pretty good.

That said, I've run into a cult of programmers who see semver as the best thing next to Jesus coming back for solving all their problems. And they go to alllll the meetings. Some things I've seen:

Making an API for an external bank? Put semver in the URL so they can pick their version!
Oh you're on version 1.124.0 huh? That many commits and not a single breaking change. Right.
0.2.191. In a Jenkins job.
A Helm chart wraps a Docker container which wraps a binary, and two sidecars. They all have different semvers. Question for the audience: What broke?
Try out Go modules! How many 0.0.0-x-y modules can you find?
Chef Cookbooks. There is no one better than old school SysAdmins for defining programatic interfaces. What better way to discover a patch bump was a lie than 2000 servers telling you in slack?
2021.06.24: the mock wheel is at 2.0.0, but the types-mock wheel went from 0.1.1 to 0.1.3. Does mypy pass?
2021.08.03: terraform kubernetes provider is pinned at `~> 2.3`, an automatic feature upgrade to 2.4 reports incompatibility with terraform 0.12, breaking pipelines.
2023.10.04: Kubernetes upgrades from 1.15 to 1.16, tens of APIs have subtle changes. Do addons keep working? https://kops.sigs.k8s.io/contributing/addons/#semver-is-not-enough-id

The key idea around semver is around choice. If your users have the ability to actively choose what version of your software they're using, it works. But large amounts of software is foundational, in that it needs to keep working for a long time to allow for progress to be made above it. This is common in infrastructure, where we are so low in the stack the ripple effects of changes end up costing an enormous amount.

If your users don't have a choice as to how they consume your software (its behind an API, its a hidden script that produces a report, etc), dont break your API. Make a new thing, name it something else, figure out the migration that makes sense for you and your users. And if you learn to communicate major releases within the context of your users, theres probably a good chance you could use that knowledge to better communicate features and bugfixes than trying to encode it into a single number that's an incidental string to your package manager.

Do your users even care what version of your software they're using? Or do they just want it to work?

Another small note, when dealing with automated CI/CD pipelines, the developer must always specify the semver to the automation. Semver throws a complexity wrench in what would otherwise be straight-forward automated releases. Determining what changed at an interface level in the best case requires scraping contrived commits or doc files in the repository, which must be reliably updated by every developer (lol). There is a way to programatically compare an interface and bump a semver string automatically, but down that path lies Swagger diffing and RAML and "Codeless" startups, and ahhh good luck with that.

An Alternative

Are you releasing software to other developers who are making concious choices on when and how to depend on your software, as part of a dependency tool (like ruby gems) across an organizational boundary? Then maybe use semver. Update your CHANGELOG and please set some releases in Github so we have a chance to understand which ways up.

OTHERWISE

You should probably just automatically generate (DERIVE!) the version based on source code for the benefit of everyone involved. So what would the best way to do that look like?

When I'm digging around in whatever new wibblywobbly landed on my desk that day, I'm going to need a vague idea of what generation of code I'm working on, and most importantly a way to trace that specific species of wibblywobbly through the various artifact stores (they keep making more of them?) and eventually back to whichever repository it crawled out of.

Without outside information, when trying to communicate a feature or a bugfix, whats easier, "/foo was added in version 27" or "/foo was added in 2008.02.28" ?

Right, so lets use a date.

Whose date? The date the customer sees it or the date the developer wrote it? Hopefully you have a continuous deployment pipeline in place, in which the customer gets an evergreen version without thinking about it, in which case our versioning should be developer centric.

So lets use the commit date of the code. (All artifacts are generated from source control right?)
But which branch? Oh no. So lets add the commit SHA. But not too long there.
Oh and I want it sortable.

So lets use:

YEARMONTHDAYHOURMINUTE-GITSHA(short), ex: 201909031732-aa314da

Or in its semver compatible format:

YEARMONTHDAY.HOURMINUTE.SECOND+GITSHA(short), ex: 20190903.1732.47+aa314da

Sorting Samver

Use standard unix `sort` in any flavor installed on any machine. Be aware there is no way to correctly sort commits made in the same second.

Meanwhile if you are stuck in semver land, you can use https://github.com/samrees/semversort.

Sample Implementation


#!/usr/bin/env bash
# -- samver
# -- in a git repo, spits out a goodly version

if date --version &>/dev/null; then
  # GNU
  echo $(date --utc --date="$(git show -s --format=%ci)" +%Y%m%d%H%M)-$(git rev-parse --short HEAD)
else
  # BSD
  echo $(date -u -j -f %s $(git show -s --format=%ct) +%Y%m%d%H%M)-$(git rev-parse --short HEAD)
fi

F.A.Q.

What do I do if I commonly have multiple commits in the same minute?

Add a %S to your date format to get second resolution.

What do I do if I commonly have multiple commits in the same second?

You are either abusing git or are working on such a large monorepo that you should not be taking advice from a random page on the internet.

What do I do if I have to use something insistent on semver?

If you have something awful like the helm packager for kubernetes, that FORCES you to use semver, that assumes that every service you build you should indicate what changed with a single number, even if the helm chart you're deploying is a custom built artifact of tens of services bound together as a monolith, with hundreds of API endpoints... well I pity us both. You'll see errors like this:

Error: Error parsing version segment: strconv.ParseInt: parsing "20191025125407": value out of range

To avoid that, you can fool it by trying a date format like this: +%Y%m%d.%H%M.%S, which prints something about like this: 20191025.1855.47. Or %N instead of %S on linux (not bsd bfytw) for nanoseconds. You also need to adjust it to be + instead of a - to be "build metadata" (ex 20191025.1855.47+a5a7ec2), otherwise semver treats it as a "pre-release". AND pipe through the minor and patch versions through something like $((10#${MINOR})) in bash to do a base 10 conversion to ensure they don't have leading zeros (ex: 2019025.0859.01), as that leading zero makes for invalid semvers.

Error parsing version '20200821.1845.33-0276160': Version segment starts with 0

This one honestly looks like semver has encoded a type system bug into the spec. So while pre-releases seem like arbitrary strings in all other cases, if they happen to look like a number they cannot have leading zeros. That means our foolery causes some libraries to toss errors. https://semver.org/#spec-item-9 This can be fixed by replacing leading zeros with a letter: sed -E '/0[0-9]+$/s/^0/z/'

Should I rethink the semantic overloading of strings in other contexts, like filenames?

Yes. Please. We can build better interfaces than this madness of contortionism, stuffing our lives and systems into strings. Store the tiniest pointer you can back to some sort of real database, even if its just a UUID pointed to a JSON in whatever key value store that looks handy. You will not get the interface correct the first time, I know, I've tried, many times. Updating the "db" is easier than updating the elements out in the wild, every time.

--
Sam Rees, Computer Plumber
Chicago