SemVer is Broken! Introducing Algorithmic SemVer

This article discusses Algorithmic SemVer as an alternative software versioning scheme.

The Limitations of SemVer

The promise of Semantic Versioning (SemVer) is tantalizing. It paints a world where software version numbers are no longer arbitrary labels but concise indicators of the changes they house. In theory, a quick glance at a version number should tell us whether it’s safe to upgrade, what kind of changes to expect, and how much testing might be needed.

However, as is often the case in software engineering, the real world is messier than our theoretical models.

Semantic Versioning, commonly known as SemVer, presents a structured approach to versioning that attempts to convey the nature and risk of software changes through a combination of three primary numbers: major, minor, and patch. However, while it is theoretically elegant, there are several practical challenges and limitations:

1. Ambiguity of Interpretation

Even with a defined standard, there’s a degree of subjectivity in deciding whether a change should be classified as major, minor, or patch. Different developers or teams might have varying thresholds or understandings of what constitutes a ‘breaking’ change.

2. Dependency Hell

As software ecosystems grow, dependencies multiply. Projects might depend on specific versions of libraries, which in turn rely on particular versions of other libraries. Even minor and patch updates can sometimes lead to unanticipated problems, creating a cascade of potential issues in the broader ecosystem.

3. Hyrum’s Law and Unintended Consequences

As Hyrum’s Law underscores, users come to depend on every observable behavior of a system, not just the behaviors that are explicitly documented. Even the tiniest, seemingly insignificant change can therefore break someone’s setup. SemVer, in its traditional sense, doesn’t account for these unofficial dependencies, making even minor or patch updates risky.

4. Cybersecurity Misunderstandings

One of the major pitfalls of SemVer in the realm of security is the misunderstanding by professionals, particularly within frameworks like the U.S. Department of Defense’s Risk Management Framework (RMF). The RMF provides a structured process for integrating cybersecurity and risk management activities, but there’s a tendency to perceive major version changes in SemVer as indicative of significant security shifts. In reality, a major version bump might have nothing to do with security; it could merely be due to backward-incompatible API changes. Conversely, a minor or patch release might introduce or fix critical vulnerabilities. This misunderstanding can lead to misallocated resources, with undue attention given to major releases while potentially overlooking security issues in minor or patch updates.

5. Incompatibility with Continuous Deployment

In modern software development, continuous deployment and integration are becoming the norm, where changes are frequently pushed into production. In such environments, manual versioning like SemVer can become cumbersome and less meaningful, as the distinction between major, minor, and patch changes blurs.

6. Overemphasis on API Changes

SemVer largely revolves around API changes, but many software products have other aspects that can be equally important. For instance, changes in a GUI or performance optimizations might not affect the API but can still be significant for end-users.

In summary, while SemVer offers a structured approach to versioning, its limitations, especially in complex, real-world scenarios, highlight the need for more nuanced or even automated solutions. The challenge is not just to categorize changes accurately but to ensure that stakeholders, including security professionals, understand the implications of each version change.

Current Alternatives to SemVer

As dissatisfaction with SemVer’s constraints and ambiguities has grown, various alternatives have sprung up. These alternatives aim to address some of SemVer’s limitations, each taking a unique approach to versioning. While no scheme is perfect, lessons from these schemes can be applied to the development of Algorithmic SemVer. Let’s explore a few:

1. CalVer (Calendar Versioning)

CalVer uses the release date as a primary component of the version identifier. Typically formatted as YYYY.MM.DD.MICRO, this approach provides an immediate sense of the age of a release. However, CalVer doesn’t inherently indicate anything about the nature or compatibility of changes, merely when they occurred.

Pros:

  • Immediate sense of release age.
  • Removes ambiguity of version increment choices.

Cons:

  • Lacks indication of API changes or compatibility.
  • Potential confusion if frequent releases occur.
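
For concreteness, here is a minimal sketch of generating a CalVer identifier in the YYYY.MM.DD.MICRO shape described above; the micro counter is a hypothetical per-day build number supplied by the caller.

```python
from datetime import datetime, timezone

def calver(micro: int = 0) -> str:
    """Build a YYYY.MM.DD.MICRO identifier from today's UTC date."""
    today = datetime.now(timezone.utc).date()
    return f"{today.year}.{today.month:02d}.{today.day:02d}.{micro}"

print(calver(3))  # e.g. "2024.05.01.3"
```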

2. Hash-based Versioning (HashVer)

The HashVer approach leverages the unique hash of a commit in a version control system (like Git) as part of the version identifier. The format generally looks like: 2020.01.67092445a1abc. By tying each release to a specific point in development history, it’s straightforward to track and, if necessary, revert changes.

Pros:

  • Direct correlation between release and its point in development.
  • Practically guarantees unique version identifiers.

Cons:

  • Not inherently informative about the nature of changes.
  • Hashes can be non-intuitive and cumbersome for human reading.
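
To make the format concrete, here is a minimal sketch that assembles a HashVer-style identifier from the release date and the current Git commit; it assumes the script runs inside a Git repository with the git CLI available.

```python
import subprocess
from datetime import datetime, timezone

def hashver() -> str:
    """Build a YYYY.MM.<short-hash> identifier from the current commit."""
    short_hash = subprocess.check_output(
        ["git", "rev-parse", "--short=12", "HEAD"], text=True
    ).strip()
    now = datetime.now(timezone.utc)
    return f"{now.year}.{now.month:02d}.{short_hash}"

print(hashver())  # e.g. "2020.01.67092445a1ab"
```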

These alternatives each bring their own strengths and weaknesses to the table. For many projects, a combination of these strategies might provide the most clarity. However, the quest for the ‘perfect’ versioning scheme continues, leading us to explore more nuanced, potentially algorithm-driven approaches.

The Need for a New Approach: Algorithmic Semver

SemVer, at its core, tries to convey the nature and risk of software changes through a structured version number. But the manual nature of SemVer means it’s susceptible to human error, interpretation differences, and even misuse, especially in the realm of cybersecurity where major versions are sometimes wrongly equated to significant security alterations.

So, the challenge is clear: Can we create an automatic, algorithmic approach to versioning that conveys more accurate information about the changes contained within a release?

Algorithmic SemVer: Addressing the Flaws of SemVer 2.0

Embracing a more sophisticated, automated, and context-aware versioning approach, Algorithmic SemVer promises to rectify many of the inherent flaws in the traditional SemVer 2.0 system. Here’s how it addresses each of the identified pitfalls:

1. Ambiguity of Interpretation

Algorithmic SemVer Solution: By using automated code analysis tools, versioning decisions are made based on consistent, predefined rules rather than human interpretation. This ensures that similar changes always result in similar version increments, eliminating the ambiguity arising from human judgment.
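
A hedged sketch of what such predefined rules might look like: a fixed mapping from detected change kinds to bump levels, where the most severe kind wins. The change-kind labels here are hypothetical outputs of an analysis step, not the vocabulary of any particular tool.

```python
from enum import IntEnum

class Bump(IntEnum):
    PATCH = 0
    MINOR = 1
    MAJOR = 2

# Hypothetical change kinds an automated analysis step might emit.
RULES = {
    "api_removed": Bump.MAJOR,       # public symbol deleted or re-signed
    "behavior_changed": Bump.MAJOR,  # same signature, different observable result
    "api_added": Bump.MINOR,         # new public symbol, existing ones untouched
    "internal_only": Bump.PATCH,     # no public surface touched
}

def classify(change_kinds: list[str]) -> Bump:
    """The most severe matching rule wins; identical inputs always yield identical bumps."""
    return max((RULES[kind] for kind in change_kinds), default=Bump.PATCH)

print(classify(["api_added", "internal_only"]).name)  # MINOR
```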

2. Dependency Hell

Algorithmic SemVer Solution: A deeper automated analysis can assess how changes in one component might affect dependencies. By creating a comprehensive dependency graph and predicting potential conflicts, Algorithmic SemVer can provide warnings or suggestions for version increments that minimize dependency issues.
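
As a rough illustration, the dependency check could walk a graph of consumers and flag those whose declared constraints a proposed bump would violate. The graph shape and pinning scheme below are illustrative assumptions, not tied to any specific package manager.

```python
# Each consumer records the major version of "libfoo" it was built against.
consumers = {
    "service-a": {"libfoo": 4},
    "service-b": {"libfoo": 3},
}

def conflicts(package: str, proposed_major: int) -> list[str]:
    """Return consumers whose pinned major version the proposed release would break."""
    return [
        name
        for name, deps in consumers.items()
        if package in deps and deps[package] != proposed_major
    ]

print(conflicts("libfoo", 4))  # ['service-b'] needs attention before release
```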

3. Hyrum’s Law and Unintended Consequences

Algorithmic SemVer Solution: Incorporating behavioral comparisons through automated testing suites allows for observing and versioning based on actual system behavior, not just code changes. By comparing the behavior of old vs. new releases, even changes in undocumented behaviors can be flagged and versioned appropriately.
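
A minimal sketch of the idea, assuming the old and new builds expose a comparable entry point: run a shared set of probe inputs through both and flag any divergence, which would then escalate the version bump even if the API is unchanged.

```python
def behavior_diff(old_fn, new_fn, probes):
    """Return the probe inputs whose observable output differs between builds."""
    return [p for p in probes if old_fn(p) != new_fn(p)]

# Toy example: the "new" build truncates where the old build rounded.
old_build = lambda x: round(x)
new_build = lambda x: int(x)

changed = behavior_diff(old_build, new_build, [0.2, 0.5, 0.7, 1.5])
print(changed)  # [0.7, 1.5] -> observable behavior changed, so escalate the bump
```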

4. Cybersecurity Misunderstandings

Algorithmic SemVer Solution: Integrating automated security scanning tools ensures that potential vulnerabilities or security improvements are systematically recorded. By algorithmically tying security findings to version increments, it ensures that any security implications are appropriately reflected in the version number, regardless of whether they correspond to major, minor, or patch changes.
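
One possible, purely illustrative mapping: each scanner finding carries a severity, and the highest severity sets a floor on the version increment. The severity labels echo common CVSS buckets, but the mapping itself is an assumption.

```python
# Map scanner severities to the minimum bump they should force (an assumption,
# not a standard).
SEVERITY_FLOOR = {
    "critical": "major",
    "high": "minor",
    "medium": "patch",
    "low": "patch",
}

ORDER = {"patch": 0, "minor": 1, "major": 2}

def security_floor(findings: list[str]) -> str:
    """Return the minimum version bump implied by a list of scanner severities."""
    floors = [SEVERITY_FLOOR[f] for f in findings] or ["patch"]
    return max(floors, key=ORDER.__getitem__)

print(security_floor(["low", "high"]))  # 'minor'
```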

5. Incompatibility with Continuous Deployment

Algorithmic SemVer Solution: In a continuously deploying world, automated versioning is not only preferable but almost essential. Algorithmic SemVer can immediately analyze and assign appropriate version numbers post-integration, streamlining the deployment process and ensuring version numbers always remain relevant and informative.
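
In a CI pipeline, this could be a post-merge step that reads the last release tag, applies the algorithmically chosen bump, and tags the new build. The vMAJOR.MINOR.PATCH tag convention below is an assumption, and the sketch expects at least one such tag to exist.

```python
import subprocess

def next_version(bump: str) -> str:
    """Read the latest vX.Y.Z tag and apply the algorithmically chosen bump."""
    last = subprocess.check_output(
        ["git", "describe", "--tags", "--abbrev=0"], text=True
    ).strip().lstrip("v")
    major, minor, patch = (int(part) for part in last.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

# A post-merge CI job could then run: git tag "v" + next_version(computed_bump)
```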

6. Overemphasis on API Changes

Algorithmic SemVer Solution: While API changes are vital, they are just one dimension of software evolution. Algorithmic SemVer, by leveraging a holistic analysis approach, considers all aspects of the software, from GUI modifications to performance enhancements. This ensures that version numbers reflect the full spectrum of changes, not just those limited to the API.

In essence, Algorithmic SemVer transforms versioning from a manual, interpretive task into an automated, comprehensive analysis. By doing so, it offers a more accurate, reliable, and holistic approach to conveying the nature and implications of software changes.

Implementing Algorithmic SemVer

Here’s a conceptual approach to an “Algorithmic SemVer” (a combined sketch of the decision loop follows the list):

  1. Automated Code Analysis: Use static code analysis tools to determine the scale and nature of code changes. If API signatures are altered, that’s a clear sign of a major change. However, the devil’s in the details. The mere addition of a new function isn’t always a major change. We need the tools to be context-aware, understanding usage patterns and potential impacts.
  2. Behavioral Comparisons: Instead of just looking at code, employ automated testing suites to compare the behavior of old vs. new releases. If a newer version produces different results or has altered side effects, this should be reflected in the versioning. This can also capture changes in undocumented behaviors that users may depend on.
  3. Security Assessments: Integrate automated security scanning tools. Any potential vulnerability or security improvement gets recorded. A minor security fix might increment the patch version, while a significant security overhaul or detected vulnerability can impact the major version.
  4. Feedback Loop: The algorithm should learn from past mistakes. If a release deemed by the system as a ‘minor’ caused significant issues for users, the system should recalibrate its assessment algorithms. This is analogous to Test Driven Development (TDD), where the tests, in this case, are the algorithms that control the version number.
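
Pulling these steps together, here is a rough sketch of the decision loop. Each analysis field stands in for a real tool (static analyzer, behavioral test harness, security scanner), and the feedback loop from step 4, which would retune these rules over time, is not shown.

```python
from dataclasses import dataclass

@dataclass
class Analysis:
    api_breaking: bool      # step 1: static analysis found breaking API changes
    behavior_changed: bool  # step 2: behavioral comparison found divergence
    max_severity: str       # step 3: worst security finding ("none", "low", "high", ...)

BUMPS = ("patch", "minor", "major")

def decide_bump(a: Analysis) -> str:
    """Combine the analyses; the most severe signal determines the increment."""
    candidates = ["patch"]
    if a.api_breaking or a.behavior_changed:
        candidates.append("major")
    if a.max_severity == "critical":
        candidates.append("major")
    elif a.max_severity == "high":
        candidates.append("minor")
    return max(candidates, key=BUMPS.index)

def apply_bump(version: str, bump: str) -> str:
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

analysis = Analysis(api_breaking=False, behavior_changed=True, max_severity="low")
print(apply_bump("4.3.6", decide_bump(analysis)))  # "5.0.0"
```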

The Result

Imagine a version like 4.3.7.2b45a6. The first three numbers (4.3.7) still follow the Major.Minor.Patch convention but are now determined algorithmically based on the criteria above. The trailing hash (2b45a6) could be a shortened commit or build hash, giving users and developers a quick reference back to the specific build or changeset.
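
A small sketch of how the final identifier could be assembled, assuming Git supplies the trailing hash:

```python
import subprocess

def tag_release(core_version: str) -> str:
    """Append the short commit hash to the algorithmically chosen Major.Minor.Patch."""
    short_hash = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True
    ).strip()
    return f"{core_version}.{short_hash}"

# e.g. tag_release("4.3.7") -> "4.3.7.2b45a6"
```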

The Road Ahead

While the Algorithmic SemVer approach can help in many scenarios, no system is foolproof. It’s essential to recognize that any versioning system, whether SemVer, CalVer, HashVer, or Algorithmic SemVer, is just a tool—a means to convey information.

The real challenge, and perhaps the next frontier in this space, isn’t just to come up with a more accurate versioning system but to reshape the way developers, users, and stakeholders think about software changes. This mindset shift, paired with sophisticated tools, holds the promise of making software updates more predictable, understandable, and reliable. In the meantime, Algorithmic SemVer can serve as a reasonable default versioning scheme for your team.
