How Large Codebases Impede the Flow of Value

By Alex Yakyma.

The Bottleneck that Gets Systemically Ignored

Enterprise coaches, consultants and change agents demonstrate a worrying disregard for the System Under Development (SUD) as part of their transformation work, often distracted by other, more “conventional” areas of concern. This is often the result of incorrectly identified organizational bottlenecks, or even a complete lack of any such analysis in the first place. Instead, enterprises often choose to follow some magical organizational change management script that fails to address systemic issues. None of it works properly when dealing with modern software development organizations, due to their inherent complexity.

Sometimes the reason for inattention to the SUD, however, is even more trivial: change agents don’t feel comfortable with technical concerns and therefore avoid them at all costs, assuming incorrectly that they can overcompensate elsewhere. This, as the reader will see throughout the article, should in no way be a reason for dropping the ball. In fact, real “techies” who are part of the system often manifest such a strong attachment to their creation that a far less technically savvy person could give much better technical advice, simply because they do not share the same cognitive bias. One way or the other, systemic issues need to be addressed, not avoided, and in this article we will lay out very straightforward logic that will help change agents deal with this problem whether or not they have a deep technical background.

Impediment to the Flow of Value

So, how can the system you develop become an impediment to flow? That sounds somewhat contrary to the idea that your SUD should actually represent the ultimate value to the customer. How come that value is suddenly at odds with the flow… of value? Totally confusing…

Here’s how. Think of the system your organization develops as a thick cobweb of interconnected components, services, classes, functions, etc. (In fact, such an analogy is very close to reality.) Imagine now that you need to fix a production defect or add some new functionality to such a system. You pull the couple of strings you can think of, hoping that they will suffice to solve your problem, but it suddenly turns out that one of them is connected to another that goes deep into the web and in turn pulls a few more. You continue the quest only to find out that a lot more was involved than you had initially thought. All of a sudden, a task that seemed to be a 30-minute bug fix has grown into a week-long job resulting in a significant update with its own hidden defects, which will be discovered much later, after even more cobweb has been woven around it. You get the idea…

As the SUD grows, it becomes harder and harder to control. In fact, it starts, in a sense, to control you, imposing ever higher capacity demands and bringing more and more variability to the outcomes. And this is mostly not the “good” variability that allows you to naturally uncover new economic opportunities. Quite the opposite: it simply adds to an already staggering level of disorder. I strongly suggest that you familiarize yourself with Lehman’s Laws (see [1]) in this regard.

At some point, adding any new piece of functionality to the system becomes so hard that the “techies” give up and start campaigning for a full system rewrite. And this time they’re gonna make it right. As if the previous system had been driven to a halt by someone else entirely, and not under the unreasonable pressure of the very same business stakeholders…

Research Results

The scenario described above sounds all too familiar. But is it always the case that large SUDs negatively influence the flow, or were you and I simply too “lucky” to experience something like that in our professional careers? Good question. There is a definitive answer to it. Let’s dig a bit deeper.

We recently completed a research study examining over fifty different software systems, with the objective of exploring the interconnectedness of their constituent parts (see [2]; it is a short read, and it would be very useful to familiarize yourself with the logic and data provided there). We analyzed how thick those “software cobwebs” really are. The results unambiguously showed a connection between the size of a system and the “spread” of the different degrees of connectivity within it. More simply put:

In a large codebase, extreme events have significant probability.

In other words, large codebases show high inherent variability of outcomes, which naturally leads to high variation in the flow. Moreover, the bigger the codebase, the higher this variability (it grows sub-linearly with codebase size, but that is hardly of any relief). So when the complexity of the codebase begins to significantly impact the flow of value, it matters very little how fluent we become with stickies, meetings and dot voting; the problem is absolutely elsewhere and requires a different solution.
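To make the idea of “spread” more concrete, here is a minimal sketch of the kind of analysis involved: build a dependency graph of the codebase and look at how connectivity degrees are distributed. The component names and dependencies below are purely hypothetical, and the study in [2] used its own dataset and methodology; this only illustrates the shape of the measurement.

```python
# Minimal sketch: the "spread" of connectivity degrees in a (hypothetical) codebase.
from statistics import mean, pstdev

# Each entry: component -> components it depends on (imports, calls, etc.)
dependencies = {
    "billing":       ["db", "auth", "logging", "pricing", "notifications"],
    "pricing":       ["db", "logging"],
    "auth":          ["db", "logging"],
    "reports":       ["db", "billing", "pricing", "logging"],
    "notifications": ["logging"],
    "db":            [],
    "logging":       [],
}

# Degree of a component = dependencies it has + components that depend on it.
degree = {name: len(deps) for name, deps in dependencies.items()}
for deps in dependencies.values():
    for target in deps:
        degree[target] = degree.get(target, 0) + 1

degrees = sorted(degree.values(), reverse=True)
print("degrees:", degrees)
print("mean:", round(mean(degrees), 2), "stdev:", round(pstdev(degrees), 2))
print("max-to-mean ratio:", round(max(degrees) / mean(degrees), 2))
```

At the scale of this toy example the spread is tame; in a real, large codebase the tail of the degree distribution stretches far beyond the mean, and those highly connected “hubs” are exactly where a seemingly small change pulls in a disproportionate amount of work.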

Economic Impact

The reason we often don’t understand the economic impact of SUD complexity is the same reason we don’t see that impact until it is too late: we are not geared for it. Indeed, our planning is mostly capacity-based (not throughput-based), and we simply fail to notice how items gradually become more “greedy” in terms of capacity demand. Story points don’t help here either, by the way; in fact, they may obscure the issue even more due to their ethereal nature, not bound to anything real. Instead, what is needed to diagnose the problem is:

  1. Measuring the actual flow indicators rather than their proxies
  2. Understanding averages and variance in cycle time, lead time and throughput
  3. Analyzing trends over time

If (1), (2) and (3) are in place, the trend in cycle time, for example, may easily show that something is progressively wrong with our ability to develop new features, which upon closer consideration may turn out to be due to problems with the SUD. In (2) we said “averages and variance” for a reason. It is characteristic of this type of problem to simultaneously lead to longer durations and higher variability.
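As a minimal sketch of what (1)–(3) might look like in practice, the snippet below computes cycle times from work item start and finish dates, their mean and standard deviation, and a crude trend check. The items and dates are hypothetical; in practice they would come from your tracking system.

```python
# Minimal sketch of flow diagnostics: actual indicators, averages/variance, trend.
from datetime import date
from statistics import mean, pstdev

# (start, finish) per completed work item, in completion order (hypothetical data).
items = [
    (date(2023, 1, 3), date(2023, 1, 9)),
    (date(2023, 1, 5), date(2023, 1, 16)),
    (date(2023, 2, 1), date(2023, 2, 10)),
    (date(2023, 2, 6), date(2023, 2, 27)),
    (date(2023, 3, 1), date(2023, 3, 20)),
    (date(2023, 3, 2), date(2023, 3, 31)),
]

# (1) Actual flow indicator: cycle time in days per item, not a proxy like story points.
cycle_times = [(finish - start).days for start, finish in items]

# (2) Averages and variance.
print("cycle time mean:", round(mean(cycle_times), 1), "days")
print("cycle time stdev:", round(pstdev(cycle_times), 1), "days")

# (3) Trend over time: compare the older half with the more recent half.
half = len(cycle_times) // 2
older, recent = cycle_times[:half], cycle_times[half:]
print("older half mean:", round(mean(older), 1), "recent half mean:", round(mean(recent), 1))
```

In this made-up data set, the recent items take both longer and more erratically long than the older ones; that combination of a rising average and a widening spread is precisely the signature described above.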

But the economic impact doesn’t end here, with just longer and less predictable feature implementation durations. What inevitably comes into play is an interesting epiphenomenon of such volatility: an increased delayed error rate. This is a “premium” on top of everything else that leads to more sneaky defects, architectural and other nonfunctional flaws, and decreasing readability and maintainability of the system. The actual cost of a feature may grow a couple of orders of magnitude over the course of a few years without your organization even suspecting what’s going on…
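To see how quickly that can happen, here is a purely illustrative bit of arithmetic (the growth rate is an assumption for the sake of the example, not data from this article or from [2]): a steady year-over-year multiplier on the effective cost of a comparable feature compounds into orders of magnitude within a few years.

```python
# Purely illustrative: a hypothetical year-over-year cost multiplier, compounded.
yearly_growth = 2.5  # assumed multiplier per year, chosen only for illustration
for years in range(1, 6):
    print(f"after {years} year(s): ~{yearly_growth ** years:.1f}x the original cost")
# With this assumed rate, year 5 lands near 100x -- roughly two orders of magnitude.
```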

What to Do About It?

The most commonly seen “solution” is actually the least desirable one! Yes, I’m talking about throwing more people at the problem, which Fred Brooks rightly called out as a very poor approach (see [3]). Here’s something important to remember:

While it is possible to somewhat mitigate the impact of the complexity of a large software system, no such improvement is able to significantly reverse the dynamic and reclaim the long-lost organizational productivity.

In fact, I want to strongly caution the reader against any “architectural miracles”, including those that involve significant “wholesale” system redesign or even a full rewrite. Those are often poor economic decisions, based on unsubstantiated estimates and wishful thinking. Improvements are absolutely possible, but they require a strong discipline of highly incremental design evolution, which is a foreign concept to most organizations.

The solution is simple:

Stop uncontrollably loading crap into the system, just because you have the capacity to do so!

Or rather, let me restate this in more politically acceptable lingo:

In an organization that does not possess a viable validation capability, most of the functionality represents very little actual business value.

Indeed, most organizations and their product management are simply not geared for proper experimentation and validation. Despite the formal advent of Agility, most requirements committed to development represent no more than wishful thinking. The fact that we use backlogs and prioritization while never actually crossing the organizational boundary and setting up a real outbound experiment plays a nasty trick on our organizations: it solidifies the conviction that “we are agile”, while in fact we are simply operating in an open-loop system. This also breeds a “wholesale” psychology among business stakeholders: if the cumulative set of features produced less value than we expected, let’s load more next time! …Absolutely flawed thinking. This is as bad as sewage overflowing into the water supply. After all, we never really made the transition from thinking in requirements to thinking in hypotheses and experiments; and that is an absolute prerequisite for successfully adopting the following tenet:

An organization should seek to implement less, not more, functionality in order to provide a sustainable flow of high value now and in the future.

This drastically changes organizational logic and tremendously slows down the growth of complexity. And the truth of the matter is that not many organizations will be able to meaningfully advance in this direction, as almost two decades of Agile adoption have reliably demonstrated. But this means two great things, if we are conscious about it:

  1. We are living in the real world, not in some illusory Happy Land
  2. When properly implemented, this strategy represents a very significant competitive edge… simply because your competition most likely doesn’t do it properly

 

References

  1. Lehman’s laws of software evolution, Wikipedia, https://en.wikipedia.org/wiki/Lehman%27s_laws_of_software_evolution
  2. Network Complexity and Volatility of Software Systems, http://orgmindset.com/network-complexity-and-volatility-of-software-systems/
  3. F. Brooks, The Mythical Man-Month, Addison-Wesley, 1975.