Transforming a software ecosystem

Dave
7 min read · Aug 29, 2018

I once worked for a company that was aggressively pursuing a transformation. There was a big capital expense budget for it, and most of the company’s engineering and product resources (myself included) were dedicated to it.

Actually, I joined when it was already underway. Exciting, no? We were going to modernize our systems and transform the way we did business. There was tons of excitement around all the things we could do with the data our new systems were going to give us access to. This was data we kind of already had, but it was locked in legacy systems, and apparently we couldn’t get it out.

These legacy systems were the core of the problem statement that engendered the transformation. They were brittle and difficult to change, we didn’t really know how they worked anymore, and there was tons of duplication.

So, obviously, we needed a transformation! Millions of dollars were set aside for it. For future reference, let’s call it Operation Powerade.

But, what is it?

I’ve been around the block. Being a software engineer, I understand deeply the frustration of working with old systems, and the temptation to believe you can just start from scratch and be better off. Joel Spolsky was the first that I know of to reckon with the problems of “the big rewrite.” But it can be done well, provided you approach it the right way.

So, assuming that we were correct in thinking that replacing our systems was both necessary and feasible (let’s assume both for now), one question still remains for me: why won’t we be doing this again in ten years? Are these millions of dollars still worth it if we don’t get more than a few years of use from these new systems?

Or, put another way, are we just rebuilding the same inflexible systems?

The rewrite cycle

Though most experienced developers probably get this, it was Udi Dahan who first showed me how common this cycle is. It goes like this:

  • An organization builds a software system
  • Over time the system becomes unmaintainable — to the point that the organization believes a rewrite is not only desirable but necessary
  • The system is rewritten
  • In the best case, the project actually succeeds and the old system is retired; though often the old system lives on in parallel with the newer systems, more-or-less in perpetuity
  • Before long, the new system becomes treated as a “legacy” system; its tech stack goes out of date; institutional knowledge about how to maintain it dissipates; budget and will to keep it updated disappears
  • The cycle repeats…we need to rewrite this and build a new system!

Ok. So we rewrite every 10 years. Is that so bad? Well, I’m pretty sure our executives weren’t planning to lay out this kind of money that often. Though, maybe they don’t care. But I care!

But the real reason to avoid this cycle is that rewrites are risky and expensive compared to iterative development. I won’t rehash Agile, Lean, Continuous Delivery, and TDD here, but they are all based on the same principle: iterative development, where customers continually see value and changes are continually integrated, is cheaper and carries far less risk.

Or, as Jez Humble puts it, “work in small batches.” If the new, magical, System to Fix All Systems(tm) is able to integrate small, iterative changes, we might not need that future v3 rewrite in 10 years. Or if we do, it will carry much less risk and provide more value.

With me so far?

What’s different this time?

So, we are doing a rewrite. Fine. But can we make this rewrite the last time we have to do this? What makes this transformation different from the last one? Why is Operation Powerade going to result in anything more than tomorrow’s legacy systems that need rewrites?

Let’s consider some possibilities.

The tech stack is newer

No. Bullshit. Poppycock. The technology is always changing, and this “new” tech stack will be considered new for two years at the most. This one isn’t even worth discussing. If your new platform doesn’t allow for plugging in completely different technologies in the future, so that future components/services can interop and evolve alongside the existing ones, you will be screwed.

As a side note, it’s worth pointing out that a benefit of building a message-driven microservice architecture is that you can evolve individual services independently, introducing new technology where it makes sense, instead of being forced to update your whole ecosystem in lock-step (credit to Jessica Kerr for this idea).

This is very powerful, because if you need to update your entire software ecosystem, you are spending just as much money on the stable, low-value parts as on the volatile, high-value parts that need the love. Independent evolvability (which is a characteristic of a microservices implementation done right) means that you can apply your transformation selectively, to the parts of the business that need it (independent evolvability can also be called investment flexibility).
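
To make that concrete, here’s a minimal sketch in TypeScript of the “tolerant reader” style that makes this independent evolution possible (the event name and fields are hypothetical): the consumer depends only on the fields it needs, so a producer can add fields, or be rebuilt on an entirely different stack, without lock-step upgrades across the ecosystem.

```typescript
// Hypothetical event published by an Orders service. Producers may add
// fields over time; consumers must not break when they do.
interface OrderPlacedV1 {
  eventType: "OrderPlaced";
  version: 1;
  orderId: string;
  customerId: string;
}

// A "tolerant reader": pick out only the fields this service cares about,
// and skip anything malformed instead of crashing.
function handleOrderPlaced(raw: unknown): void {
  const event = raw as Partial<OrderPlacedV1>;
  if (event.eventType !== "OrderPlaced" || !event.orderId) {
    return; // not for us, or malformed: ignore rather than fail
  }
  console.log(`Reserving stock for order ${event.orderId}`);
}

// A producer rewritten on a new stack can keep emitting this shape (plus
// new fields, like currency below) and this consumer keeps working untouched.
handleOrderPlaced({
  eventType: "OrderPlaced",
  version: 1,
  orderId: "42",
  customerId: "7",
  currency: "USD",
});
```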

Ok, so if the tech stack isn’t what makes this transformation different from previous ones, what is it? Let’s look at other options.

We will build it better this time

At first glance, this one smacks of hubris and self-deception. It’s begging for an eye-roll. Sure, we think we are more sophisticated than the previous set of engineers who built the system. Maybe some of them are even still around and think so too.

But let’s not dismiss it too quickly. Maybe I’m a wide-eyed optimist at heart (spoiler alert: I’m not), but I do think it is possible that systems can be built to last. Of course, for Operation Powerade, which employs scores of people across many teams, it won’t work if only a couple teams build better software. The key practices and architectural decisions that make it “better” have to be consistently applied. Note that this doesn’t mean the architecture needs to be monolithic, or that consistency needs to be applied everywhere, but there should be buy-in and enforcement of the patterns and practices that make the system flexible and evolvable.

So what are these patterns and practices? Well, I have some opinions on this! First, here’s what doesn’t matter:

  • Certain design patterns, like n-tier architecture, that may be appropriate for some services more than others
  • The work-tracking system (e.g. Jira, or post-it notes, etc.)
  • How individual teams are working, e.g. scrum vs. kanban
  • Almost anything tech-related: the data store the team is using (relational vs. graph vs. document), whether it’s in the cloud or on-premise, the tech stack

So what is the differentiator for Operation Powerade? What specific things might make this transformation stick?

Here’s what I think matters

While there’s no silver bullet, here are three things I think can make a decisive difference, allowing a transformation to survive for the long term:

  • Software teams aligned with business capabilities
  • Team independence → service/component independence (i.e. don’t be obsessed with reuse)
  • Well-understood contracts between services (see the sketch after this list)
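
On that last point, here’s a minimal sketch of one way to make a contract “well-understood”: a consumer-driven contract check (the team names, event, and fields are all hypothetical). The consuming team records exactly which fields it depends on, and the producer verifies its sample messages against every consumer’s contract in CI.

```typescript
// Hypothetical: the consuming (Invoicing) team writes down exactly which
// fields of the producer's event it depends on.
const invoicingContract = {
  eventType: "OrderPlaced",
  requiredFields: ["orderId", "customerId", "totalCents"],
};

// Check a producer's sample message against the consumer's declared needs.
function satisfiesContract(sample: Record<string, unknown>): boolean {
  return (
    sample.eventType === invoicingContract.eventType &&
    invoicingContract.requiredFields.every((field) => field in sample)
  );
}

// Run in the producer's CI: the producer learns immediately if a change
// would break this consumer, without the teams coordinating releases.
const sample = {
  eventType: "OrderPlaced",
  orderId: "42",
  customerId: "7",
  totalCents: 1999,
};
console.assert(satisfiesContract(sample), "Invoicing contract broken");
```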

There are so many things that an organization should do well that are not included here: TDD, observability, everything Janelle Klein talks about in her talk on How to Break the Software Rewrite Cycle (slides), the (excellent) principles of Continuous Delivery. So how can we leave these things out? Because while they may make or break a team or service, with proper boundaries in place the organization as a whole can survive the failure of one team.

As has been stated by many others, microservices should be built for replacement, not reuse (this hearkens back to the aforementioned benefits of independent evolvability).

The main objection to this is that we’re leaving money on the table for any components we could reuse. Developers in particular are conditioned to love reuse. Executives think that reuse will save future $$. But it depends on what we want for our system: reuse (i.e. minimizing wasted effort) or flexibility (i.e. supporting future change)? As Jessica Kerr writes, “Reuse is the enemy of change.”

It’s important to recognize that when you optimize for one thing, you necessarily de-optimize for something else. You can’t simultaneously optimize for flexibility and reuse.
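
Here’s a small sketch of how that trade-off shows up in code (the services and fields are hypothetical): instead of sharing one Customer library across services, each service owns the minimal model it actually needs.

```typescript
// Reuse would mean one shared Customer library that every service imports.
// Flexibility means each service keeps its own minimal view of a customer.

// Billing's view: payment details only.
interface BillingCustomer {
  customerId: string;
  paymentMethodId: string;
}

// Shipping's view: delivery address only.
interface ShippingCustomer {
  customerId: string;
  deliveryAddress: string;
}

// The duplicated customerId is the price of flexibility: Billing can
// reshape its model tomorrow without forcing a change (or a redeploy)
// on Shipping.
```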

Each of these “differentiators” probably deserves its own blog post, so I won’t belabor them all here. Suffice it to say that they won’t have any effect unless you can really bake them into the culture of the organization.

So make sure that a few of these things are well-understood by the tech leadership and effectively communicated to team members (across disciplines). That’s really hard to do! This is why you only want a few of these core values. Add in shared agreement on TDD, DDD, code standards, approved tools, shared services, and you will dilute focus on communicating the big picture of how teams and services should interact.

Next, I’ll post about the first bullet point above and think a bit about aligning software teams with business teams and capabilities.
