How to tackle technical debt

Photo by Ehud Neuhaus on Unsplash.

Mining for debt

Recently on Slack one of my colleagues shared this comic from Monkey User.

I thought it was a great metaphor. 

The world of software moves extremely fast. Inside a given company the codebase is constantly changing with the addition of new features. Outside the company is an entire world of open source software development, shipping updates to all of the libraries, frameworks and databases that are being used.

With time, piling on ever more code creates moments where the team needs to stop and take a step back. They will need to think of a different way of moving forward that is more maintainable, controlled and less prone to bugs. 

Even if the internal codebase changes extremely slowly, external dependencies are always releasing new versions, requiring the team to upgrade them before they reach end of life. This can create further technical debt as APIs deprecate or breaking changes are introduced.

Quite often engineers struggle to make their case for prioritizing tech debt work. Why?

  • Lack of empowerment: They might not think it is their place to speak up about it; instead they expect that more senior people will dictate when to take stock, refactor, upgrade libraries or storage, and so on.
  • Inability to persuade: They might not be able to construct an argument for spending time on it in a way that the non-technical people who dictate the work streams will understand.
  • Apathy: They may have already lost hope that Product or any higher-ups will listen to them, and therefore silently let the codebase or system degrade. “Features are more important,” they say. “They’ll never listen to us.”

All of these situations are a shame. They’re also not acceptable. But they’re fixable. Let’s have a look at them in turn.

It’s someone else’s problem

As an engineer, if you think that it is someone else’s problem to point out that there is a technical debt issue beginning to get out of hand, then – and I’m sorry to say it – you’re wrong. There are a number of traits that an excellent engineer will have, and pride in their work and a keen interest in the future state of the code are two of them.

Those committing code will know best about how the codebase is currently written and organized. They will be the first to begin to notice the bad smells. As they realize that continual dirty hacks are the only way of moving forward, it’s their duty to raise the flag.

The creation of technical debt is inevitable; as inevitable as the slow erosion of a chalk and lime coast by lapping waves, or the weathering of an old building. We should be comfortable with the fact that it is going to happen, and is likely happening right now, and we should be especially comfortable with alerting others when it starts to feel bad. We should fix the broken roof tiles before they become a leak.

Talk to your team about it. Talk to the other engineers that work on that codebase. Build consensus that there is a problem and that something should be done about it. Don’t wait for someone else to point it out. It is as much your responsibility as it is everyone else’s.

Shout.

Constructing the argument succinctly

Now that a technical debt problem has been identified, we’ll need to think about how best to argue for getting the time and space to fix it. 

Many engineering departments are building a product that makes the company money by selling it to external users. Some service internal users. I work in SaaS, and I would say that the expectations of our users are:

  • That our applications are available no matter the time of day or day of week.
  • That we’ll be continually adding new and innovative features to our products.

These expectations are pretty well understood by everyone in the business, regardless of whether they work in commercial, engineering, product, marketing, or wherever. That’s a good thing, because if you use one or both of them to construct your arguments about tackling particular pieces of technical debt, then it’s hard to be ignored.

Rephrasing the above two bullet points with a focus on thinking about technical debt:

  • The platform should be acceptably fast, correct (enough) and should have a very low likelihood of going catastrophically wrong with no prior warning. It is a very bad thing for business when this happens.
  • The codebase should be easy and efficient to work in as we continually add more stuff to it. If we can’t maintain a reasonable speed of adding new stuff, we begin to lose out to competitors, and the rest of the business wonders why we are getting slower, inviting lots of fruitless arguments about developer productivity.

We need to tie our arguments to these reasons. If engineers argue for doing technical debt work in a way that doesn’t make sense to the non-technical layperson, then it’s very hard for them to win hearts and minds in the business. Everyone else will wonder what the engineers are up to instead of shipping features.

Technical debt shouldn’t be fixed because it’s “obvious” or “the code could be better” or “it’s annoying” or a particular framework is now “the latest thing”. Those reasons may be entirely true, but the argument needs work.

Let’s have a look at some different scenarios.

  • “We need to upgrade Postgres.” OK, I totally understand. But we need to think of a better way of phrasing this to the non-technical person. What does the upgrade bring us? Is it some critical security patches? Does it have a positive effect on the speed at which the application is going to work? Does it have new features in the query language that will allow us to query the data in a new or better way?
  • “We need to refactor AnalysisPipeline.scala!” Nobody has any idea what AnalysisPipeline.scala does. Probably only a few in the department even know. Does it lack tests, and is it therefore causing a lot of bugs in written documents that are challenging to fix once they’re committed to storage? Is the class such a big monolithic mess that it is too hard to add new features at the rate that the business expects? Is it taking five times as long to work on as it would if it was split out into multiple classes, methods, modules or services?
  • “This service needs a rewrite.” Sure, it probably does. But what’s the real reason? Is it stuck on a framework that is now years beyond end of life and nobody knows how it works? Is it an area of the code that is going to have a lot of changes in the coming year, but the risk of it breaking is too high to keep adding to it quickly? Will the speed or stability of this particular service be much better if instead of working with it we just start again instead, taking advantage of the knowledge and technology that we have now?

Getting better at justifying why technical debt needs to be fixed isn’t just a skill that helps you get the clearance of your team lead or product owner to start working on it: it can also help you make up your own mind as to whether something is a real long term issue for the coming year or just a short term frustration for the current sprint.

Nobody will listen

If nobody will listen to your arguments about addressing technical debt, then first check that you’re constructing those arguments properly, as mentioned in the sections above. You are? Ace.

If a common pushback is that there are too many features queued up to build, then there may be an underlying worry from your product manager or line manager that fixing the technical debt will be a slippery slope that goes on forever and destroys productivity. 

One answer to this is to try your best to estimate the effort that it will take to fix it, and, better still, break that down into phases or milestones that can be incrementally worked on.

A tactic that works well to please both Product and Engineering is to balance periods of feature delivery with periods of tidy up and refactoring. In It Doesn’t Have To Be Crazy At Work, the creators of Basecamp pitch for periods of 6 weeks building followed by 2 weeks paying down technical debt. 
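To make the cost of that cadence concrete when talking to Product, the arithmetic can be sketched in a few lines. This is only an illustrative calculation: the function names and the 52-week year are my own assumptions, not something taken from the book.

```python
def debt_share(build_weeks: int, debt_weeks: int) -> float:
    """Fraction of each cycle spent paying down technical debt."""
    return debt_weeks / (build_weeks + debt_weeks)


def debt_weeks_per_year(build_weeks: int, debt_weeks: int,
                        weeks_in_year: int = 52) -> float:
    """Weeks per year spent on technical debt (assumes cycles repeat all year)."""
    return weeks_in_year * debt_share(build_weeks, debt_weeks)


# 6 weeks of building followed by 2 weeks of tidy up
print(f"{debt_share(6, 2):.0%}")   # 25%
print(debt_weeks_per_year(6, 2))   # 13.0
```

Put that way, the request is a bounded quarter of the team's time rather than an open-ended slippery slope.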

At Brandwatch we have employed similar tactics with a period of a team delivering a big ticket feature being followed by a fallow period where the team prioritizes and executes their most pressing technical debt concerns, such as refactoring, improving monitoring and writing documentation. The bonus to this way of doing things is it gives your product managers and designers time to ruminate on the next big thing.

Sometimes, however, there is a massive elephant in the room: a technical debt project so big that nobody wanted to talk about it, yet the swell has grown to the point where the wave is going to break – either with the codebase continuing to become a complete mess, or the platform becoming increasingly slow and unstable.

In this situation, honesty and transparency are the best policy. It is the job of the leaders in Engineering to elevate a large technical debt problem into a separate work stream in order to give it the recognition, space, and resources that it needs; typically a dedicated team over a longer period of time. 

In doing so, the principles above are just as valid: raise the flag, gain consensus, plot an approach, and make the problem understandable to the layperson. Make it clear that the future is brighter by doing this work.

Convince them that it would be silly not to do it because the future of the business depends on it. Then sort it out.

In summary

Remember that if you are an engineer, it’s your job to raise technical debt issues as early as possible, and to make sure that you are able to explain their impact in succinct and meaningful ways. Managers: it’s your job to listen and to create the space for the issues to get worked on.

Building a successful SaaS business requires a stable application and the ability to work quickly and efficiently: both of these things are impacted severely by technical debt, so don’t let it build up. Pay it down.

Switching to a remote manager

Photo by Marius Christensen on Unsplash.

git merge

In the last four weeks, I’ve made a transition from having my line manager based in the same office, which has been a situation I’ve been used to for all of my professional life, to having them be remote. In my case this has happened because of the merger of Brandwatch and Crimson Hexagon. The CTO of the combined company is now based in Boston, and I’m in Brighton, England.

I have a VP Engineering role, which, silly job title aside, means that I have a division of the Engineering department reporting to me, focussed around building our Analytics and Audiences applications. We have other divisions of Engineering focussed around our infrastructure and compute, our data platform and the Vizia product. At the time of writing, I have 38 people in my division.

I’ve been fortunate to have always had the CTO in the same office over the recent years. As the company has continued to grow at a fairly fast pace, I’ve had local support. Ideas, thoughts, gripes: they’ve been there in the same place or on the same timezone.

There have been a number of benefits to having the leader of the department co-located:

  • My staff have been able to get to know him easily. We’re all just around most days. This makes them feel connected all of the way up the chain with minimal effort.
  • The general narrative of what’s going on, such as happiness, morale, stress levels, has been observable by both myself and my manager.
  • If there’s ever a crisis – of people or of production systems – then, most of the time, 35 steps is all I’ve needed to get some counsel or a second opinion.

However, things are now quite different.

After our companies merged, the CTO role was given to the Engineering department leader in the other company, putting me in an interesting position:

  • I now have a manager who is not in the same physical location, so I lose out on all of the informal in-person contact that I had before.
  • My manager is now 5 hours behind me, meaning I have fewer times of the day in which to speak to him.
  • The new CTO doesn’t initially know me or any of my people; only what we’re responsible for. The rest is a black box.
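As a rough illustration of how much the 5 hour gap shrinks the window for talking, here is a minimal sketch. It assumes both of us work 9:00 to 17:00 local time, which is a simplification for the example rather than our actual calendars, and it ignores the few weeks a year when daylight saving changes shift the offset. The function name is hypothetical.

```python
def overlap_hours(my_start: int, my_end: int,
                  their_start: int, their_end: int,
                  hours_behind: int) -> int:
    """Shared working hours per day, measured on my local clock.

    hours_behind is how many hours the other person's clock trails mine.
    """
    # Convert their working day into my local time.
    their_start_mine = their_start + hours_behind
    their_end_mine = their_end + hours_behind
    # The overlap is the intersection of the two intervals, if any.
    start = max(my_start, their_start_mine)
    end = min(my_end, their_end_mine)
    return max(0, end - start)


# Assumed 9-to-5 days on both sides, with a 5 hour difference
print(overlap_hours(9, 17, 9, 17, 5))  # 3
```

Under those assumptions, eight shared hours in the same office become three, all of them in my afternoon.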

Letters across the pond

Over the last few weeks, as was expected with the merger, we’ve both been very busy, both with logistics and with traveling. Our weekly hour-long 1 to 1s often end before we’ve cleared all of the items on our agenda, and then we’re sliding into another meeting, which has been frustrating.

Because these weekly catch ups didn’t seem like enough time, and because email chains typically devolve into stasis, I started writing a weekly digest which I send each Friday afternoon. The idea was that I could take some time to properly summarize everything that was going on in my world and flag anything that I needed help with. 

This has been working really well. 

I write it in a Google Doc, which means that a lot of the smaller items can get covered off asynchronously via the comments. Larger items that are worth spending some more time on become the focus of our conversation in our 1 to 1, and that more precious face to face time is spent on the meat of the main issues, rather than on the periphery. Both of us enjoy written communication too, so this works very well. It also gives us an ideal chance to poke fun at our Britishisms and Americanisms.

Here’s roughly what I cover in the weekly document. It takes me about 30 minutes to write:

  • Any interesting developments in any of the ongoing work streams, such as new links to demos, updates on estimates, or anything particularly good or bad that’s unfolding.
  • The latest on what’s next in the project pipeline from conversations with Product.
  • The general feel within the teams, such as happiness and morale. Are any of them overworked, or, on the contrary, spinning the wheels while waiting for a decision on the next thing? Are the teams right sized and is this looking true for the coming months?
  • An in-depth look at anything that’s front of mind right now, such as hiring, or thoughts about backend architecture and scaling, or contemplations over cool ideas we could pitch to the Product team.
  • A list of “documents of interest”, such as designs for upcoming features or architecture, or the fortnightly product and engineering updates that get sent out. I don’t expect any of these to be read in detail, but they’re there to satisfy any curiosity.
  • Occasionally a light sprinkling of GIFs. Because life’s too short to not use that one of Kermit furiously slapping the typewriter.
Yes, that one.

Soap opera rather than novel

I’ve been trying to open up my black box as much as possible to give my new manager a view into the decisions that I make on a day-to-day basis, and to allow my thought processes to be observed and discussed. However, the style of writing was challenging at first: how do I make the digest interesting and not a labour?

Given that my new manager was taking the role of the reader and I was the author, I didn’t really know where to start or how to collate my thoughts. But then I came to realize that it wasn’t my job to be the creator of a novel, thoroughly documenting everything that happened. Instead I needed to take the position of a screenwriter of a soap opera: an inventor of a regular rolling feed of narrative that is easy to soak in, letting the reader learn the characters and plot lines gradually by osmosis.

Tuning into The Wire halfway through Season 3 can leave you feeling a little lost and overwhelmed by the detail, but switching on Eastenders a couple of times during the week allows you to (assuming you want to…) follow along pretty easily. I decided to be more Eastenders, except with less arguing and fighting in the Queen Vic.

I scatter the document with parts prefixed with “Your thoughts please…” where I’d like to get some input. We usually chat on the comments around these parts.

Getting comfortable with async await

Although I expected the experience to be jarring at first, I think that I am getting better at a predominantly asynchronous relationship. 

There can be some benefits to having a remote manager, after all:

  • Because our face to face time is more valuable, we prepare more for when we do talk, meaning that conversations are rewarding.
  • We do a lot of written communication, which allows us to think more deeply about what we’re saying and how we’re saying it before presenting it to one another.
  • We have to continually operate from a place of trust, since we cannot easily insert ourselves into each other’s worlds to observe and come to our own conclusions. I like this.
  • I feel like I have to step up and represent my people more, in terms of my personal accountability and in promoting their cause, which can only be a good thing.
  • The introduction of even more extreme timezone differences across the now global Engineering department means we need to get better at being a company that supports flexible remote working, fast. I would like to think that being forced to break our predominantly European timezone habits will make it easier for us, in time, to hire people remotely all over the world.

But, still, a quick chat in the kitchen is nice, and is missed.