5 Tips to Find Your Way Around A Legacy Codebase

Published June 15, 2021

Have you ever struggled to understand a codebase that was bigger than you?

Most of us go through this experience at some point in our careers, and it is not an easy thing to do. Chances are you’re in this situation right now.

During one occurrence of the Software Craftsmanship meetup, somebody asked for advice because he had been thrown into a gigantic legacy codebase with no documentation, no architecture and discouraged developers. The total opposite of an expressive codeline.

Well, even when the code isn’t expressive, there are ways to understand its meaning.

Lots of developers face this sort of situation, with varying degrees of intensity (his case was pretty bad). Even though the codebase I’m working on is in better shape than what he was describing, I too have had to figure out the code of a large codebase, with some parts being legacy code. And today I’m even training younger developers to get better at this.

Now that you’ve adopted the right mindset towards legacy code, let me share with you the most efficient ways I have found to get up to speed with a large codebase, even if its code isn’t always expressive.

Those tips will help you be more efficient and, I hope, boost your motivation as a software developer.

1) Choose a stronghold and conquer an empire

Do you remember those strategy games where you start with a little base and then you have to develop and explore the surroundings, which are blacked out when you start the game?

I find that these games are an excellent comparison for a codebase.

One efficient way to start the exploration of a legacy codebase is to find a place you understand very well. This will be your stronghold to start the game with. And it doesn’t have to be big! It can even be one line of code.

Let’s take an example. The software I deal with does various sorts of computation, but there is one very simple one: a linear interpolation. It consists of estimating a value between two values that are already known.

And we know its formula perfectly well: y = y0 + (x - x0) * (y1 - y0) / (x1 - x0), independently of any software representation. I’ll choose this as a stronghold. Therefore I’ll focus on locating it in the codeline.
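As a rough sketch, here is how that formula might appear in C++. The function name and signature are assumptions made for illustration; in a real legacy codebase the formula may be spread over several lines, with less telling variable names.

```cpp
#include <cassert>

// Linear interpolation: estimates y at x, given the known points (x0, y0) and (x1, y1).
// Name and signature are hypothetical, chosen only to illustrate the formula.
double linearInterpolation(double x, double x0, double y0, double x1, double y1)
{
    assert(x1 != x0); // the formula divides by (x1 - x0)
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0);
}
```

When hunting for such a stronghold, searching the code for a recognizable fragment of the formula, such as a division by a difference of two values, can be one way to narrow things down.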

A good stronghold needs to be a bit specific. For instance, a function that puts a string in upper case is not in itself a good stronghold, because it is typically used in several unrelated places across the codebase. Rather, some business code that calls this function in a context that you know in the application is a better stronghold.

Once you find your stronghold, cling to it. It constitutes a starting point from which to begin your exploration of the codeline.

Indeed, there is a high chance that you can figure out the immediate surroundings of that one line of code. And little by little things start to make sense. Little by little you’ll be expanding the area you’re comfortable with, and the dark area on the map will be shrinking.

I found this technique really helpful for starting out. However, it takes time, and it won’t get you to the ends of the world very quickly, particularly if your codebase has hundreds of thousands or millions of lines of code. This is why you need more.

2) Work your way up and down a stack

For this one you’re going to need your manager (or someone who’s familiar with the architecture of the application) to sit down next to you. Or if you’re the one who knows, sit with your padawans for this one.

The idea here is to fire up the debugger, find a judicious place in the code to put a breakpoint, and launch a use case in the application. The experienced person is here to choose the ‘judicious’ breakpoint, that is, one located deep in the stack of a typical use case of the application.

Then look at the call stack. It displays in one shot all the layers of the application involved in this use case.

This way you can learn a lot about the architecture of your software: what the main modules and the main frameworks are and how they relate to each other. A call stack tells a long story. Plus, some debuggers display the module name for each function in the call stack. Visual Studio does it if you right-click on the call stack and select “Show module name”.

You can repeat this experiment for several call stacks in the same use case, in order to get a grasp of the sequencing of the calls.
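To make this concrete, here is a minimal sketch of a layered use case. All the function names and the three layers are hypothetical, made up for illustration; a real application would have many more frames between the user interface and the computation.

```cpp
#include <cassert>

// Hypothetical three-layer application, for illustration only.

// Computation layer: a breakpoint here sits deep in the stack.
double interpolate(double x, double x0, double y0, double x1, double y1)
{
    assert(x1 != x0);
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0); // <- put the breakpoint on this line
}

// Business layer: chooses the known points to interpolate between.
double estimateValue(double x)
{
    return interpolate(x, 0.0, 100.0, 10.0, 200.0);
}

// User interface layer: entry point of the use case.
double onComputeButtonClicked(double userInput)
{
    return estimateValue(userInput);
}

// When the breakpoint is hit in a non-optimized build, the call stack
// would list the layers from the innermost to the outermost:
//   interpolate
//   estimateValue
//   onComputeButtonClicked
//   (whatever called onComputeButtonClicked)
```

Reading the stack from bottom to top gives the path the use case takes through the layers. In an optimized build some frames may be inlined and disappear from the stack, so a debug build is probably the better choice for this exercise.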

3) Start from the inputs and outputs of the application

If you don’t understand the code and no one is there to explain it to you, there is still hope. You can at least find someone who understands the application from a user point of view, or at least partly.

Then focus on something you understand in the application and that you can visualize, in the Graphical User Interface, or in any other form of input and output. Then find where this input comes into the code.

Once you’ve found it, seize it like a thread and follow it, until you reach the code of one functionality that you recognize, even if this functionality boils down to just one line of code. This will be your stronghold, and then you can apply advice #1 above.

4) Make a refactoring to decouple the code

Refactoring a piece of code is a great way to get familiar with it. But not all refactorings will give you the same amount of knowledge for the same time invested in them.

Indeed, even though the refactorings that clean up the code have value, here I’m talking about refactorings that change the structure of the code. Or sometimes it rather looks like putting a structure into place. And you can achieve that by decoupling components.

But don’t worry, you don’t have to revise the architecture of the whole application to do this! You can just tackle a function and split it into sub-functions. This will give you a better overview of the sequencing of actions in this function, as well as a detailed view of the data that comes into play in it.
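Here is a small sketch of what splitting a function into sub-functions could look like. The function makeReport and its three steps are invented for this example; picture them as having been inlined in one long body before the refactoring.

```cpp
#include <numeric>
#include <string>
#include <vector>

// Hypothetical sub-functions extracted from one long function.

// Step 1: filter the input data.
std::vector<double> keepPositive(const std::vector<double>& values)
{
    std::vector<double> result;
    for (double value : values)
    {
        if (value > 0) result.push_back(value);
    }
    return result;
}

// Step 2: compute on the filtered data.
double average(const std::vector<double>& values)
{
    if (values.empty()) return 0.0;
    return std::accumulate(values.begin(), values.end(), 0.0) / values.size();
}

// Step 3: format the result.
std::string formatReport(double averageValue)
{
    return "average: " + std::to_string(averageValue);
}

// The original function now reads as a sequence of actions,
// and the data passed from one step to the next is explicit.
std::string makeReport(const std::vector<double>& rawValues)
{
    const std::vector<double> validValues = keepPositive(rawValues);
    const double averageValue = average(validValues);
    return formatReport(averageValue);
}
```

The body of makeReport now shows the sequencing at a glance, and the parameters and return types of each sub-function show which data comes into play at each step.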

You can also decouple data processing from objects. This one doesn’t apply in all cases but when it does, you hit it big.

For example, imagine a class that contains data members and methods that operate on them. If this data processing is also used by another class, then the two classes are coupled and the code is weird. In this case you can take the behaviour out of the first class, so that both classes use this extracted behaviour, and not each other.
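A minimal sketch of the result of such a refactoring, with made-up classes: suppose Invoice used to go through a Portfolio object only to reuse its summing logic. Portfolio, Invoice and sumOf are all hypothetical names chosen for this illustration.

```cpp
#include <utility>
#include <vector>

// Extracted behaviour: it no longer belongs to any class.
double sumOf(const std::vector<double>& amounts)
{
    double total = 0.0;
    for (double amount : amounts) total += amount;
    return total;
}

// First class: used to own the summing logic, now delegates to sumOf.
class Portfolio
{
public:
    explicit Portfolio(std::vector<double> positions) : positions_(std::move(positions)) {}
    double total() const { return sumOf(positions_); }
private:
    std::vector<double> positions_;
};

// Second class: used to depend on Portfolio, now depends only on sumOf.
class Invoice
{
public:
    explicit Invoice(std::vector<double> lines) : lines_(std::move(lines)) {}
    double amountDue() const { return sumOf(lines_); }
private:
    std::vector<double> lines_;
};
```

Neither class knows about the other any more. Both depend only on the free function, which can also be tested on its own.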

The code becomes simpler and more extensible afterwards. But in the process, you’ve seen all the details of the processing of this particular data. It makes you learn a lot about this part of the program, both in terms of code and in terms of functionality. This was the first refactoring I made on the codebase I’m working on today, and it made me an expert on this part of the application. Special thanks to my fantastic manager Patrice for teaching me this.

If you want to dig more into refactoring, Refactoring by Martin Fowler is a classic book on the subject. And refactoring legacy code goes hand in hand with the topic of tests, which you can read about in Working Effectively with Legacy Code by Michael Feathers.

5) Find a “padded room” function

This is a technique that I often use with the younger developers that I manage. I pick a big function in the area they’re going to work on, that has a complicated implementation, but that has little to no dependency on anything else. You have some of those functions in your codebase, don’t you?

I find they are a great place to sharpen your code reading skills. It takes a little bit of time but you end up understanding and then mastering at least some parts of it. And like in a padded room you can’t hurt yourself, because the function only goes so far and you don’t have to get lost in another complex function and another, etc. It is self-contained.

This exercise in the padded room function makes you more familiar with the coding style of the codeline. I’m not saying that it is always a model for writing your own code, because some legacy codebases don’t exactly have the best style. Rather, it will get your eyes used to the patterns of the codebase, and you can apply this skill pretty much everywhere else inside it to understand code more quickly.

It will come with time anyway

Even if tackling a large codebase that you haven’t written yourself seems like a daunting task at first, it gets easier with time. The bigger the area of code you master and the more you understand what your application is supposed to do, the more quickly you’ll understand a new part of it.

But this comes with time, and a steady amount of effort. Find your stronghold, analyse your stacks, decouple the code and hone your skills in safe padded room functions. Then conquer the world of your software.

The Right Attitude to Deal with Legacy Code


About Jonathan Boccara
