Extract Function: Should I Extract the Condition Too?

Published August 27, 2019

Long functions are hard to read, hard to maintain and hard to understand in their entirety. All in all, they contribute to making our developers lives more difficult.

But there is one nice thing about long functions: bashing them down into smaller units to make the code more expressive.

This is one of the most fun and rewarding refactoring tasks I know of. It’s like slashing at a dragon who tries to to roast you up until you’ve made it a pile of Tupperwares of dragon sausages and meat for dragon burgers.

The other day I was slashing away at a pretty big function and came to a piece of code that looked like this:

// code...

if (shouldDoX)
{
    // plenty of code
    // to do X...
}

// code...

This is an invitation to extract the code in a function doX. But to do this there are two options for the resulting code:

Option #1:

// code...

if (shouldDoX)
{
    doX();
}

// code...

Option #2:

// code...

doX(); // <- this function contains the if statement

// code...

Which option is better? When you extract the code, should you extract the condition along with it?

Option #1 sort of looks more explicit. But on the other hand, Option #2 feels more rewarding because the long function has become even shorter.

Having no idea what was best, I asked their opinion to the rest of the world:

I need to extract code like this:

// code…
if (shouldDoX)
{
// code to do X
}
// code…

Which option do you prefer? Why?

1)
// code…
if (shouldDoX)
{
doX()
}
// code…

or

2)
// code…
doX() // this function contains the if
// code…

— Jonathan Boccara (@JoBoccara) July 23, 2019

That sparked quite a discussion, which made me choose with confidence between Option #1 and Option #2. In my specific case I went for Option #1, but there are other cases where Option #2 is what you want.

I’ll try to summarise here the takeaways of that discussion.

Names should be honest

Even though Option #2 does a better job of shortening the code, it has a major drawback: it says that it will take care of doing X, but maybe it will change its mind and won’t do it in the end. This is confusing:

void doX()
{
    if (shouldDoX) // er, having second thoughts
    {
        // code of X...
    }
}

This makes the code less expressive because if you read the calling function you’d think that X is executed, but maybe it’s not.

Imagine you’re a maintainer of that code and there is a bug in shouldDoX. There is a chance that you’ll step over doX without looking at its implementation, thinking with every right that doX does X.

It’s only after spending a bit of time wondering what’s wrong that you’d descend into doX, find out that it doesn’t do X because it thought it shouldn’t.

The fact that Option #2 has a function name that doesn’t say what it does made a majority of people on Twitter prefer Option #1.

Some suggested to change the name in Option #2 to maybeDoX, or doXIf(shouldDoX), but Option #1 looks more straightforward in that case.

Also, Option #2 may require an additional argument to pass to doX, and additional parameters complexify the function’s prototype.

Cases for Option #2

But it’s not that simple. There are cases where Option #2 makes more sense than Option #1.

Peter Bindels gave an interesting example to illustrate this: consider a function that turns the light on. It makes sense to turn the lights on only if the lights are not already on.

The initial code in the long function would look like this (this is my interpretation of Peter’s tweet, as this wouldn’t fit in a Tweet–Peter correct me if I misunderstood):

// code...

if (lightsAreOff)
{
    // toggle the switches
    // to turn the lights on
}

// code...

We could change it to this:

// code...

turnLightsOn();

// code...

With turnLightOn being:

void turnsLightsOn()
{
    if (lightsAreOff)
    {
        // toggle the switches 
        // to turn the lights on 
    }
}

The function turnsLightsOn takes care of doing whatever is necessary to have the lights on. If the lights are already on, it doesn’t have anything to do, but it’s an implementation detail. For that reason it’s better left inside of turnLightsOn.

`canDoX` or `shouldDoX`?

Another case for Option #2 is if shouldDoX is rather a canDoX. Then you may prefer that doX handle the case where canDoX is false, rather than the calling code:

void doX()
{
    if (canDoX)
    {
        // code for X...
    }
    else
    {
        throw CantDoX{};
    }
}

The example has a function returning void, but if the function returns a value then there are other ways to handle errors:

std::optional<Y> doX()
{
    if (canDoX)
    {
        // code for X...
        // return make_optional(y)
    }
    else
    {
        return std::nullopt,
    }
}

Either way, you may prefer that the function takes care of its own the error handling rather than the calling code.

It all comes down to levels of abstractions

As with many choices when writing code, we can give a general answer to this problem with what I consider being the fundamental principle of programming: respecting levels of abstractions.

Several people mentioned levels of abstraction in the Twitter thread, including Arne Mertz who said is explicitly and other people that had arguments that came down to that too.

To respect levels of abstraction, the rule is then: if shouldDoX is at the level of abstraction of the calling code then prefer Option #1, whereas if shouldDoX is at the level of abstraction of doX then prefer Option #2.

Another way to put it is this: if shouldDoX is at the level of abstraction of the calling code, and we put it in doX, then we have a problem: we’re breaking the Single Responsibility Principle, because doX is worrying about two things that don’t go together.

Now how do you know at which level of abstraction shouldDoX is?

You can have a gut feeling about that. This is a shortcut that experience can provide.

But in case you’re not sure, there are objective criteria that help determine if shouldDoX is at the level of abstraction of doX, or at the one of the calling code above.

Here are two ways to determine that: the else test and the code reuse test.

The `else` test

The original code of our problem was this:

// code...

if (shouldDoX)
{
    // plenty of code
    // to do X...
}

// code...

Now let’s make a thought experiment and imagine that there was an else branch:

// code...

if (shouldDoX)
{
    // plenty of code
    // to do X...
}
else
{
    // code to do Y...
}

// code...

With such code, we can no longer just write doX, because there is some Y involved.

Then our two options become:

New option #1:

// code...

if (shouldDoX)
{
    doX();
}
else
{
    doY();
}

// code...

New option #2:

// code...

doXorY();

// code...

Then the choice becomes much easier to do. XorY is generally a bad name because it hints that the function has several responsibilities. If there is a better name than XorY that abstracts the concept of the whole if-else statement, then new option #2 makes sense. Otherwise, new option #1 is the way to go.

This analysis allows to decide if shouldDoX is at the same level of abstraction than doX.

Put another way, in the original case with just the if statement, imagine there was an else. Would you have extracted two separate functions doX and doY? If yes then you should keep shouldDoX outside of doX, and go for option #1. Otherwise you can put it inside of doX and go for option #2.

It is this else test that helped me take a decision with confidence in my original case.

The code reuse test

Another way to look at it, suggested by Berado on the Twitter thread, is to imagine how it would go if you were to reuse doX in another context.

If you put the if inside of the function, would you be able to reuse the function in another context?

If yes then you can put the if inside of the function. Otherwise it suggests that the if is related to the calling code rather than doX, or said differently that it has a higher level of abstraction than doX.

For more details about relating code to the level of abstraction of a function or its calling context, you can find a detailed example in this video about good naming.

Know where to cut

The purpose of reducing the length of a long function is to make its code more expressive.

Levels of abstractions are a guide for slicing up long functions in a way that makes the resulting code readable and easier to maintain.

Thanks to all the people who participated to the Twitter thread and helped me slice my long dragon-function!

You will also like

Don't want to miss out ? Follow:
Share this post!

About Jonathan Boccara