How Lambdas Make Function Extraction Safer

Published November 13, 2020

One of the most interesting talks I saw at CppCon 2019 was also one of the shortest.

During one of the lightning talk evenings, Ezra (a.k.a. eracpp) demonstrated a technique for extracting code from a long function in a systematic way. Long functions are common in legacy C++ code, and extracting sub-functions out of them is a great way to make that code more expressive.

This technique, inspired by a tutorial on the Jai language, makes it possible to refactor legacy C++ code in a relatively safe and stress-free way.

Thanks to Ezra for reviewing this post.

Extracting a function in several steps

In short, the technique consists of the following steps:

surround the code you want to extract with an immediately invoked lambda,
use the compiler to reveal the outputs of this lambda, and return them,
use the compiler to reveal the inputs of this lambda, and pass them as parameters,
copy-paste the code into a sub-function.

To illustrate those steps, let’s see an example of code that needs function extraction:

#include <iostream>
#include <map>
#include <string>
#include <utility>

void aggregateAndDisplay(std::map<int, std::string> const& source, std::map<int, std::string> const& destination)
{
    auto aggregatedMap = destination;
    for (auto const& sourceEntry : source)
    {
        auto destinationPosition = aggregatedMap.find(sourceEntry.first);
        if (destinationPosition == aggregatedMap.end())
        {
            aggregatedMap.insert(std::make_pair(sourceEntry.first, sourceEntry.second));
        }
        else
        {
            aggregatedMap[sourceEntry.first] = sourceEntry.second + " or " + destinationPosition->second;
        }
    }

    for (auto const& entry : aggregatedMap)
    {
        std::cout << "Available translations for " << entry.first << ": "
                  << entry.second << '\n';
    }
}

As its name suggests, this function does two things: aggregating data into a map, and displaying the aggregated data.

With the following calling code:

auto const source = std::map<int, std::string>{{1, "one"}, {2, "two"}, {3, "three"}};
auto const destination = std::map<int, std::string>{{2, "dos"}, {3, "tres"}, {4, "quatro"}};
aggregateAndDisplay(source, destination);

The program outputs this:

Available translations for 1: one
Available translations for 2: two or dos
Available translations for 3: three or tres
Available translations for 4: quatro

The code begs for function extraction (and for other design improvements too, which we won’t focus on here): one sub-function that performs the aggregation, and another one that performs the display.

This function is well suited to illustrating the technique, because its structure is apparent. In legacy C++ code, the structure may be less obvious. Identifying the relevant blocks to extract is outside the scope of this technique, but I’d love to know how you go about it. We’ll come back to that at the end of the post.

Assuming we identified those blocks, let’s extract them into sub-functions.

Surrounding the code to extract

As a first step, let’s surround the code to extract with an immediately invoked lambda expression:

void aggregateAndDisplay(std::map<int, std::string> const& source, std::map<int, std::string> const& destination)
{
    [&]
    {
        auto aggregatedMap = destination;
        for (auto const& sourceEntry : source)
        {
            auto destinationPosition = aggregatedMap.find(sourceEntry.first);
            if (destinationPosition == aggregatedMap.end())
            {
                aggregatedMap.insert(std::make_pair(sourceEntry.first, sourceEntry.second));
            }
            else
            {
                aggregatedMap[sourceEntry.first] = sourceEntry.second + " or " + destinationPosition->second;
            }
        }
    }();

    for (auto const& entry : aggregatedMap)
    {
        std::cout << "Available translations for " << entry.first << ": "
                  << entry.second << '\n';
    }
}

The lambda captures everything by reference and is invoked in the same statement that creates it. This means that the body of the lambda is executed immediately. And thanks to the capture by reference, it can modify the objects of the enclosing function just like the initial code did.
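To check that wrapping code this way leaves its behavior untouched, here is a minimal sketch on a hypothetical function, doubleAndCountOdds (not part of the talk): the lambda body modifies both a parameter and a local variable through the [&] capture, exactly as the unwrapped loop would.

```cpp
#include <cassert>
#include <vector>

// Hypothetical function, used only to illustrate step 1 of the recipe.
int doubleAndCountOdds(std::vector<int>& values)
{
    int oddCount = 0;

    // The loop is wrapped in a lambda that captures everything by reference
    // ([&]) and is invoked immediately by the trailing ().
    [&]
    {
        for (auto& value : values)
        {
            if (value % 2 != 0)
            {
                ++oddCount; // modifies the enclosing local variable
            }
            value *= 2;     // modifies the caller's vector
        }
    }();

    return oddCount;
}

// Usage check: the effects are the same as with the unwrapped loop.
void doubleAndCountOddsExample()
{
    std::vector<int> values{1, 2, 3};
    assert(doubleAndCountOdds(values) == 2);
    assert((values == std::vector<int>{2, 4, 6}));
}
```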

Finding out the outputs

But introducing the lambda triggers a compile error for every value that is created by the code to extract and used later on in the function:

<source>: In function 'void aggregateAndDisplay(const std::map<int, std::__cxx11::basic_string<char> >&, const std::map<int, std::__cxx11::basic_string<char> >&)':
<source>:29:30: error: 'aggregatedMap' was not declared in this scope
   29 |     for (auto const& entry : aggregatedMap)
      |                              ^~~~~~~~~~~~~

Those values are the “outputs” of the code to extract.

To make the code compile and run again, we can make the lambda return those outputs for the rest of the function to use them:

void aggregateAndDisplay(std::map<int, std::string> const& source, std::map<int, std::string> const& destination)
{
    auto const aggregatedMap = [&]() -> std::map<int, std::string>
    {
        auto aggregatedMap = destination;
        for (auto const& sourceEntry : source)
        {
            auto destinationPosition = aggregatedMap.find(sourceEntry.first);
            if (destinationPosition == aggregatedMap.end())
            {
                aggregatedMap.insert(std::make_pair(sourceEntry.first, sourceEntry.second));
            }
            else
            {
                aggregatedMap[sourceEntry.first] = sourceEntry.second + " or " + destinationPosition->second;
            }
        }
        return aggregatedMap;
    }();

    for (auto const& entry : aggregatedMap)
    {
        std::cout << "Available translations for " << entry.first << ": "
                  << entry.second << '\n';
    }
}

Now the code compiles and the output of the program remains the same as before:

Available translations for 1: one
Available translations for 2: two or dos
Available translations for 3: three or tres
Available translations for 4: quatro

Note the nice side effect: aggregatedMap is now a const value, since all the modifications needed to fill it are done inside the lambda.
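The aggregation code has a single output. When the code to extract produces several, one option, assumed in this sketch rather than taken from the talk, is to return them together in a std::pair (or a small struct) and unpack them with C++17 structured bindings, so that the outputs stay const as well. The function spread is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical function whose extracted block has two outputs.
// Assumes 'values' is not empty.
int spread(std::vector<int> const& values)
{
    // Both outputs are returned together in a std::pair and unpacked
    // with C++17 structured bindings; both names are const.
    auto const [smallest, largest] = [&]() -> std::pair<int, int>
    {
        int smallestSoFar = values.front();
        int largestSoFar = values.front();
        for (int value : values)
        {
            smallestSoFar = std::min(smallestSoFar, value);
            largestSoFar = std::max(largestSoFar, value);
        }
        return {smallestSoFar, largestSoFar};
    }();

    return largest - smallest;
}

// Usage check.
void spreadExample()
{
    assert(spread({4, -2, 9, 1}) == 11);
    assert(spread({5}) == 0);
}
```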

Finding out the inputs

Let’s now use the compiler again to find the inputs of the code we want to extract.

Those inputs are the values that are captured by the lambda. Removing the capture makes them appear in compile errors:

void aggregateAndDisplay(std::map<int, std::string> const& source, std::map<int, std::string> const& destination)
{
    auto const aggregatedMap = []() -> std::map<int, std::string>
    {
        auto aggregatedMap = destination;
        for (auto const& sourceEntry : source)
        {
            auto destinationPosition = aggregatedMap.find(sourceEntry.first);
            if (destinationPosition == aggregatedMap.end())
            {
                aggregatedMap.insert(std::make_pair(sourceEntry.first, sourceEntry.second));
            }
            else
            {
                aggregatedMap[sourceEntry.first] = sourceEntry.second + " or " + destinationPosition->second;
            }
        }
        return aggregatedMap;
    }();

    for (auto const& entry : aggregatedMap)
    {
        std::cout << "Available translations for " << entry.first << ": "
                  << entry.second << '\n';
    }
}

Here are the compile errors:

<source>: In lambda function:
<source>:14:30: error: 'destination' is not captured
   14 |         auto aggregatedMap = destination;
      |                              ^~~~~~~~~~~
<source>:12:33: note: the lambda has no capture-default
   12 |     auto const aggregatedMap = []() -> std::map<int, std::string>
      |                                 ^
<source>:10:102: note: 'const std::map<int, std::__cxx11::basic_string<char> >& destination' declared here
   10 | void aggregateAndDisplay(std::map<int, std::string> const& source, std::map<int, std::string> const& destination)
      |                                                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
<source>:15:40: error: 'source' is not captured
   15 |         for (auto const& sourceEntry : source)
      |                                        ^~~~~~
<source>:12:33: note: the lambda has no capture-default
   12 |     auto const aggregatedMap = []() -> std::map<int, std::string>

Our inputs are therefore source and destination. Let’s add them as parameters of the lambda, and pass them in the call:

void aggregateAndDisplay(std::map<int, std::string> const& source, std::map<int, std::string> const& destination)
{
    auto const aggregatedMap = [](std::map<int, std::string> const& source, std::map<int, std::string> const& destination) -> std::map<int, std::string>
    {
        auto aggregatedMap = destination;
        for (auto const& sourceEntry : source)
        {
            auto destinationPosition = aggregatedMap.find(sourceEntry.first);
            if (destinationPosition == aggregatedMap.end())
            {
                aggregatedMap.insert(std::make_pair(sourceEntry.first, sourceEntry.second));
            }
            else
            {
                aggregatedMap[sourceEntry.first] = sourceEntry.second + " or " + destinationPosition->second;
            }
        }
        return aggregatedMap;
    }(source, destination);

    for (auto const& entry : aggregatedMap)
    {
        std::cout << "Available translations for " << entry.first << ": "
                  << entry.second << '\n';
    }
}

The code now compiles and runs again.
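At this point the lambda captures nothing, so it depends only on its parameters. One way to confirm that it is self-contained: in C++, a lambda with an empty capture list converts implicitly to a plain function pointer, whereas a capturing lambda does not. A small sketch of that rule on hypothetical toy lambdas (conversionExample is an illustrative name):

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical illustration: only a lambda with an empty capture list
// converts to a plain function pointer.
int conversionExample()
{
    int offset = 1;

    auto captureless = [](int x) { return x + 1; };     // needs nothing but its parameter
    auto capturing = [&](int x) { return x + offset; }; // still reads 'offset' from the enclosing scope

    static_assert(std::is_convertible<decltype(captureless), int (*)(int)>::value,
                  "a captureless lambda converts to a function pointer");
    static_assert(!std::is_convertible<decltype(capturing), int (*)(int)>::value,
                  "a capturing lambda does not");

    int (*asFreeFunction)(int) = captureless; // implicit conversion
    assert(asFreeFunction(41) == capturing(41));
    return asFreeFunction(41);
}
```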

Copy-paste the code into a sub-function

The code is now ready to be extracted in one go. Indeed, the lambda is already a function within our function. We only need to take it out of the function, remove the [], add an auto and give it a name, then call it where the lambda used to be:

auto aggregate(std::map<int, std::string> const& source, std::map<int, std::string> const& destination) -> std::map<int, std::string>
{
    auto aggregatedMap = destination;
    for (auto const& sourceEntry : source)
    {
        auto destinationPosition = aggregatedMap.find(sourceEntry.first);
        if (destinationPosition == aggregatedMap.end())
        {
            aggregatedMap.insert(std::make_pair(sourceEntry.first, sourceEntry.second));
        }
        else
        {
            aggregatedMap[sourceEntry.first] = sourceEntry.second + " or " + destinationPosition->second;
        }
    }
    return aggregatedMap;
}

void aggregateAndDisplay(std::map<int, std::string> const& source, std::map<int, std::string> const& destination)
{
    auto const aggregatedMap = aggregate(source, destination);

    for (auto const& entry : aggregatedMap)
    {
        std::cout << "Available translations for " << entry.first << ": "
                  << entry.second << '\n';
    }
}
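In this example the extracted code only reads its inputs. When the code to extract also modifies a variable that lives in the enclosing function, removing the capture still reveals that variable; one option, assumed here rather than taken from the talk, is to pass it as a non-const reference parameter. A sketch on the hypothetical functions appendGreetings and greetAll:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical extracted function. The code it came from appended to a
// local variable 'log' of the enclosing function, so 'log' becomes a
// non-const reference parameter (an in/out argument), while 'names' is
// only read and stays a const reference.
void appendGreetings(std::vector<std::string> const& names, std::string& log)
{
    for (auto const& name : names)
    {
        log += "Hello, " + name + "\n";
    }
}

// Hypothetical enclosing function after the extraction.
std::string greetAll(std::vector<std::string> const& names)
{
    std::string log = "Greetings:\n";
    appendGreetings(names, log); // used to be an immediately invoked [&] lambda
    return log;
}

// Usage check.
void greetAllExample()
{
    assert(greetAll({"Ann", "Bob"}) == "Greetings:\nHello, Ann\nHello, Bob\n");
}
```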

A recipe means less stress

What I find very nice in this technique presented by Ezra is that, no matter how complex the code to extract, the refactoring is broken down into a few simple steps, each of which compiles, runs and passes the tests (which we didn’t show here).
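The tests are not part of the original example, so here is one possible shape for them, as an assumption rather than part of Ezra’s technique: a characterization test that pins down the printed output before the refactoring starts and can be rerun after each step. It redirects std::cout into a std::ostringstream through std::cout.rdbuf; the helper captureCout is hypothetical. It is shown against the final version of the code.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

auto aggregate(std::map<int, std::string> const& source, std::map<int, std::string> const& destination) -> std::map<int, std::string>
{
    auto aggregatedMap = destination;
    for (auto const& sourceEntry : source)
    {
        auto destinationPosition = aggregatedMap.find(sourceEntry.first);
        if (destinationPosition == aggregatedMap.end())
        {
            aggregatedMap.insert(std::make_pair(sourceEntry.first, sourceEntry.second));
        }
        else
        {
            aggregatedMap[sourceEntry.first] = sourceEntry.second + " or " + destinationPosition->second;
        }
    }
    return aggregatedMap;
}

void aggregateAndDisplay(std::map<int, std::string> const& source, std::map<int, std::string> const& destination)
{
    auto const aggregatedMap = aggregate(source, destination);
    for (auto const& entry : aggregatedMap)
    {
        std::cout << "Available translations for " << entry.first << ": "
                  << entry.second << '\n';
    }
}

// Hypothetical test helper: runs 'action' with std::cout redirected into a
// string stream, restores std::cout, and returns what was printed.
// Not exception-safe: if 'action' throws, std::cout stays redirected.
template <typename Action>
std::string captureCout(Action action)
{
    std::ostringstream captured;
    std::streambuf* const previous = std::cout.rdbuf(captured.rdbuf());
    action();
    std::cout.rdbuf(previous);
    return captured.str();
}

// Characterization test: pins down the output shown at the start of the post.
void aggregateAndDisplayTest()
{
    auto const source = std::map<int, std::string>{{1, "one"}, {2, "two"}, {3, "three"}};
    auto const destination = std::map<int, std::string>{{2, "dos"}, {3, "tres"}, {4, "quatro"}};

    auto const output = captureCout([&] { aggregateAndDisplay(source, destination); });

    assert(output ==
           "Available translations for 1: one\n"
           "Available translations for 2: two or dos\n"
           "Available translations for 3: three or tres\n"
           "Available translations for 4: quatro\n");
}
```

Once aggregate is a separate function, it can also be tested directly, without going through the console output.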

Those steps can become a mechanical way to change code, one that ensures we don’t miss any of the inputs or outputs of the code to extract. I find that this makes refactoring fun (or even more fun if, like me, you enjoy refactoring as an activity).

That said, there is another important step that occurs before all this extraction: identifying the scope of the code to extract. We didn’t touch on this in this article.

How do you proceed when you extract code from a long function? How do you decide what to extract in a sub-function? Please leave your answers in a comment below, I’d love to read them.