How to Get the “Table of Contents” Of a Long Function

Published March 13, 2020

Long functions are hard to understand, and to write expressive code we generally try to keep functions short enough to get an overview of what they’re doing.

The exact threshold over which a function becomes too long has been debated and is not clear today (see Code Complete, section 7.4 for a discussion about this), but the consensus is that functions over several hundreds or thousands of lines are definitely too long.

In spite of this guideline, there are long functions in code out there. Perhaps you have some in the legacy parts of your codebase. When we come across such a long function, how can we know what it is about?

In my book The Legacy Code Programmer’s Toolbox I talk extensively about how to deal with code that is hard to understand, and in particular about long functions. One simple technique to have a rough overview of the structure of a function is to focus on its control flow. This allows to get an approximate “Table of Contents” of the function.

table of contents code function

Filtering on control flow

The control flow of the function is shaped by the control flow keywords, for instance:

if
else
for
while
do
switch
case
try
catch

and so on.

To get an overview of a long function, we can filter its lines and leave only those that contain one of the above words.

Let’s try this with an example. The following C++ function comes from an open source project called Scene-text-recognition. The point is not to pick on that particular project, but rather to look at code we’re not familiar with. Indeed, the following function is not trivial to read just by glancing at it:

ER* ERFilter::er_tree_extract(Mat input)
{
    CV_Assert(input.type() == CV_8UC1);

    Mat input_clone = input.clone();
    const int width = input_clone.cols;
    const int height = input_clone.rows;
    const int highest_level = (255 / THRESH_STEP) + 1;
    const uchar *imgData = input_clone.data;

    input_clone /= THRESH_STEP;

    //!< 1. Clear the accessible pixel mask, the heap of boundary pixels and the component
    bool *pixel_accessible = new bool[height*width]();
    vector<int> boundary_pixel[256];
    vector<int> boundary_edge[256];
    vector<ER *>er_stack;
    
    int priority = highest_level;


    //!< 1-2. push a dummy-component onto the stack, 
    //!<      with grey-level heigher than any allowed in the image
    er_stack.push_back(new ER(256, 0, 0, 0));


    //!< 2. make the top-right corner the source pixel, get its gray level and mark it accessible
    int current_pixel = 0;
    int current_edge = 0;
    int current_level = imgData[current_pixel];
    pixel_accessible[current_pixel] = true;

    
step_3:
    int x = current_pixel % width;
    int y = current_pixel / width;

    //!< 3. push an empty component with current_level onto the component stack
    er_stack.push_back(new ER(current_level, current_pixel, x, y));


    for (;;)
    {
        //!< 4. Explore the remaining edges to the neighbors of the current pixel, in order, as follows : 
        //!<    For each neighbor, check if the neighbor is already accessible.If it
        //!<    is not, mark it as accessible and retrieve its grey - level.If the grey - level is not
        //!<    lower than the current one, push it onto the heap of boundary pixels.If on
        //!<    the other hand the grey - level is lower than the current one, enter the current
        //!<    pixel back into the queue of boundary pixels for later processing(with the
        //!<    next edge number), consider the new pixel and its grey - level and go to 3.
        int neighbor_pixel;
        int neighbor_level;
        

        for (; current_edge < 4; current_edge++)
        {
            switch (current_edge)
            {
                case right    : neighbor_pixel = (x + 1 < width)    ? current_pixel + 1        : current_pixel;    break;
                case bottom    : neighbor_pixel = (y + 1 < height) ? current_pixel + width : current_pixel;    break;
                case left    : neighbor_pixel = (x > 0)            ? current_pixel - 1        : current_pixel;    break;
                case top    : neighbor_pixel = (y > 0)            ? current_pixel - width : current_pixel;    break;
                default: break;
            }
                        
            if (!pixel_accessible[neighbor_pixel] && neighbor_pixel != current_pixel)
            {
                pixel_accessible[neighbor_pixel] = true;
                neighbor_level = imgData[neighbor_pixel];

                if (neighbor_level >= current_level)
                {
                    boundary_pixel[neighbor_level].push_back(neighbor_pixel);
                    boundary_edge[neighbor_level].push_back(0);

                    if (neighbor_level < priority)
                        priority = neighbor_level;
                }
                else
                {
                    boundary_pixel[current_level].push_back(current_pixel);
                    boundary_edge[current_level].push_back(current_edge + 1);

                    if (current_level < priority)
                        priority = current_level;

                    current_pixel = neighbor_pixel;
                    current_level = neighbor_level;
                    current_edge = 0;
                    goto step_3;
                }
            }
        }

        //!< 5. Accumulate the current pixel to the component at the top of the stack 
        //!<    (water saturates the current pixel).
        er_accumulate(er_stack.back(), current_pixel, x, y);

        //!< 6. Pop the heap of boundary pixels. If the heap is empty, we are done. If the
        //!<    returned pixel is at the same grey - level as the previous, go to 4    
        if (priority == highest_level)
        {
            delete[] pixel_accessible;
            return er_stack.back();
        }
            
            
        int new_pixel = boundary_pixel[priority].back();
        int new_edge = boundary_edge[priority].back();
        int new_pixel_grey_level = imgData[new_pixel];

        boundary_pixel[priority].pop_back();
        boundary_edge[priority].pop_back();

        while (boundary_pixel[priority].empty() && priority < highest_level)
            priority++;

        current_pixel =  new_pixel;
        current_edge = new_edge;
        x = current_pixel % width;
        y = current_pixel / width;

        if (new_pixel_grey_level != current_level)
        {
            //!< 7. The returned pixel is at a higher grey-level, so we must now process all
            //!<    components on the component stack until we reach the higher grey - level.
            //!<    This is done with the ProcessStack sub - routine, see below.Then go to 4.
            current_level = new_pixel_grey_level;
            process_stack(new_pixel_grey_level, er_stack);
        }
    }
}

If we remove all the lines that don’t contain a control flow keyword, we get this:

    for (;;)
        for (; current_edge < 4; current_edge++)
            switch (current_edge)
                case right    : neighbor_pixel = (x + 1 < width)    ? current_pixel + 1        : current_pixel;    break;
                case bottom    : neighbor_pixel = (y + 1 < height) ? current_pixel + width : current_pixel;    break;
                case left    : neighbor_pixel = (x > 0)            ? current_pixel - 1        : current_pixel;    break;
                case top    : neighbor_pixel = (y > 0)            ? current_pixel - width : current_pixel;    break;
            if (!pixel_accessible[neighbor_pixel] && neighbor_pixel != current_pixel)
                if (neighbor_level >= current_level)
                    if (neighbor_level < priority)
                else
                    if (current_level < priority)
        if (priority == highest_level)
        while (boundary_pixel[priority].empty() && priority < highest_level)
        if (new_pixel_grey_level != current_level)

Now that we have less code to look at, let’s what information we can draw out of this shortened view of the function.

The first line of this result is a loop with no stop condition in its declaration, and contains all the rest of the control flow of the function. This seems like a structuring piece of information that you’d like to know right away when analysing the function. By contrast, this for loop is located at line 42 in the original function, which means that by reading it line by line we would have to go through 40 lines of code before encountering this piece of information.

Then there is a for loop on 4 types of edges that appear in the switch statement that follows: right, bottom, left and top.

The code then checks if a pixel in that given direction is accessible and compares “levels” with “priorities”. It does something specific to the highest_level. Finally it does something related to a “grey level”.

An overview of the function

Of course, this doesn’t tell everything about the function and some concepts (such as “grey level”) need to be clarified by a further reading of the function’s code if we want to understand them, or maybe by knowing more about the domain of that piece of software.

But we now have a hypothesis on the function’s overall structure: it seems to move around in a picture by hopping from adjacent pixel to adjacent pixel, comparing “levels” and “priorities” along the way.

This may be enough if all you’re after is a general idea of the function, and if you do need to understand the function in greater detail, having an idea beforehand about where the function is going is a useful help. A bit like studying the table of contents of a book makes its further reading more efficient.

Also, reading the table of contents of a book may tell you right away that a particular book doesn’t address the topic you’re searching for. You can then save time but putting the book aside. Similarly, if a a brief analysis of a function reveals that it doesn’t contain the information you’re after, you can save time by navigating away from it.

How to run the filter

Suppose you’d like to filter one of the functions of your codebase to see only its control flow. How to go about it in practice?

One solution is to use a text editor. For example Vim allows to perform this filter with the following command:

:g!/\(\<if\>\|\<else\>\|\<for\>\|\<while\>\|\<do\>\|\<switch\>\|\<case\>\|\<try\>\|\<catch\>\)/d

I assume that another solution is to rely on your IDE. I’m only assuming because I don’t know how to fold up every pair of braces (which can be an approximation of control flow blocks) in Visual Studio, Xcode or another IDE. If you know how to do this in your IDE, please leave a comment to explain how you do it.

Finally, another solution is… to write code to do it for you! This is what we will explore in future posts. We will see how to write expressive code to perform the filter, by using STL algorithms and by using C++ ranges.

In the meantime, if you want to react to the technique on filtering on control flow, or if you have feedback after trying on your code, you’re welcome to leave a comment below.

Stay tuned!

You will also like

Don't want to miss out ? Follow:
Share this post!

About Jonathan Boccara

How to Get the “Table of Contents” Of a Long Function

Filtering on control flow

An overview of the function

How to run the filter

You will also like