Jonathan Boccara's blog

Transforming Deeply Nested Loops With STL Algorithms

Published November 1, 2019 - 0 Comments

This is a guest post written by Gary Taverner. Gary works for Marlan Maritime Ltd, a company concerned with maritime safety and the monitoring/mapping/management of changing coastline using radar.

In this article we examine some code that was difficult to understand only a week after it was written, and how by using the STL it was converted into something more pleasant to reason about and maintain. We reflect on the mental barriers to using the STL in the first place.

The initial code

Given a container of strings representing paths, the requirement was to search for files with a certain suffix. Once found, the files would be checked to see if they were valid for the next stage of processing, otherwise they would be rejected, renamed and logged.

The following examples are simplified for clarity (but hopefully not too much to make the old code easy to read). The code has to build in a system using the Borland Classic compiler (shipped with Embarcadero Berlin around 2016, so not old) and therefore cannot use many features of modern C++. My first instinct was to use for loops. A couple of weeks later in testing, it was discovered that the function did not quite always do what it was supposed to do and I had to fix it! This turned out to be difficult because the two week old function was so hard to understand.

Here is the code:

namespace bfs = boost::filesystem;

//(1)

//member variable, std::vector< std::string> pathStrings

//turn strings into paths

std::vector< std::string>::iterator strIter;

for( strIter = pathStrings.begin(); strIter != pathStrings.end(); ++strIter)
{
    //(2)
    bfs::path myPath( *strIter);
    if( !bfs::exists( myPath) || !bfs::is_directory( myPath))   
    {
        //log the error - config should provide paths to dirs.   
    }
    else
        //(3)   
    {
        for( bfs::directory_iterator dirIt( myPath); dirIt != bfs::directory_iterator; ++dirIt)   
        {
            //test file for suffix and size, handle/log errors
            // and set boolean flags here to be
            // in scope in all conditional cases below
            //(4)
            bool good( false);
            bool useable( false);
            if( !bfs::is_regular_file( *dirIt))   
            {
                // setting useable not required, it is already false, here for 'clarity'.
                useable = false;   
            }
            else   
            {
                // simplified here, tests for suitable suffix
                //(5)
                std::string possPath( myPath.string());
                std::string pathSuff( possPath.substr( possPath.length() - 10))
                // searchSuff declared elsewhere
                if( pathSuff  == searchSuff)
                {
                    useable = true;   
                }
                else
                {
                    //log info   
                }
                // simplified size testing
                if( bfs::file_size( myPath) > 0)   
                {
                    good = true;   
                }
                if( good && useable)   
                {
                    // pass file to next stage
                    // rename file with success suffix   
                }
                else   
                {
                    //rename file with fail suffix   
                }   
            }   
        }   
    }   
}

Loopy Code Explained

At (1) above we begin by constructing an iterator for the vector of strings and then from (2) we iterate through the strings and from each construct a boost filesystem path.

We need to know if the path exists and whether it is a directory. If it is, we construct a directory iterator from it at (3) and iterate through each path of that directory to see if we have a regular file and at (4) create two boolean flags, ‘good and ‘useable’ (yes, they are bad names), at a high enough scope that they can be seen wherever needed and far enough away from the point of use, that they are confusing – even a few weeks later.

From (5) we test the current file to see if it is large enough and has the correct suffix. The code here could have been wrapped in a function ‘testAndRename’ say, but it would have to have the path passed to it and even its name makes it obvious it does more than one thing. Also it would still need to return something for the outer scope to know whether this is a file to pass on to the next stage or not so there would still be branching and not much gained in terms of readability.

Transforming The Code

Having watched Jonathan’s talk 105 STL Algorithms in Less Than an Hour given at CppCon on YouTube™ around the time that this code needed to be fixed, I was inspired to use the STL to rewrite this code to make it correct and readable.

A change in mindset is required. With for loops there is a feeling of being in the action and knowing exactly what’s going on. With the STL we have to think more about containers and which arguments are to be passed to any function or function object we write.

Previously, at (1) repeated below, we iterated through the strings turning them into paths:

//(1)
//turn strings into paths, pathStrings is of type std::vector< std::string>
std::vector< std::string>::iterator strIter;
for( strIter = pathStrings.begin(); strIter != pathStrings.end(); ++strIter)

So how do we take a container of strings and turn them into a container of paths? Well, it seemed obvious to me that for each string I wanted a path so for_each?

However the slide that stuck in my mind in Jonathan’s talk said ‘It’s not just for_each’. If we think about this for a while we see that we want objects in one container to be used to construct something else which is then placed into another container.

The std::transform algorithm is the answer, with a function that takes a std::string and returns a path as below at (6). It felt weird being out of the loop and handing over the responsibility for dereferencing to the algorithm. Then there was some puzzling about whether the function stringToPath() should take a value, a reference, or a const reference.

After this (when it wouldn’t compile) some extra research was required to understand that std::back_inserter was needed.

//(6)
namespace bfs = boost::filesystem;
//member variable, std::vector< bfs::path> searchPaths;
std::transform(pathStrings.begin(), pathStrings.end(), std::back_inserter( searchPaths), stringToPath);

Originally we processed each path entirely one at a time as at (2) above. Now we have a collection of paths and we need to distinguish between valid and invalid paths.

We are dividing the collection into two based on a simple true or false test. It is easy to see that std::partition can do the work. The function isValid( bfs::path) used at (7) below is a free function that tests the path and returns true if it exists and is a directory. This replaces the test at (2). The iterator endIter is used later.

//(7)
std::vector< bfs::path>::iterator endIter;
endIter = std::partition( searchPaths.begin(), searchPaths.end(), isValid);

Now that the paths are sorted into valid and invalid, what do we do? At this point If you are like me, you hit a wall, we seem to be missing a container to iterate through.

However, we have multiple containers as each directory path is a container of unknown things. We need to find an as yet unknown number of files and put their paths in another container.

So for each directory path we need to create a container, put file paths in it and return it? No, that is not going to work. What we need to do is create a container and give it to a functor. That functor fills the container with the paths of files that it discovers when it is called with a directory path as an argument.

Side effects! This is a proper use of std::for_each. The functor FilesInDirs at (8) is constructed with an empty container of paths. Each time it is called with a valid directory path it constructs a directory iterator and each path found is pushed into the path container called paths.

//(8)
std::vector< bfs::path> paths;
FilesInDirs filesInDirs( paths);
std::for_each(searchPaths.begin(), endIter, filesInDirs);

At (4) in the original code it was necessary to introduce a couple of boolean flags at a high enough scope that they could be seen throughout the rest of the function. These are not needed in the new code.

At (5) in the original we start a series of branching tests and look for file names that match a pattern. We have to set the flags ‘good’ and ‘useable’ as we go and then test them in combination to determine how to proceed.

At (9) in the new version we eliminate files that do not match from our container. The files which are not useable are removed from the paths container using the combination of std::remove_if and std::erase. std::remove_if sorts the paths and returns a pointer to the start of the unwanted paths which std::erase uses as the beginning of the range to remove.

Some of the logic from (5), with less branching, made its way into the new functor IsUnusable and in doing so became easier to read and understand.

IsUnusable isUnusable( searchSuffix);
paths.erase(std::remove_if
            ( paths.begin(),
             paths.end(), isUnusable),
            paths.end());

Finally, at (10) there was one more check to do to see if the files were of suitable size to be processed. This was separated from isUnuseable for ease of logging. After the previous work this was easy to do with a functor IsBadInput constructed with the minimum acceptable number of data blocks in the file.

//(10)
IsBadInput isBadInput( 3);
paths.erase(std::remove_if
            ( paths.begin(),
             paths.end(), isBadInput),
            paths.end());

Now paths is a container of files which can be processed.

It concerned me at first that in this rewrite there might be a lot of work to put files in a container just to remove them again compared to testing them on the fly. At the same time there might now be opportunities to parallelize the processing of the files that had not existed before.

Without a doubt it is easier to read and debug. I have not shown the implementation of the predicate and functors here but the logic of them is similar to the deeply nested logic of the earlier code but much easier to reason about.

Here is the new code in its entirety, it is much more declarative.

//(6)
namespace bfs = boost::filesystem;
//member variable, std::vector< std::string> pathStrings
std::vector< bfs::path> searchPaths;
std::transform(pathStrings.begin(), pathStrings.end(), std::back_inserter( searchPaths), stringToPath);

//(7)
std::vector< bfs::path>::iterator endIter;
endIter = std::partition( searchPaths.begin(), searchPaths.end(), isValid);

//(8)
std::vector< bfs::path> paths;
FilesInDirs filesInDirs( paths);
std::for_each(searchPaths.begin(), endIter, filesInDirs);

//(9)
IsUnusable isUnusable( searchSuffix);
paths.erase(std::remove_if
            ( paths.begin(),
             paths.end(), isUnusable),
            paths.end());

//(10)
IsBadInput isBadInput( 3);
paths.erase(std::remove_if
            ( paths.begin(),
             paths.end(), isBadInput),
            paths.end());

Clearer code with the STL

In this example, using STL algorithms allows to break down a big for loop into manageable little steps, and even to remove some of its code.

We’ll see how it stands the test of time, but already a week after writing it, the new code using the STL is much more understandable than the loop was at the same age.

You will also like

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin