Jonathan Boccara's blog

How to Merge Consecutive Elements in a C++ Collection

Published October 15, 2019 - 0 Comments

Merging identical consecutive elements in a collection is a recurring need, in C++ or elsewhere in programming.

For example, we could want to aggregate a collection of hourly results into a collection of daily results: all the results of each day get aggregated into one for that day. In this case, being “identical” means being on the same day, and “aggregating” means taking two results with a common date, and creating a result at this date and with the sum of their amounts.

If you’re in a rush and looking for a solution to this problem, you will find one below. But if you have a bit of time, why don’t you give it a try yourself before looking at a solution? That was the topic of the last previous post on Fluent C++, that embedded a playground to experiment. Check it out!

Now let’s see one way to implement this algorithm, that we can call merge_adjacent.

merge_adjacent

Here is a possible implementation of merge_adjacent. We’ll go through it step by step just after:

template <typename ForwardIterator, typename OutputIterator, typename Equal, typename Merge>
void merge_adjacent(ForwardIterator first, ForwardIterator last, OutputIterator out, Equal equal, Merge merge)
{
    auto beginUnique = first;
    while (beginUnique != last)
    {     
      // output all unique elements; stop when finding indentical adjacent ones
      auto endUnique = std::adjacent_find(beginUnique, last, equal);
      out = std::copy(beginUnique, endUnique, out);
    
      // identify the range of identical adjacent elements
      auto beginIdentical = endUnique;
      auto endIdentical = std::find_if_not(beginIdentical, last, [beginIdentical, equal](auto const& element) {return equal(element, *beginIdentical);});
    
      // aggregate identical flows into one
      if (beginIdentical != endIdentical)
      {
          if (std::distance(beginIdentical, endIdentical) == 1)
          {
             *out = *beginIdentical;
          }
          else
          {
             *out = std::accumulate(std::next(beginIdentical), endIdentical, *beginIdentical, merge);
          }
         ++out;
      }
      beginUnique = endIdentical;
    }
}

Interface

template <typename ForwardIterator, typename OutputIterator, typename Equal, typename Merge>
void merge_adjacent(ForwardIterator first, ForwardIterator last, OutputIterator out, Equal equal, Merge merge)

First the interface: the algorithm follows the conventions of the STL, by taking two input iterators and an output iterator. It uses the input iterators to know where the input range starts and ends.

We could also add another overload that takes one Range type instead of two iterators, extracts a begin and an end from that range and calls merge_adjacent with them:

template <typename ForwardRange, typename OutputIterator, typename Equal, typename Merge>
void merge_adjacent(ForwardRange& range, OutputIterator out, Equal equal, Merge merge)
{
    return merge_adjacent(begin(range), end(range), out, equal, merge);
}

We will use the parameters equal and merge to compare and aggregate elements together, respectively.

Copying the unique elements

The idea of the algorithm is pretty simple: iterate over the elements that are not equal to their right neighbour, and copy them to the output out. Then iterate over the elements that are identical to each other, aggregate them, and send that aggregate to the output. Repeat those two steps until reaching the end of the collection.

So we start by finding the first subrange of unique elements. It starts at the beginning, and goes on until we find two identical consecutive elements (which what std::adjacent_find does):

    auto beginUnique = first;
    while (beginUnique != last)
    {     
      // output all unique elements; stop when finding indentical adjacent ones
      auto endUnique = std::adjacent_find(beginUnique, last, equal);

We copy those elements to the output:

      out = std::copy(beginUnique, endUnique, out);

Note that std::copy returns an output iterator that points to the end of the elements it inserted. In other terms, this is the position that we should use to output next, which is why we replace out with this new position.

Identifying the identical elements

By definition, the end of the unique elements is also the beginning of the identical ones.

To clearly express in code that we will now work on this range of identical elements, we create an new iterator:

      // identify the range of identical adjacent elements
      auto beginIdentical = endUnique;

We could argue that this step is useless, because we could have written code just as correct by keeping using endUnique. But beginIdentical better translates our intentions and therefore makes the code more expressive.

Will creating this iterator just for the purpose of its name incur a cost? Maybe, maybe not. For all we know, the optimizer may even remove it altogether from the compiled code.

But what is certain is that it adds value by making code expressive, and if you agree with that then there is no reason not to write it. If ever a profiler turns out to point it out as a performance problem, we would remove it then.

The sub-range of identical elements starts where an element is different from the others of that sub-range (and, in particular, different from its first element). This is where we use the equal predicate parameter, in conjunction with the STL algorithm find_if_not:

      auto endIdentical = std::find_if_not(beginIdentical, last, [beginIdentical, equal](auto const& element) {return equal(element, *beginIdentical);});

If you’re not familiar yet with all the STL algorithms, check out the World Map of the STL Algorithms with its accompanying talk, to take a guided tour around this fascinating world.

Aggregating the identical flows together

The way we perform the aggregation depends on the number of elements to aggregate.

If the sub-range of identical elements is empty (for example if the collection ends with a bunch of unique elements), then there is nothing to do:

      // aggregate identical flows into one
      if (beginIdentical != endIdentical)

If there is only one element, then this is the “aggregate”, and we send it to the output:

          if (std::distance(beginIdentical, endIdentical) == 1)
          {
             *out = *beginIdentical;
          }

And if it has more than one element, we compute the aggregate with std::accumulate (or std::reduce in C++17) on the rest of the elements, by passing it the first element as an initial value:

          else
          {
             *out = std::accumulate(std::next(beginIdentical), endIdentical, *beginIdentical, merge);
          }

Indeed, std::accumulate needs an initial value. std::reduce does too, unless you happy for it to take as an initial value a value-initialized object of the underlying type of the iterator.

Now that we’ve written to the output iterator, we need to increment its position:

         ++out;
      }

Repeating the operation

We have now treated the basic unit of data for our algorithm: a sub-range of unique elements followed by a sub-range of identical elements. We can perform the same operation again to the next such unit in the collection:

      beginUnique = endIdentical;
    }
}

And so on.

How to merge consecutive elements in a C++ collection

This algorithm can be used to merge identical consecutive elements in a collection. If you see how to improve it, please show me how! I’d love to read your feedback in the comments section below.

Have you encountered the need of aggregating identical consecutive elements in your code? What was the meaning of “identical” and “aggregating” in your case? Leave a comment to let us know!

You may also like

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin