Jonathan Boccara's blog

How to Check If a String Is a Prefix of Another One in C++

Published May 17, 2019 - 0 Comments

The simple operation of checking if a string is a prefix of another one is not standard in C++. We will implement it step by step, and at the end of this article you’ll find the complete implementation ready to paste into your code.

We will also make the code generic for checking is any sequence is a prefix of another one.

In C++20, the std::string offers this feature in its interface, with the start_with member function (that has been added along the end_with member function). Thanks to Marshall Clow for pointing this out.

Before C++20 we need to write some code ourselves. We’ll also make it generic so that it applies to other sequences than std::string.

It’s an interesting case study, because it will make us go over several aspects of writing expressive code:

  • Designing a clear interface,
  • Reusing standard code, with standard algorithms of the STL,
  • Respecting levels of abstraction,
  • Getting more familiar with the STL (with the topic of 1.5-ranges).

Let’s start by designing the interface.

A “strong” interface

The role of our function is to check if a string is a prefix of another string, and this information should be displayed in the prototype. We can achieve that by naming out the function isPrefix, and let the parameters express that the function needs two string to operate. Indeed, to make concise names, no need to repeat the info of the parameters in the function name.

There is something we need to pay special attention to in this interface though. It takes two strings: one is the prefix, and the other one is the larger string that we’re checking if it starts with that prefix. And we need to be very clear which is which.

Calling them s1 or s2 it would be confusing for a user of the interface, because they wouldn’t know which is which. The least we can do is to show the roles of the two parameters through their names:

bool isPrefix(std::string const& prefix, std::string const& text);

It shows which parameters is expected, when writing code that uses isPrefix. But there is still a chance of getting it wrong and mixing up the two parameters by accident.

This sort of accident can happen if you’re not paying too much attention (say, if you’ve just been interrupted) or if the interface changes in a branch and you’re working in another branch, and the two get merged without noticing the silent collision, for example.

Also, at call site you can’t tell which string is tested for being the prefix of the other one:

isPrefix(myFirstString, mySecondString); // which one is the prefix of the other?

To help with those issues we can use strong types: putting the information not only in the parameter name, but also in the parameter type. 

There are several ways to do strong typing in C++. We could use the NamedType library, but for such a simple case a struct will do the job:

struct Prefix { std::string const& value; };
struct Text { std::string const& value; };

bool isPrefix(Prefix prefix, Text text);

And a call site now looks like this:

isPrefix(Prefix(myFirstString), Text(mySecondString)); // now we see which one is the prefix of the other

You could prefer to make the const and reference attributes show in the strong type names:

struct PrefixConstRef { std::string const& value; };
struct TextConstRef { std::string const& value; };

bool isPrefix(PrefixConstRef prefix, TextConstRef text);

There is more info in the interface but the call site becomes more verbose:

isPrefix(PrefixConstRef(myFirstString), TextConstRef(mySecondString));

How do you feel about this trade-off? I prefer the first option, for the simpler call site, but would be interested in knowing your opinion. Don’t hesitate to leave a comment.

Now we have our interface!

struct Prefix { std::string const& value; };
struct Text { std::string const& value; };

bool isPrefix(Prefix prefix, Text text);

Let’s now write the implementation of the isPrefix function.

Reusing code for the implementation

There is no isPrefix in the C++ standard library, but since it’s such a natural thing to do, there must something not too far from it.

And there is: the std::mismatch STL algorithm will do most of the work of isPrefix.

std::mismatch

std::mismatch is one of the STL algorithms that query a property on two ranges. It walks down the two ranges while their elements are equal, and stops whenever they start to differ. The algorithm then returns the two positions in the respective ranges (in the form of a pair of iterators), at those places where they start to differ: 

Here is its prototype:

template<typename InputIterator1, typename InputIterator2>
std::pair<InputIterator1, InputIterator2> mismatch(InputIterator1 first1, InputIterator1 last1,
                                                   InputIterator2 first2, InputIterator2 last2);

Checking if a string is a prefix of another one is a special case of what std::mismatch does: it comes down to checking that the first position where they start to differ is the end of the prefix string.

So here is a possible implementation for isPrefix:

bool isPrefix(Prefix prefix, Text text)
{
    auto const differingPositions = std::mismatch(begin(prefix.value), end(prefix.value), begin(text.value), end(text.value));
    return differingPositions.first == end(prefix.value);
}

Raising the level of abstraction to ranges

This is a concise implementation, but we could go further and get rid of the iterators. We can wrap `std::mismatch` in an interface that expect the ranges (here, the strings) themselves.

namespace ranges
{
    template<typename Range1, typename Range2>
    std::pair<typename Range1::const_iterator, typename Range2::const_iterator> mismatch(Range1 const& range1, Range2 const& range2)
    {
        return std::mismatch(range1.begin(), range1.end(), range2.begin(), range2.end());
    }
}

Using it, the code of isPrefix becomes simpler:

bool isPrefix(Prefix prefix, Text text)
{
    auto const differingPositions = ranges::mismatch(prefix.value, text.value);
    return differingPositions.first == end(prefix.value);
}

The problem of 1.5 ranges

The STL overload of std::mismatch that we used took the two ranges in the form of a begin and an end iterator. This is the C++14 version of std::mismatch. And before C++14 the only available overload of std::mismatch was:

template<typename InputIterator1, typename InputIterator2>
std::pair<InputIterator1, InputIterator1> mismatch (InputIterator1 first1, InputIterator1 last1, InputIterator2 first2);

Note that this overload doesn’t take the last of the second range! It expects that the second be at least as long at the first one, and carries on until reaching the end of the first range (or two differing values).

The dramatic consequence is that if the first range is longer than the second one, std::mistmatch can read past the end of the second collection. And you don’t want that to happen because this is undefined behaviour (typically a crash of the application here).

But on the other hand, you don’t want to deal with this algorithm problem in the code of isPrefix either.

The range overload is a convenient place to put that logic, as it has access to the size of the ranges and can compare them. Indeed, if the second ranges happens to be shorter than the first one (the case where old std::mismatch doesn’t work), then we can swap the parameters:

namespace ranges
{
    template<typename Range1, typename Range2>
    std::pair<typename Range1::const_iterator, typename Range2::const_iterator> mismatch(Range1 const& range1, Range2 const& range2)
    {
        if (range1.size() <= range2.size())
        {
            return std::mismatch(range1.begin(), range1.end(), range2.begin());
        }
        else
        {
            auto const invertedResult = std::mismatch(range2.begin(), range2.end(), range1.begin());
            return std::make_pair(invertedResult.second, invertedResult.first);
        }
    }
}

Checking for prefix in any sequence

Why just limit our code to std::strings? It makes sense just as well to check if a sequence of elements of any type, not just chars, is a prefix of another one.

So let’s make our code generic to support any type of elements. Starting with the strong types:

template<typename T>
struct Prefix { T const& value; };

template<typename T>
struct MainSequence { T const& value; };

Before C++17, we need to make helper functions to deduce templates types (in C++17 the constructor is able to deduce the template types):

template<typename T>
Prefix<T> prefix(T const& value)
{
    return Prefix<T>{value};
}

template<typename T>
MainSequence<T> mainSequence(T const& value)
{
    return MainSequence<T>{value};
}

We can now make isPrefix generic too:

template<typename T, typename U>
bool isPrefix(Prefix<T> prefix, MainSequence<U> mainSequence)
{
    auto const differingPositions = ranges::mismatch(prefix.value, mainSequence.value);
    return differingPositions.first == end(prefix.value);
}

And use it with other sequences than strings:

std::vector<int> v1{1, 2, 3, 4, 5};
std::vector<int> v2{1, 2, 3, 4, 5, 6, 7, 8, 9, 0};

auto isV1PrefixOfV2 = isPrefix(prefix(v1), mainSequence(v2));

Here is all the code put together:

template<typename T>
struct Prefix { T const& value; };

template<typename T>
struct MainSequence { T const& value; };

template<typename T>
Prefix<T> prefix(T const& value)
{
    return Prefix<T>{value};
}

template<typename T>
MainSequence<T> mainSequence(T const& value)
{
    return MainSequence<T>{value};
}

namespace ranges
{
    template<typename Range1, typename Range2>
    std::pair<typename Range1::const_iterator, typename Range2::const_iterator> mismatch(Range1 const& range1, Range2 const& range2)
    {
        if (range1.size() >= range2.size())
        {
            return std::mismatch(range1.begin(), range1.end(), range2.begin());
        }
        else
        {
            auto const invertedResult = std::mismatch(range2.begin(), range2.end(), range1.begin());
            return std::make_pair(invertedResult.second, invertedResult.first);
        }
    }
}

template<typename T, typename U>
bool isPrefix(Prefix<T> prefix, MainSequence<U> mainSequence)
{
    auto const differingPositions = ranges::mismatch(prefix.value, mainSequence.value);
    return differingPositions.first == end(prefix.value);
}

If you have any comment on this case study, your feedback will be welcome!

You may also like

Don't want to miss out ? Follow:   twitterlinkedinrss
Share this post!Facebooktwitterlinkedin